A.I. Plays Fire Emblem Heroes | Monte Carlo Tree Search | UCB1


Hello everyone, I'm Kuma. Have you ever wondered whether it is possible to solve FEH battles with a computer? This has been on my mind for a very long time. I decided to try it myself, and in this video I will show you how I am doing it.

First, how complex is a battle in Fire Emblem Heroes? There are different kinds of measurements for game complexity. Let's start with the state-space complexity: the number of legal game positions reachable from the initial position of the game. It is not easy to know the exact number, but we can get a rough estimate.

Here we have the Infernal map of the Bound Hero Battle: Titania & Mist. Let's assume we are playing this map with this fixed team, with fixed skills and seals. There are 27 regular tiles, 10 forest tiles, and 9 flying tiles; and there are 3 cavalry units, 4 infantry units, and 3 flying units, so we can get an upper bound on the number of different positions with this formula. Then each unit can be either dead or alive with a different amount of HP. I am going to take a conservative estimate here, assuming that on average each unit can be in 3 states: dead, at full health, or somewhere in between. Each unit can also have buffs, from skills like Ground Orders or Hone, or a mixture of these; I will be conservative again and assume 2 states per unit, and the same applies to debuffs. Lastly, a unit can be available, or it can have already acted. It is important to know that the enemy AI in FEH is deterministic, so the enemy phase is just a transition between states, not a distinct state itself; this means we only need to count availability for player-controlled units. This gives a rough estimate of 10^27 for the state-space complexity. And remember that this number can go even higher on maps with reinforcements, or if the units, skills, and seals we use can change as well.
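The on-screen formula is not reproduced in the transcript, but a back-of-the-envelope version of the whole estimate might look like the sketch below. The terrain rules are my assumptions (cavalry restricted to regular tiles, infantry to regular and forest tiles, fliers anywhere), and overlapping units are ignored, so this is only an upper bound.

```python
# Rough state-space estimate for the Titania & Mist Infernal map.
# Assumptions: cavalry can only stand on the 27 regular tiles, infantry
# on regular + forest tiles, and fliers on any tile; units are allowed
# to overlap, so this over-counts (fine for an upper bound).
regular, forest, flying = 27, 10, 9

positions = (regular ** 3                         # 3 cavalry units
             * (regular + forest) ** 4            # 4 infantry units
             * (regular + forest + flying) ** 3)  # 3 flying units

hp_states     = 3 ** 10  # dead / full HP / in between, for all 10 units
buff_states   = 2 ** 10  # buffed or not
debuff_states = 2 ** 10  # debuffed or not
acted_states  = 2 ** 4   # assumed: the player controls 4 of the 10 units

total = positions * hp_states * buff_states * debuff_states * acted_states
print(f"~10^{len(str(total)) - 1}")  # ~10^27
```

With these assumptions the product comes out around 3.6 × 10^27, in line with the estimate above.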
Next, we want to know the game tree size. A game tree is a directed graph where the nodes are states and the edges are actions. The game tree size is the total number of possible games that can be played, i.e. the number of leaf nodes (terminal nodes) of the game tree. Again we are going to estimate it, but this time with some actual statistics. The data here were taken on the Infernal maps of the corresponding Hero Battles with a fixed F2P team; data for turn 2 and onwards were taken by using moves from a known solution on the turns before. The numbers in white are the valid moves for each turn on the corresponding maps, and the numbers below them are the valid moves after which all the player units survive the enemy phase. After removing the top 3 and the bottom 3, we take an average of the valid moves and the survival moves. Using this, we can get an estimate of the game tree size for different numbers of turns; we also have to multiply it by 24 if we allow swapping starting positions. So for maps which usually end on turn 4, the game tree size is about 10^20, and it grows exponentially with respect to the number of turns: each extra turn increases the game tree size by a factor of over 10,000.
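As a sanity check, here is that arithmetic in code. The per-turn average below is a hypothetical stand-in for the on-screen statistics, chosen so that the 4-turn total lands near the 10^20 quoted above.

```python
import math

# Hypothetical average number of surviving move sequences per turn
# (the real averages are shown on screen, not in the transcript).
avg_moves_per_turn = 45_000
swap_positions = 24  # number of ways to swap starting positions

def game_tree_size(turns: int) -> int:
    return swap_positions * avg_moves_per_turn ** turns

for turns in (3, 4, 5):
    print(f"{turns} turns: ~10^{round(math.log10(game_tree_size(turns)))}")
# 3 turns: ~10^15, 4 turns: ~10^20, 5 turns: ~10^25 -- each extra turn
# multiplies the size by avg_moves_per_turn, well over 10,000.
```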
Also, just like the state-space complexity, this would go up if we considered different team combinations and seals. This means that even though FEH may not be as complex as games like chess or Go, it is still impossible to do a full search of the game tree or the state space for FEH battles.

Before I can get to work on the actual AI, I have to recreate the whole battle simulation, so that our AI can be built on top of it. To do that, we need to know every single detail of the enemy AI, including lots of moving parts that normal players do not know and do not need to know. For example: will an enemy unit use its Assist instead of attacking? Why does an enemy unit use Reposition instead of moving up? Or why does an enemy unit move in one direction but not the other? These are also among the most popular comments I receive on my F2P solutions. Luckily, someone has studied this before. Let me give a special thanks here to “Mia’s AI Manipulation Instructional Academy”, which has recorded a lot of minor details of the enemy AI in FEH. However, the logic written there is only about 80% correct. Still, it gave me a very good start for my battle simulator.

Before we move on to the details, I think it is time to show you a quick demo of the current version of the A.I.
By the way, don’t forget to subscribe to the channel and hit the bell icon to turn on notifications; your support is very important to me.

We are going to play on the Infernal map of the Mythic Hero Battle: Sothis, using the team on the left. I am using a generic indoor map to visualize the battles; the look may be different, but the terrain is exactly the same. The AI is actually simulating many more battles in the background, while the display shows the battle with the best result among the recent simulations. You can find the statistics of the A.I. in the upper part of the screen; they are updated together with the battle on display once the previous one ends. Attempts is the number of battles simulated. Nodes is the number of nodes in our tree, which is related to memory usage; we will talk about this when we go through the details. Elapsed is the time used. Kills, Deaths, and Score are about the current battle on display. Best is the best score we have ever reached over the whole simulation. On the right side, it shows the steps of the current battle for both player-controlled units and enemy units.
Actually, it took me more time to write this battle simulator than the A.I. itself. I am not going into those details in this video; if you are interested, comment below to let me know, and maybe I can make a separate video if enough of you are interested.

In order to explore the game tree and find a solution without a full search, I am using my own variation of MCTS.
MCTS stands for Monte Carlo Tree Search. It is a game-tree search algorithm; the famous AlphaGo and AlphaGo Zero use their own variations of MCTS, combined with machine learning, in their A.I. There are 4 steps in MCTS: Select, Expand, Playout, and Backpropagation. In the Select step, starting from the root node, we select a child repeatedly until we reach a leaf node. If it is the first time we visit this leaf node, we perform the playout for it; otherwise, we expand it: for each of the valid moves from the current node, we create a corresponding child node, then select one of the child nodes at random to play out. For the Playout step, starting from the state of the chosen node, we perform random moves until the game ends, and we get a score of 1 for a win and 0 if we lose. Finally, we backpropagate the result through the parents until we reach the root node, so that each node can update its own record of the total number of wins and tries for itself and its child nodes.
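To make the four steps concrete, here is a minimal generic MCTS skeleton in Python. This is not the actual FEH implementation: the game-state interface (`legal_moves`, `play`, `is_terminal`, `result`) is a hypothetical stand-in for the battle simulator, and child selection uses the UCB1 formula explained in the next section.

```python
import math
import random

C = math.sqrt(2)  # exploration constant, explained below

class Node:
    def __init__(self, state, parent=None):
        self.state = state    # game state at this node
        self.parent = parent
        self.children = []    # filled in on expansion
        self.wins = 0.0       # total score backpropagated through here
        self.tries = 0        # number of visits

def ucb1(child):
    if child.tries == 0:      # unvisited children get picked first
        return float("inf")
    return (child.wins / child.tries
            + C * math.sqrt(math.log(child.parent.tries) / child.tries))

def iterate(root):
    # 1. Select: walk down to a leaf, taking the highest-UCB1 child.
    node = root
    while node.children:
        node = max(node.children, key=ucb1)
    # 2. Expand: if this leaf was visited before, create one child per
    #    valid move and pick one at random to play out.
    if node.tries > 0 and not node.state.is_terminal():
        node.children = [Node(node.state.play(m), parent=node)
                         for m in node.state.legal_moves()]
        node = random.choice(node.children)
    # 3. Playout: random moves until the game ends (1 = win, 0 = loss).
    state = node.state
    while not state.is_terminal():
        state = state.play(random.choice(state.legal_moves()))
    # 4. Backpropagate: update win/try counts up to the root.
    score = state.result()
    while node is not None:
        node.tries += 1
        node.wins += score
        node = node.parent
```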
So, how exactly do we select a child? MCTS is usually used together with UCB1, which converts the average score into a decision value. If at least one of the child nodes has never been visited, we select one of those nodes at random. Otherwise, we apply the following formula to each child and select the one with the highest value, where lowercase n is the number of tries of the child node, capital N is the number of tries of the current node, x is the number of wins of the child node, and c is the exploration constant, √2 by default:
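Reconstructed from that description (the formula itself is only shown on screen), this is the standard UCB1 value:

$$\mathrm{UCB1} = \frac{x}{n} + c\sqrt{\frac{\ln N}{n}}$$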
The first term of the formula is the average score of the child node; it corresponds to the exploitation of deep variants after moves with a high average win rate. The second term corresponds to the exploration of moves with fewer simulations. The formula tries to balance exploitation and exploration of the game tree, in order to get a good result without running a full search.

Because we are solving a puzzle instead of playing a 1-vs-1 game, keeping the score of 1 for success and 0 for failure would obviously be totally pointless: it would be all 0 until we find a solution, and before that UCB1 cannot help us with exploitation. We have to define our own score; here is the scoring formula for version 1 of our A.I.
We have 3 criteria for scoring. The first is Enemies Defeated; this is very straightforward, and I believe no explanation is needed. Then we have Player Units Alive. I have implemented the simulation a little differently from the actual game: in the enemy phase, the battle does not end immediately when a player unit is defeated, but only after that whole enemy phase. I believe this helps to identify bad moves that may lead to multiple deaths in a single enemy phase. The last criterion is Phases Remaining. I have set a phase limit, 20 by default, which is used to nudge the AI towards solutions with fewer moves. I knew this might cause a problem, making the AI fail to find any solution at all, but for our first version, let's just try it and see.
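The actual formula is shown on screen rather than spoken, so the weights in this sketch are hypothetical placeholders; it only illustrates the three criteria just described, combined so the score stays in [0, 1] as version 1 assumes.

```python
PHASE_LIMIT = 20  # default phase limit mentioned above

def score_v1(enemies_defeated, total_enemies,
             players_alive, total_players, phases_left):
    # Assumed weights (not from the video), normalized so that a
    # perfect run -- all kills, no deaths, many phases left -- scores 1.
    w_kills, w_alive, w_phases = 1.0, 1.0, 0.1
    total_w = w_kills + w_alive + w_phases
    return (w_kills * enemies_defeated / total_enemies
            + w_alive * players_alive / total_players
            + w_phases * phases_left / PHASE_LIMIT) / total_w
```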
It didn't take too much time for me to implement the first version of our A.I. I used the Infernal map of the Bound Hero Battle: Titania & Mist for testing, because it was available at that time. The score on display is multiplied by 1000. You can see that we are able to simulate around 300 battles per second, which is not too bad, but I know we can do better. Also, when the program runs for a long time, memory usage goes up very quickly. This is expected, because we have to store the MCTS tree, but we can optimize it a little. So let's head back and optimize first.
For memory, we can remove nodes which are no longer in use. These are nodes where the game has ended, and the removal cascades to the parent nodes once all of their child nodes have been removed. Also, whenever the tree grows past 600,000 nodes, we confirm one move and discard all the sibling branches.
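A sketch of those two optimizations, reusing the Node fields from the MCTS skeleton earlier; the criterion for which move to confirm is my assumption, as the video does not spell it out.

```python
NODE_LIMIT = 600_000  # cap mentioned above

def prune(node):
    # Remove a finished (terminal) node, cascading upwards whenever a
    # parent is left with no children to explore.
    while node.parent is not None:
        node.parent.children.remove(node)
        if node.parent.children:  # parent still has live branches
            return
        node = node.parent

def confirm_move(root):
    # Commit to one move -- here, the most-tried child (assumed) --
    # and let every sibling branch be garbage-collected.
    best = max(root.children, key=lambda c: c.tries)
    best.parent = None
    return best  # the confirmed child becomes the new root
```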
For performance, we first have to make use of the multi-core CPU through multi-threading. There are multiple ways of doing this. The easiest is called Root Parallelism: create a separate root for each thread and run MCTS from there, so each thread has its own tree. This is the easiest because no memory is shared between threads, so only minimal code changes are needed. However, for the same reason, effort can be wasted testing the same nodes on different threads. So instead, I went for the next method, Tree Parallelism, where a single tree is shared by all threads. This method requires some kind of synchronization between the threads, and a little more coding.
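Here is a minimal illustration of the difference, reusing the `iterate()` sketch from earlier. This is an assumption-laden toy, not the real implementation.

```python
import threading

def root_parallel(make_root, n_threads, n_iters):
    # Root parallelism: each thread owns an independent tree, so no
    # synchronization is needed; the trees are merged afterwards.
    roots = [make_root() for _ in range(n_threads)]
    def work(root):
        for _ in range(n_iters):
            iterate(root)
    threads = [threading.Thread(target=work, args=(r,)) for r in roots]
    for t in threads: t.start()
    for t in threads: t.join()
    return roots

def tree_parallel(root, n_threads, n_iters):
    # Tree parallelism: one shared tree. A single coarse lock keeps this
    # sketch correct but serializes the search; a real implementation
    # would use finer-grained locking (or virtual loss) instead.
    lock = threading.Lock()
    def work():
        for _ in range(n_iters):
            with lock:
                iterate(root)
    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads: t.start()
    for t in threads: t.join()
```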
Another performance upgrade comes from caching. The enemy phase takes the most time in the simulation, and one of the heaviest operations is pathfinding, which depends on movement type, terrain, obstacles, and so on. By caching these pathfinding results, we are able to speed things up a lot.
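Since a pathfinding result depends only on those inputs, it can be memoized. A sketch of the idea, with hypothetical names, since the simulator's real interface is not shown:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def reachable_tiles(movement_type, terrain, obstacles):
    # All arguments must be hashable (e.g. terrain as a tuple of tile
    # types, obstacles as a frozenset of coordinates) for the cache to
    # work. The body would run the usual BFS over the map.
    reachable = set()
    # ... breadth-first search goes here ...
    return frozenset(reachable)
```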
With all these optimizations, let's try the map again. Now we are able to reach about 4,000 simulations per second; I am using a 6-core CPU with 4 cores assigned to the A.I. We are also showing the node count, the number of nodes in the MCTS tree. For performance reasons, each node stores its own game board, and that is the largest portion of the memory used by the A.I. I am going to speed up the playback from here. It took 4 minutes for the AI to figure out this solution. You may have noticed that the AI is not very smart: there are several moments where it almost made it, but it seems it did not learn from that experience. To be clear, this version is different from the one in the demo earlier.
Now, let's give it another test on the Infernal map of the Mythic Hero Battle: Sothis. The AI is still repeating the same mistakes. I call them the same mistakes, but it is important to note that they are actually different paths: the AI is programmed not to repeat the same moves. They may look similar to our human eyes, but there are minor differences. For example, in one path Ike moves before Eir, and in another path it is the reverse. If you remember, this is exactly the state-space issue we mentioned: this AI does not take repeated states into account. This could be part of the reason the AI is not performing well, but there is much more to it than that.
After spending more time thinking, as well as reading more papers, I gained a better understanding of MCTS and UCB1. UCB1 was designed to solve the multi-armed bandit problem, which assumes each arm returns a reward following a probability distribution. Looking at the UCB1 function, we can see that only the average score and the numbers of tries are taken into account; variance or standard deviation is not considered. In our case, since I am using real-numbered rewards between 0 and 1, this can create a problem if the actual reward has a much smaller variance than UCB1 expects. For example, if the actual score is always between 0.1 and 0.2, the differences in the first term become much smaller, making the system favour exploration over exploitation.
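A quick numerical illustration, plugging numbers into the UCB1 formula from earlier (the visit counts are made up for the example):

```python
import math

C = math.sqrt(2)

def ucb1(avg, n, N):
    return avg + C * math.sqrt(math.log(N) / n)

N = 1000  # tries of the parent node

# Squeezed scores (0.1 vs 0.2): the worse child wins the selection.
print(ucb1(0.2, n=900, N=N), ucb1(0.1, n=100, N=N))  # ~0.32 < ~0.47
# Full-range scores (1.0 vs 0.0): the better child wins, as intended.
print(ucb1(1.0, n=900, N=N), ucb1(0.0, n=100, N=N))  # ~1.12 > ~0.37
```

With the squeezed scores, the exploration term dwarfs the 0.1 gap in averages, so UCB1 keeps pouring simulations into the worse move.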
Also, MCTS does not work very well in extreme situations, because MCTS maximizes the expected win probability, not the score margin. The computer Go community uses a method called dynamic komi to work around this. Komi is a Go term for a point penalty imposed on one of the players; the method tries to make the game more even by using an internal komi determined from the current board situation. This does not apply to our case directly, because we are already using a score instead of a win count, but the dynamic nature could be something we can use.
Some papers also mention guided playouts: instead of playing out with completely random moves, we choose moves which are known to perform better, based on our domain knowledge. These papers suggest that this makes the AI smarter. So, with all this knowledge, we are going to upgrade our AI.
First, we are implementing guided playouts in this new version, prioritizing Dance and attacks over other moves. We are also modifying the UCB1 selection: instead of forcing our score to fall between 0 and 1, we use an unbounded score, and before putting it into the UCB1 function, we normalize it by dividing by the best score minus the average score of the parent node. The new score now contains 5 components: Enemies Defeated, Enemies' HP Missing, Player Units Alive, Player Units' HP Remaining, and Phases Left.
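A sketch of the two version-2 changes, reusing the node fields from earlier. The normalization is one reading of the description above (the exact details are not spelled out), and the move-preference flags are hypothetical.

```python
import math
import random

C = math.sqrt(2)

def ucb1_v2(child, parent, best_score):
    # Unbounded average score, normalized by (best score - parent's
    # average score) before entering the UCB1 formula.
    scale = best_score - parent.wins / parent.tries
    if scale <= 0:
        scale = 1.0  # guard against a degenerate scale (assumption)
    avg = child.wins / child.tries
    return avg / scale + C * math.sqrt(math.log(parent.tries) / child.tries)

def guided_playout_move(state):
    # Guided playout: prefer Dance and attack moves over everything else.
    moves = state.legal_moves()
    preferred = [m for m in moves
                 if m.is_dance or m.is_attack]  # hypothetical move flags
    return random.choice(preferred or moves)
```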
This video is already 20 minutes long, so I am going to finish by showing our A.I. version 2 playing on the Sothis Infernal map. This is not the end of our A.I. journey, though: there are more upgrades that have been done or are still to be done, and I have also skipped several details to keep the video from running too long. If you want to know all of these, remember to like the video and share it with your friends; then maybe I can create a part two. Without further ado, let's get into the battle. This time we allow swapping starting positions. If you have watched my latest F2P solution for the Infernal map of the Mythic Hero Battle: Sothis, you may realize that it looks familiar. And yes, it is my first A.I.-assisted solution, and also the only one I have uploaded at the time of making this video.

Thanks for watching. If you enjoyed the video, don't forget to give it a thumbs up and subscribe to the channel. Also, remember to click on the bell icon to receive notifications about my new videos. You can also find the links to my Twitter and Discord down below. Hope to see you guys next time.
