AlphaZero like implementation for Oware Abapa game (Codingame)

Marchete

Final Words

I started reading about ML when Agade, PB4 and Fenrir "broke" the Coders Strike Back leaderboard with Neural Network bots (Feb 2019). These bots achieved 90%+ winrates against the best bots to date. I tried reading all the info they gave about ML and NN, but most of it was too mathematical and complex for me. At that time I wrote some small parts for a future NN bot (the inference engine), but creating all the pieces and joining them together was a complex task.

Then Robostac, Recurse and Jacek topped most multiplayer leaderboards with their NN bots. It was clear that NN bots were the new GA (Genetic Algorithms).

Some final thoughts:

  • Hard work and perseverance. Some of us aren't "genius" minds; I need a lot of time to implement stuff and make it work. But hard work can usually overcome your limitations. Keep working on the problem, ask for some help and you'll eventually manage to solve a big task.
  • CGZero winrate improves a lot when you optimize the inference CPU time. I went from 5k sims per turn (a sim being one simulation plus the NN prediction of the new gamestate) to 10k, and the winrate improved noticeably. The more MCTS visits, the better.
  • Try to optimize NN prediction performance as much as possible. In my profiling tests the bot spent 70% of its CPU time on NN prediction (Dense layer, in calculate output). So any step you can cache and any mathematical operation you can save in the NN inference part will improve the overall winrate. I did some improvements in NN_Mokka.
  • I changed the random move selection from https://github.com/marchete/CGZero/blob/master/src/CGZero.cpp#L1501-L1515 to a weighted random selection, based on visits per child, for the first 20 turns. The weighted selection was reused from https://github.com/marchete/RN_Explorer/blob/main/src/solorunner/RN_ExploDiv_7.cpp#L1723 and uses a temperature of 0.74, which gives an exponent of 1/0.74 ≈ 1.35: rndWeight[i]=(i==0?0:rndWeight[i-1])+pow((double)(node+i)->visits,1.35). This should pick better options than a pure random choice. This change is not in the github code.
  • I created a slightly bigger network (140+KB file size), with 2 hidden layers. To reduce the size when sending it to CG, I convert the weights file from float32 to float16, using the file32to16 and file16to32 functions to change from one format to the other. The bot in CG does the unpacking with file16to32 at turn 0. The 32-bit to 16-bit conversion doesn't seem to degrade the prediction; the error seems to be about 1e-4 on average.
  • One-hot encodings seem good because you can have a big Dense layer without linearly increasing the calculation time (you can just set all outputs to the bias, then add the weights that come from the 'ones'). Thanks Robostac and Jacek for this performance improvement.
  • Robostac also did a policy + value weight concatenation in the bot (the training model remains the same), because that reduces the C++ calculation time of the output layers. But it requires several changes. First you need to change SaveModel() in python so that it does concatenate(policy_weights,value_weights) and concatenate(policy_bias,value_bias). The inference engine then calculates both heads as a single layer (fewer operations, because 6+1 outputs fit in an AVX register of 8 floats) without any activation: neither tanh nor softmax is applied, so both must be done "manually". After predict() you need to extract the value float and do tanh(value). Then for the softmax you need to set -999999.9f (any large negative float, which softmax turns into a zero) at the position that held the value, and then do a softmax().
  • In my last test I added the valid moves as one-hot inputs. I don't know if that's redundant (because the other inputs already imply them), but with just 6 inputs it wasn't too heavy to add.
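
The visit-weighted move selection described in the list above could look roughly like this. It is only a sketch and not the code used in the bot: the function name pickWeightedByVisits, the std::vector interface and the std::mt19937 generator are assumptions made for the example. Only the cumulative-weight formula and the 0.74 temperature come from the text.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: returns the index of a child, picked with probability
// proportional to visits[i]^(1/temperature).
// `visits` holds the MCTS visit count of every child of the root node.
std::size_t pickWeightedByVisits(const std::vector<int>& visits,
                                 double temperature,
                                 std::mt19937& rng) {
    assert(!visits.empty() && temperature > 0.0);
    const double exponent = 1.0 / temperature;  // 1/0.74 is about 1.35

    // Running (cumulative) sum of the weights, like the rndWeight[] formula.
    std::vector<double> cumulative(visits.size());
    double total = 0.0;
    for (std::size_t i = 0; i < visits.size(); ++i) {
        total += std::pow(static_cast<double>(visits[i]), exponent);
        cumulative[i] = total;
    }
    if (total <= 0.0) return 0;  // no visits at all: fall back to the first child

    // Draw a point in [0, total) and return the first bucket that contains it.
    std::uniform_real_distribution<double> dist(0.0, total);
    const double r = dist(rng);
    for (std::size_t i = 0; i < cumulative.size(); ++i) {
        if (r < cumulative[i]) return i;
    }
    return cumulative.size() - 1;  // guards against floating-point round-off
}
```

A bot would call something like this only during the first 20 turns and go back to the most visited child afterwards. A lower temperature gives a larger exponent, so the choice gets closer to always picking the most visited move.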
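
To give an idea of what a float32 to float16 round trip involves, here is a minimal sketch of the per-value conversion. It is not the file32to16 / file16to32 code from the bot: the names floatToHalf and halfToFloat are made up for the example, and it takes shortcuts (tiny values are flushed to zero, NaN is turned into infinity) that are assumed to be acceptable for network weights.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical float32 -> float16 conversion (IEEE 754 binary16 bit layout).
// Sketch only: normal values are rounded, values too small for a normal half
// are flushed to zero, anything too large (NaN included) becomes infinity.
std::uint16_t floatToHalf(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    // Re-bias the exponent from float32 (bias 127) to float16 (bias 15).
    const int exponent = static_cast<int>((bits >> 23) & 0xFFu) - 127 + 15;
    const std::uint32_t mantissa = bits & 0x7FFFFFu;
    if (exponent <= 0) return static_cast<std::uint16_t>(sign);
    if (exponent >= 31) return static_cast<std::uint16_t>(sign | 0x7C00u);
    std::uint32_t half = (static_cast<std::uint32_t>(exponent) << 10) | (mantissa >> 13);
    if (mantissa & 0x1000u) ++half;  // round up on the 13 dropped bits
    return static_cast<std::uint16_t>(sign | half);
}

// Hypothetical float16 -> float32 conversion, the inverse of the above.
float halfToFloat(std::uint16_t h) {
    const std::uint32_t sign = (static_cast<std::uint32_t>(h) & 0x8000u) << 16;
    const std::uint32_t exponent = (h >> 10) & 0x1Fu;
    const std::uint32_t mantissa = h & 0x3FFu;
    std::uint32_t bits;
    if (exponent == 0) {
        bits = sign;  // zero (subnormal halves are treated as zero)
    } else if (exponent == 31) {
        bits = sign | 0x7F800000u | (mantissa << 13);  // infinity / NaN
    } else {
        bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);  // 112 = 127 - 15
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

A half keeps 10 mantissa bits, so a weight around 0.1 comes back with an error of a few 1e-5, which is in line with the small average error mentioned above.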

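The one-hot trick from the list above (start from the bias, then add only the weight rows of the inputs that are 1) can be sketched as follows. The SparseDense struct, its input-major weight layout and the forward() name are assumptions for the example; the real inference engine may store its weights differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical first Dense layer fed by one-hot inputs.
// Assumed layout: weights[in * outputs + out] (one contiguous row per input).
struct SparseDense {
    std::size_t inputs;
    std::size_t outputs;
    std::vector<float> weights;  // inputs * outputs values
    std::vector<float> bias;     // outputs values

    // `activeInputs` lists the indices of the inputs that are 1; all the
    // other inputs are 0 and contribute nothing, so they are never touched.
    std::vector<float> forward(const std::vector<std::size_t>& activeInputs) const {
        std::vector<float> out(bias);  // start from the bias
        for (std::size_t in : activeInputs) {
            assert(in < inputs);
            const float* row = &weights[in * outputs];
            for (std::size_t o = 0; o < outputs; ++o) out[o] += row[o];
        }
        return out;  // pre-activation values; apply the activation afterwards
    }
};
```

The cost is proportional to (number of ones) x outputs instead of inputs x outputs, which is why the input layer can grow without the calculation time growing with it.
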
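The manual post-processing needed for the concatenated policy + value output, as described in the list above, might look like this. It is a sketch under assumptions: the 7 raw outputs are taken to be the 6 policy logits followed by the value (the order of concatenate(policy_weights,value_weights)), and the names Prediction, splitHeads, kMoves and kOutputs are invented for the example.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

constexpr std::size_t kMoves = 6;             // policy outputs, one per move
constexpr std::size_t kOutputs = kMoves + 1;  // policy logits, then the value

struct Prediction {
    std::array<float, kMoves> policy;
    float value;
};

// `raw` is the activation-free output of the concatenated layer.
// Assumed layout: raw[0..5] = policy logits, raw[6] = value.
Prediction splitHeads(std::array<float, kOutputs> raw) {
    Prediction p{};
    p.value = std::tanh(raw[kMoves]);  // value head: manual tanh
    raw[kMoves] = -999999.9f;          // mask the value slot for the softmax

    // Numerically stable softmax over all 7 slots; the masked slot ends up
    // as exp(very negative) = 0, so it does not affect the policy.
    const float maxLogit = *std::max_element(raw.begin(), raw.end());
    float sum = 0.0f;
    for (float& x : raw) {
        x = std::exp(x - maxLogit);
        sum += x;
    }
    assert(sum > 0.0f);
    for (std::size_t i = 0; i < kMoves; ++i) p.policy[i] = raw[i] / sum;
    return p;
}
```

Running the softmax over all 7 slots keeps the loop the same width as the layer output; only the first 6 results are copied out as the policy.
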
In my last tests the training framework learned to play at a high level (top 6) in just 45 minutes of training, starting from scratch. The github code needed about 4 hours to reach the same level of play. Mastering a game in less than 1 hour of training, on a single Core i7 CPU with no GPU, is a great success.

Even so, this playground only briefly explains what an AlphaZero bot for Codingame multiplayers could look like. It's a real, working, competent bot, but don't take it as set in stone; take it just as a reference.
