Hey everyone! For the past few months I've been working on-and-off on an ML project that uses neural networks to analyze Dota heroes (by learning hero embeddings) and predict match results. If you prefer video, feel free to check out this YouTube video:
Dataset
I used OpenDota (
- Train Set: ~15,000 games
- Val Set: ~4,000 games
Model Details
This is the Dota subreddit, not the ML one, so I'll keep this relatively brief. For more details, feel free to check out the video linked at the top.
- Model Architecture: Embedding Layer -> Attention Layer -> MLP Output Layers
- Train Task: Given the 10 heroes in a game as input, predict:
    - Result of the game (6-way classification: radiant fast win (<35 min), radiant medium win (35-50 min), radiant long win (>50 min), dire fast win, dire medium win, dire long win)
    - Per hero:
        - Gold/XP per min
        - Ratio of team's last hits
        - Hero damage per min
        - Ratio of damage that is right click
        - Damage taken per min
        - Stun (caused) duration
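For anyone curious how these pieces could fit together, here is a minimal, untrained sketch of an Embedding -> Attention -> MLP model with the two prediction heads listed above, written in plain Python with no ML framework. It is not the code behind this post. Everything beyond the three named stages is an assumption for illustration: the layer sizes, the hypothetical hero-ID range, an added side (Radiant/Dire) embedding so attention can tell teammates from enemies, a single attention head with a residual connection, and mean-pooling each team before the result head.

```python
import math
import random

# All sizes are hypothetical; the post does not state them.
NUM_HEROES = 130   # assumed size of the hero-ID vocabulary
EMBED_DIM = 8      # assumed embedding width
HIDDEN_DIM = 16    # assumed MLP hidden width
NUM_CLASSES = 6    # Radiant/Dire win x fast/medium/long game
NUM_STATS = 7      # GPM, XPM, LH ratio, dmg/min, right-click ratio, dmg taken/min, stun

rng = random.Random(0)


def init(rows, cols):
    """Small random weight matrix (untrained; a real model would learn these)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(x, w):
    """Row vector x (length n) times matrix w (n x m) -> vector of length m."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def mlp(x, w1, w2):
    """Two-layer MLP with a ReLU hidden layer."""
    hidden = [max(0.0, z) for z in linear(x, w1)]
    return linear(hidden, w2)


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


params = {
    "hero_emb": init(NUM_HEROES, EMBED_DIM),
    "side_emb": init(2, EMBED_DIM),  # assumed: marks Radiant (0) vs Dire (1)
    "w_q": init(EMBED_DIM, EMBED_DIM),
    "w_k": init(EMBED_DIM, EMBED_DIM),
    "w_v": init(EMBED_DIM, EMBED_DIM),
    "stats_w1": init(EMBED_DIM, HIDDEN_DIM),
    "stats_w2": init(HIDDEN_DIM, NUM_STATS),
    "result_w1": init(2 * EMBED_DIM, HIDDEN_DIM),
    "result_w2": init(HIDDEN_DIM, NUM_CLASSES),
}


def forward(hero_ids, p=params):
    """hero_ids: 10 ints, first 5 Radiant, last 5 Dire."""
    assert len(hero_ids) == 10
    # 1. Embedding layer: hero vector plus a side vector.
    x = [[a + b for a, b in zip(p["hero_emb"][h], p["side_emb"][0 if i < 5 else 1])]
         for i, h in enumerate(hero_ids)]
    # 2. Single-head scaled dot-product self-attention across all 10 heroes.
    q = [linear(e, p["w_q"]) for e in x]
    k = [linear(e, p["w_k"]) for e in x]
    v = [linear(e, p["w_v"]) for e in x]
    scale = math.sqrt(EMBED_DIM)
    h = []
    for xi, qi in zip(x, q):
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        attended = [sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(EMBED_DIM)]
        h.append([a + b for a, b in zip(xi, attended)])  # residual connection
    # 3a. Per-hero MLP head: one vector of NUM_STATS values per hero.
    per_hero = [mlp(hi, p["stats_w1"], p["stats_w2"]) for hi in h]
    # 3b. Result MLP head: mean-pool each team, concatenate, softmax over 6 classes.
    radiant = [sum(col) / 5 for col in zip(*h[:5])]
    dire = [sum(col) / 5 for col in zip(*h[5:])]
    result_probs = softmax(mlp(radiant + dire, p["result_w1"], p["result_w2"]))
    return result_probs, per_hero


if __name__ == "__main__":
    probs, stats = forward(list(range(10)))  # hypothetical hero IDs 0-9
    print([round(pr, 3) for pr in probs])
```

Training such a model would mean fitting the weights, for example with cross-entropy on the result head plus a regression loss on the per-hero head; the ratio outputs would probably also need to be bounded (e.g. with a sigmoid), which this sketch skips.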
Model Per-Hero Predictions
Here are the model predictions vs. the actual values (shown as prediction / truth) for a test game: TI 2021 Grand Finals, Game 1. To avoid clutter, I only included the Radiant heroes (Team Spirit).
Hero | Last Hit Team Ratio | Hero Damage Per Min | Damage Taken Per Min | Right Click Damage Ratio | Gold Per Min | XP Per Min |
---|---|---|---|---|---|---|
Elder Titan | 0.14 / 0.02 | 590 / 236 | 594 / 132 | 0.34 / 0.30 | 342 / 270 | 460 / 328 |
Naga Siren | 0.28 / 0.49 | 801 / 653 | 808 / 297 | 0.42 / 0.76 | 470 / 228 | 600 / 679 |
Void Spirit | 0.25 / 0.24 | 791 / 625 | 798 / 508 | 0.34 / 0.18 | 470 / 495 | 610 / 667 |
Lion | 0.11 / 0.03 | 594 / 171 | 598 / 204 | 0.30 / 0.45 | 348 / 214 | 472 / 353 |
Tidehunter | 0.22 / 0.22 | 663 / 296 | 668 / 361 | 0.37 / 0.06 | 391 / 449 | 509 / 504 |
- The relative ordering of the predicted values largely makes sense given the positions the heroes play.
- The predictions tend to sit closer to the average than the truths, which makes sense: the model predicts per-hero stats without being told the result of the game.
- The damage predictions (both dealt and taken) seem high overall, which is a place for improvement.
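To put rough numbers on the first and third observations, the sketch below computes, for each tie-free column of the table, a Spearman rank correlation between predictions and truths (does the ordering match?) and the mean signed error (are the predictions systematically high?). Gold Per Min and Right Click Damage Ratio are left out because two predictions tie in each. The column choice and the no-ties shortcut formula are assumptions of this sketch, and with five heroes from one game it is a sanity check, not an evaluation.

```python
# (prediction, truth) pairs copied from the table above, in row order:
# Elder Titan, Naga Siren, Void Spirit, Lion, Tidehunter.
TABLE = {
    "Last Hit Team Ratio": [(0.14, 0.02), (0.28, 0.49), (0.25, 0.24), (0.11, 0.03), (0.22, 0.22)],
    "Hero Damage Per Min": [(590, 236), (801, 653), (791, 625), (594, 171), (663, 296)],
    "Damage Taken Per Min": [(594, 132), (808, 297), (798, 508), (598, 204), (668, 361)],
    "XP Per Min": [(460, 328), (600, 679), (610, 667), (472, 353), (509, 504)],
}


def ranks(values):
    """1-based ranks; assumes no ties (true for the columns used here)."""
    if len(set(values)) != len(values):
        raise ValueError("ties need average ranks; not handled in this sketch")
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman(xs, ys):
    """Spearman rank correlation via 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def mean_bias(pairs):
    """Average of (prediction - truth); positive means over-prediction."""
    return sum(p - t for p, t in pairs) / len(pairs)


for column, pairs in TABLE.items():
    preds, truths = zip(*pairs)
    print(f"{column}: spearman={spearman(preds, truths):.2f}, bias={mean_bias(pairs):.2f}")
```

On these columns the rank correlations come out between 0.7 and 0.9, while the mean bias is about +292 for hero damage per min and +393 for damage taken per min, which lines up with both observations.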
Model Result Predictions
Here are the model's result predictions for Games 3, 4, and 5 of the TI 2021 Grand Finals, in shuffled order. I'll leave the answer out for now: if you're interested, guess in the comments which of A, B, and C corresponds to each game, and explain why. I'll edit the post or comment with the answer later.
Game | Radiant Heroes | Dire Heroes |
---|---|---|
Game 3 | Disruptor, PA, Invoker, Dark Willow, Magnus | Spectre, Tinker, Bloodseeker, Rubick, Undying |
Game 4 | Luna, Kunkka, Magnus, Undying, Bane | Winter Wyvern, Spectre, TA, Lion, Axe |
Game 5 | Winter Wyvern, TB, Ember, Bane, Magnus | Tiny, Kunkka, Lycan, Skywrath, Ench |
Prediction | Radiant Win <35 min | Radiant Win 35-50 min | Radiant Win >50 min | Dire Win <35 min | Dire Win 35-50 min | Dire Win >50 min |
---|---|---|---|---|---|---|
Probs A | 0.24 | 0.24 | 0.08 | 0.18 | 0.17 | 0.10 |
Probs B | 0.15 | 0.31 | 0.12 | 0.18 | 0.14 | 0.10 |
Probs C | 0.22 | 0.24 | 0.08 | 0.22 | 0.19 | 0.06 |
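Since the six classes are just (winner x game length) combinations, each row can be collapsed into a win probability and a game-length distribution, which may make the guessing easier. A minimal sketch: the class order follows the table's columns, and each row is renormalized because the rounded probabilities sum to 1.00 or 1.01.

```python
# Class order follows the columns of the table above.
CLASSES = [
    ("Radiant", "<35"), ("Radiant", "35-50"), ("Radiant", ">50"),
    ("Dire", "<35"), ("Dire", "35-50"), ("Dire", ">50"),
]

PROBS = {
    "A": [0.24, 0.24, 0.08, 0.18, 0.17, 0.10],
    "B": [0.15, 0.31, 0.12, 0.18, 0.14, 0.10],
    "C": [0.22, 0.24, 0.08, 0.22, 0.19, 0.06],
}


def marginals(probs):
    """Collapse the 6-way distribution into P(winner) and P(game-length bucket)."""
    total = sum(probs)  # renormalize: rounded rows sum to 1.00 or 1.01
    by_winner, by_length = {}, {}
    for (winner, length), p in zip(CLASSES, probs):
        by_winner[winner] = by_winner.get(winner, 0.0) + p / total
        by_length[length] = by_length.get(length, 0.0) + p / total
    return by_winner, by_length


for name, probs in PROBS.items():
    winner, length = marginals(probs)
    print(name,
          {k: round(v, 2) for k, v in winner.items()},
          {k: round(v, 2) for k, v in length.items()})
```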
- This might be confirmation bias, but there seem to be explainable factors behind which prediction matches which game.
- If you're interested in hearing discussion on this from 2k-7k players, there's a section in the video linked at the top – I guess the answers are also there if you can't wait to find out lol
Learned Embeddings
Here is a 2D visualization of the embeddings the model learned. If you're unfamiliar with embeddings, they are learned vector representations of the heroes, with the goal that similar heroes end up with embeddings that are close to each other. Some things I immediately noticed:
- Medusa and Spectre in the bottom left – quintessential late-game carries
- Sven and Kunkka towards the top right – melee cores with cleave and burst
- A lot of supports grouped towards the bottom and bottom right
If you see something interesting or weird, feel free to leave a comment!
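A common way to measure "close to each other" is cosine similarity in the full embedding space. Checking nearest neighbors there is a useful complement to a 2D plot, since any projection down to two dimensions distorts some distances. The sketch below is a minimal illustration only: the 3-d vectors and the placeholder hero names are made up and are not the model's learned embeddings.

```python
import math

# Made-up 3-d vectors for illustration only; NOT the model's learned embeddings,
# and the names are placeholders rather than real heroes.
EMBEDDINGS = {
    "carry_a": [0.9, 0.1, -0.2],
    "carry_b": [0.8, 0.2, -0.1],
    "support_a": [-0.7, 0.6, 0.1],
    "support_b": [-0.6, 0.7, 0.2],
}


def cosine(u, v):
    """Cosine similarity: 1 means same direction, -1 means opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest(hero, embeddings=EMBEDDINGS, k=1):
    """Return the k other heroes whose embeddings are most cosine-similar to `hero`."""
    query = embeddings[hero]
    others = [(name, cosine(query, vec)) for name, vec in embeddings.items() if name != hero]
    others.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in others[:k]]


print(nearest("carry_a"))
```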
Summary
This project was largely for fun. I definitely think the model learned things about Dota and can spark some interesting discussion, but I wouldn't try to productionize or sell it in its current state. I do think the potential is there, though: with more and richer data, more tinkering with the model architecture, and more effort spent on tuning and evaluation, it would be possible to build a very high-quality model.
Source: https://www.reddit.com/r/DotA2/comments/t6n52z/using_machine_learning_to_analyze_dota_heroes_and/