Using Machine Learning to Analyze Dota Heroes and Predict Matches

Hey everyone! I've been working on an ML project on-and-off for the past few months, where I use ML (neural networks, specifically) to analyze Dota heroes (learn hero embeddings) and predict the result of matches. If you prefer to consume the following content in video form, feel free to check out this YouTube video: https://www.youtube.com/watch?v=OI1rYJPQ_-U

Dataset

I used OpenDota (https://www.opendota.com/) to collect data. I collected data on ranked games, with average mmr > 4k, on patch 7.30.

Train Set: ~15,000 games
Val Set: ~4,000 games

Model Details

This is the Dota subreddit, not the ML one, so I'll keep this relatively brief. For more details, feel free to check out the video linked at the top.

Model Architecture: Embedding Layer -> Attention Layer -> MLP Output Layers
Train Task: Given 10 heroes as input, predict:
- Result of game (classification – radiant long win, radiant medium win, radiant fast win, dire long win, dire medium win, dire fast win)
- Per Hero:
  - Gold/XP per min
  - Ratio of team's last hits
  - Hero Damage per min
  - Ratio of damage that is right click
  - Damage Taken per min
  - Stun (caused) duration
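
To make the architecture above a bit more concrete, here is a minimal pure-Python sketch of the Embedding -> Attention -> output flow. It is not the actual model: the sizes (NUM_HEROES, EMBED_DIM), the team embedding, the single attention head, and the single linear heads standing in for the MLP output layers are all assumptions for illustration, and the weights are random rather than trained.

```python
import math
import random

# All sizes below are assumptions for illustration, not the real model's.
NUM_HEROES = 130   # hypothetical upper bound on hero ids
EMBED_DIM = 8      # hypothetical embedding size
NUM_CLASSES = 6    # radiant/dire x fast/medium/long win
NUM_STATS = 7      # gold, xp, last-hit ratio, hero dmg, right-click ratio, dmg taken, stun

rng = random.Random(0)

def rand_matrix(rows, cols):
    """Random (untrained) weights, just so the sketch runs."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(vec, mat):
    """Multiply a row vector by a matrix with len(vec) rows."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

hero_emb = rand_matrix(NUM_HEROES, EMBED_DIM)
team_emb = rand_matrix(2, EMBED_DIM)  # assumption: tells the model which side a hero is on
W_q, W_k, W_v = (rand_matrix(EMBED_DIM, EMBED_DIM) for _ in range(3))
W_result = rand_matrix(EMBED_DIM, NUM_CLASSES)
W_stats = rand_matrix(EMBED_DIM, NUM_STATS)

def forward(radiant_ids, dire_ids):
    # Embedding layer: one vector per hero, plus a vector for its team.
    tokens = []
    for team, ids in enumerate((radiant_ids, dire_ids)):
        for hero_id in ids:
            tokens.append([a + b for a, b in zip(hero_emb[hero_id], team_emb[team])])

    # Attention layer: single-head scaled dot-product self-attention over the 10 heroes.
    qs = [matvec(t, W_q) for t in tokens]
    ks = [matvec(t, W_k) for t in tokens]
    vs = [matvec(t, W_v) for t in tokens]
    attended = []
    for q in qs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(EMBED_DIM) for k in ks]
        weights = softmax(scores)
        attended.append([sum(w * v[d] for w, v in zip(weights, vs)) for d in range(EMBED_DIM)])

    # Output heads (single linear layers here, in place of the MLP):
    # the game result uses the mean over heroes, the stats use each hero's own vector.
    pooled = [sum(col) / len(attended) for col in zip(*attended)]
    result_probs = softmax(matvec(pooled, W_result))
    per_hero_stats = [matvec(a, W_stats) for a in attended]
    return result_probs, per_hero_stats

probs, stats = forward([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(len(probs), len(stats), len(stats[0]))
```

The point of the attention step is that each hero's output can depend on the other nine heroes in the draft, which is what lets per-hero predictions shift with the lineup.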

Model Per-Hero Predictions

Here are the (model predictions / truths) for a test game – TI 2021 Grand Finals Game 1. I only included the radiant hero outputs (Team Spirit) to avoid clutter.

Hero	Last Hit Team Ratio	Hero Damage Per Min	Damage Taken Per Min	Right Click Damage Ratio	Gold Per Min	XP Per Min
Elder Titan	0.14 / 0.02	590 / 236	594 / 132	0.34 / 0.30	342 / 270	460 / 328
Naga Siren	0.28 / 0.49	801 / 653	808 / 297	0.42 / 0.76	470 / 228	600 / 679
Void Spirit	0.25 / 0.24	791 / 625	798 / 508	0.34 / 0.18	470 / 495	610 / 667
Lion	0.11 / 0.03	594 / 171	598 / 204	0.30 / 0.45	348 / 214	472 / 353
Tidehunter	0.22 / 0.22	663 / 296	668 / 361	0.37 / 0.06	391 / 449	509 / 504

The ordering of the magnitudes of the predictions largely makes sense, given the positions heroes play.
The predictions tend toward the average, which makes sense because the model is not conditioned on the result of the game when making them.
The damage values seem high overall – a place for improvement.
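
One quick way to put a number on the "damage seems high" observation is the mean signed error (prediction minus truth) per column. The sketch below uses only the five rows from the table above, so it describes this one game and nothing more; the function name is just for illustration.

```python
# (prediction, truth) pairs copied from the table above, in row order:
# Elder Titan, Naga Siren, Void Spirit, Lion, Tidehunter
hero_damage = [(590, 236), (801, 653), (791, 625), (594, 171), (663, 296)]
damage_taken = [(594, 132), (808, 297), (798, 508), (598, 204), (668, 361)]
gold_per_min = [(342, 270), (470, 228), (470, 495), (348, 214), (391, 449)]

def mean_signed_error(pairs):
    """Average of (prediction - truth); positive means the model overshoots."""
    return sum(pred - truth for pred, truth in pairs) / len(pairs)

for name, pairs in [
    ("Hero Damage Per Min", hero_damage),
    ("Damage Taken Per Min", damage_taken),
    ("Gold Per Min", gold_per_min),
]:
    print(f"{name}: {mean_signed_error(pairs):+.1f}")
```

For this game that works out to roughly +292 for hero damage and +393 for damage taken, against about +73 for gold per min, so the overshoot is much larger on the damage columns.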

Model Result Predictions

Here are the model result predictions for Games 3, 4, and 5 of the TI Grand Finals, in a mixed-up order. I'll actually leave the answer out here for now, and if you're interested, you can guess in the comments which of A, B, and C corresponds to each game and explain why. I'll edit or comment with the answer later.

	Radiant Heroes	Dire Heroes
Game 3	Disruptor, PA, Invoker, Dark Willow, Magnus	Spectre, Tinker, Bloodseeker, Rubick, Undying
Game 4	Luna, Kunkka, Magnus, Undying, Bane	Winter Wyvern, Spectre, TA, Lion, Axe
Game 5	Winter Wyvern, TB, Ember, Bane, Magnus	Tiny, Kunkka, Lycan, Skywrath, Ench

	Radiant Win <35 min	Radiant Win 35-50 min	Radiant Win >50 min	Dire Win <35 min	Dire Win 35-50 min	Dire Win >50 min
Probs A	0.24	0.24	0.08	0.18	0.17	0.10
Probs B	0.15	0.31	0.12	0.18	0.14	0.10
Probs C	0.22	0.24	0.08	0.22	0.19	0.06

Might be confirmation bias, but there seem to be explainable factors for the matching.
If you're interested in hearing discussion on this from 2k-7k players, there's a section in the video linked at the top – I guess the answers are also there if you can't wait to find out lol
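
If you want to play with the numbers yourself, here is a small sketch of how a game maps to one of the six classes and how a row of the probability table collapses to a single radiant win chance. The 35 and 50 minute cutoffs come from the table header; how a game of exactly 35 or 50 minutes is bucketed is my assumption, and the function names are just for illustration.

```python
CLASSES = [
    "Radiant Win <35 min", "Radiant Win 35-50 min", "Radiant Win >50 min",
    "Dire Win <35 min", "Dire Win 35-50 min", "Dire Win >50 min",
]

def result_class(radiant_win, duration_min):
    """Index into CLASSES. Boundary handling (exactly 35 or 50) is an assumption."""
    if duration_min < 35:
        bucket = 0
    elif duration_min <= 50:
        bucket = 1
    else:
        bucket = 2
    return bucket if radiant_win else bucket + 3

# Rows copied from the table above.
probs = {
    "A": [0.24, 0.24, 0.08, 0.18, 0.17, 0.10],
    "B": [0.15, 0.31, 0.12, 0.18, 0.14, 0.10],
    "C": [0.22, 0.24, 0.08, 0.22, 0.19, 0.06],
}

def radiant_win_prob(p):
    """Sum the three radiant classes, renormalised because the table is rounded."""
    return sum(p[:3]) / sum(p)

for name, p in probs.items():
    most_likely = CLASSES[max(range(len(p)), key=lambda i: p[i])]
    print(f"Probs {name}: radiant win ~{radiant_win_prob(p):.2f}, most likely: {most_likely}")
```

All three rows lean slightly radiant once collapsed, so the more useful signal for the guessing game is probably how the mass is spread across the game-length buckets.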

Learned Embeddings

Hero Value Embeddings

Here is a 2D visualization of the embeddings the model learned. If you are unfamiliar with embeddings, they are learned representations of the heroes, with the goal that similar heroes end up with embeddings that are close to each other. Some things that I immediately noticed:

Medusa and Spectre on bottom left – quintessential late game carries
Sven and Kunkka towards top right – melee cores with cleave and burst
A lot of supports grouped towards bottom and bottom right
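
For anyone wondering what "close to each other" means in practice, here is a minimal sketch of a nearest-neighbour lookup over embeddings. The 2D vectors are made-up toy values that only mimic the layout described above (they are not the learned embeddings, and placing Lion among the supports is my own example), and cosine similarity is just one reasonable choice of closeness.

```python
import math

# Hypothetical toy coordinates, NOT the model's learned embeddings.
toy_embeddings = {
    "Medusa": [-0.9, -0.8],   # bottom left
    "Spectre": [-0.8, -0.9],  # bottom left
    "Sven": [0.7, 0.8],       # top right
    "Kunkka": [0.8, 0.7],     # top right
    "Lion": [0.6, -0.7],      # bottom right
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, -1.0 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(hero, embeddings):
    """Return the other hero whose embedding is most similar to `hero`'s."""
    others = (h for h in embeddings if h != hero)
    return max(others, key=lambda h: cosine(embeddings[hero], embeddings[h]))

print(nearest("Medusa", toy_embeddings))
print(nearest("Sven", toy_embeddings))
```

With real embeddings you would run the same lookup on the full-dimensional vectors rather than the 2D projection, since the projection can distort distances.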

If you see something interesting or weird, feel free to leave a comment!

Summary

This project was largely for fun – I definitely think the model was able to learn things about Dota and can spark some interesting discussion, but I wouldn't try to productionize/sell this model in its current state. I do think the potential is there though – with more and richer data, tinkering with model architecture, and more effort spent tuning and evaluating, it would be possible to create a very high quality model.

Source: https://www.reddit.com/r/DotA2/comments/t6n52z/using_machine_learning_to_analyze_dota_heroes_and/