Using Machine Learning to Analyze Dota Heroes and Predict Matches

Hey everyone! I've been working on an ML project on and off for the past few months, using neural networks to analyze Dota heroes (by learning hero embeddings) and predict the result of matches. If you'd rather watch than read, feel free to check out this YouTube video: https://www.youtube.com/watch?v=OI1rYJPQ_-U

Dataset

I used OpenDota (https://www.opendota.com/) to collect data. I collected data on ranked games, with average mmr > 4k, on patch 7.30.

  • Train Set: ~15,000 games
  • Val Set: ~4,000 games
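
The filtering described above could be sketched roughly as follows. This is a minimal sketch, not the actual collection script: the record fields (`is_ranked`, `avg_mmr`, `patch`) and both helper names are hypothetical stand-ins for whatever the OpenDota data actually provides, and the shuffled split is an assumption, since the post does not say how the train and val sets were divided.

```python
import random

def keep_match(match, min_mmr=4000, patch="7.30"):
    """Return True for ranked games above the MMR cutoff on the target patch."""
    # Field names are hypothetical; adapt them to the real response schema.
    return (
        match["is_ranked"]
        and match["avg_mmr"] > min_mmr
        and match["patch"] == patch
    )

def train_val_split(matches, val_fraction=0.2, seed=0):
    """Shuffle a copy of the matches and hold out a fraction for validation."""
    shuffled = list(matches)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

# Tiny made-up records, only to show the shape of the data.
raw = [
    {"match_id": 1, "is_ranked": True, "avg_mmr": 4500, "patch": "7.30"},
    {"match_id": 2, "is_ranked": True, "avg_mmr": 3200, "patch": "7.30"},
    {"match_id": 3, "is_ranked": False, "avg_mmr": 5100, "patch": "7.30"},
    {"match_id": 4, "is_ranked": True, "avg_mmr": 4800, "patch": "7.29"},
    {"match_id": 5, "is_ranked": True, "avg_mmr": 6000, "patch": "7.30"},
]
kept = [m for m in raw if keep_match(m)]
train, val = train_val_split(kept, val_fraction=0.5)
```

With roughly 19,000 games in total, a `val_fraction` of about 0.2 would land near the train and val sizes listed above.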

Model Details

This is the Dota subreddit, not the ML one, so I'll keep this relatively brief. For more details, feel free to check out the video linked at the top.

  • Model Architecture: Embedding Layer -> Attention Layer -> MLP Output Layers
  • Train Task: Given 10 heroes as input, predict:
    • Result of game (classification – radiant long win, radiant medium win, radiant fast win, dire long win, dire medium win, dire fast win)
    • Per Hero:
      • Gold/XP per min
      • Ratio of team's last hits
      • Hero Damage per min
      • Ratio of damage that is right click
      • Damage Taken per min
      • Stun (caused) duration
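
To make the Embedding -> Attention -> MLP pipeline a bit more concrete, here is a rough forward-pass sketch in plain Python. Everything specific in it is an assumption and not taken from the real model: the layer sizes, the single attention head, the mean pooling, the added side (radiant/dire) embedding, and the random weights that stand in for trained parameters. It only covers the game-result head; the per-hero outputs are left out.

```python
import math
import random

NUM_HERO_IDS = 150  # assumption: any upper bound on hero ids works here
EMBED_DIM = 8       # assumption
HIDDEN_DIM = 16     # assumption
NUM_CLASSES = 6     # the six win/duration classes from the list above

rng = random.Random(0)

def rand_matrix(rows, cols):
    """Small random weights standing in for trained parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

hero_embedding = rand_matrix(NUM_HERO_IDS, EMBED_DIM)
side_embedding = rand_matrix(2, EMBED_DIM)  # assumption: tells radiant from dire
w_q, w_k, w_v = (rand_matrix(EMBED_DIM, EMBED_DIM) for _ in range(3))
w_hidden = rand_matrix(HIDDEN_DIM, EMBED_DIM)
w_out = rand_matrix(NUM_CLASSES, HIDDEN_DIM)

def predict_result(hero_ids):
    """hero_ids: 10 ints, radiant heroes first, then dire."""
    # Embedding layer: hero vector plus a vector for the hero's side.
    xs = []
    for i, hero in enumerate(hero_ids):
        side = side_embedding[0 if i < 5 else 1]
        xs.append([h + s for h, s in zip(hero_embedding[hero], side)])
    # Attention layer: single-head scaled dot-product self-attention.
    qs = [matvec(w_q, x) for x in xs]
    ks = [matvec(w_k, x) for x in xs]
    vs = [matvec(w_v, x) for x in xs]
    scale = math.sqrt(EMBED_DIM)
    attended = []
    for q in qs:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in ks])
        attended.append(
            [sum(w * v[d] for w, v in zip(weights, vs)) for d in range(EMBED_DIM)]
        )
    # Mean-pool the 10 hero vectors, then a small MLP to class probabilities.
    pooled = [sum(vec[d] for vec in attended) / len(attended) for d in range(EMBED_DIM)]
    hidden = [max(0.0, h) for h in matvec(w_hidden, pooled)]  # ReLU
    return softmax(matvec(w_out, hidden))

probs = predict_result([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
```

The attention step is what lets each hero's representation depend on the other nine heroes in the draft, which is presumably why it sits between the embeddings and the output layers.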

Model Per-Hero Predictions

Here are the (model predictions / truths) for a test game – TI 2021 Grand Finals Game 1. I only included the radiant hero outputs (Team Spirit) to avoid clutter.

| Hero | Last Hit Team Ratio | Hero Damage Per Min | Damage Taken Per Min | Right Click Damage Ratio | Gold Per Min | XP Per Min |
| --- | --- | --- | --- | --- | --- | --- |
| Elder Titan | 0.14 / 0.02 | 590 / 236 | 594 / 132 | 0.34 / 0.30 | 342 / 270 | 460 / 328 |
| Naga Siren | 0.28 / 0.49 | 801 / 653 | 808 / 297 | 0.42 / 0.76 | 470 / 228 | 600 / 679 |
| Void Spirit | 0.25 / 0.24 | 791 / 625 | 798 / 508 | 0.34 / 0.18 | 470 / 495 | 610 / 667 |
| Lion | 0.11 / 0.03 | 594 / 171 | 598 / 204 | 0.30 / 0.45 | 348 / 214 | 472 / 353 |
| Tidehunter | 0.22 / 0.22 | 663 / 296 | 668 / 361 | 0.37 / 0.06 | 391 / 449 | 509 / 504 |

  • The ordering of the predicted magnitudes largely makes sense, given the positions the heroes play.
  • The predictions are often closer to average than the true values, which makes sense given that the predictions are not conditioned on the result of the game.
  • The damage values seem high overall – a place for improvement.
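
The last two observations can be checked directly against the numbers in the table above. The sketch below is only a quick sanity check on this one game, not an evaluation of the model; the helper names `spread` and `mean_bias` are made up for illustration.

```python
from statistics import mean, pstdev

# (prediction, truth) pairs copied from the table above, radiant heroes in order:
# Elder Titan, Naga Siren, Void Spirit, Lion, Tidehunter.
gold_per_min = [(342, 270), (470, 228), (470, 495), (348, 214), (391, 449)]
hero_damage_per_min = [(590, 236), (801, 653), (791, 625), (594, 171), (663, 296)]

def spread(pairs):
    """Standard deviation of the predictions and of the truths."""
    preds = [p for p, _ in pairs]
    truths = [t for _, t in pairs]
    return pstdev(preds), pstdev(truths)

def mean_bias(pairs):
    """Average of (prediction - truth); positive means over-prediction."""
    return mean(p - t for p, t in pairs)

gpm_pred_sd, gpm_true_sd = spread(gold_per_min)
damage_bias = mean_bias(hero_damage_per_min)
```

A smaller spread in the predictions than in the truths is what "closer to average" looks like numerically, and a positive `damage_bias` matches the note that the damage values run high.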

Model Result Predictions

Here are the model's result predictions for Games 3, 4, and 5 of the TI Grand Finals, in shuffled order. I'll leave the answer out for now; if you're interested, make a guess in the comments about which of A, B, and C corresponds to each game, and explain why. I'll edit or comment with the answer later.

| Game | Radiant Heroes | Dire Heroes |
| --- | --- | --- |
| Game 3 | Disruptor, PA, Invoker, Dark Willow, Magnus | Spectre, Tinker, Bloodseeker, Rubick, Undying |
| Game 4 | Luna, Kunkka, Magnus, Undying, Bane | Winter Wyvern, Spectre, TA, Lion, Axe |
| Game 5 | Winter Wyvern, TB, Ember, Bane, Magnus | Tiny, Kunkka, Lycan, Skywrath, Ench |

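| | Radiant Win <35 min | Radiant Win 35-50 min | Radiant Win >50 min | Dire Win <35 min | Dire Win 35-50 min | Dire Win >50 min |
| --- | --- | --- | --- | --- | --- | --- |
| Probs A | 0.24 | 0.24 | 0.08 | 0.18 | 0.17 | 0.10 |
| Probs B | 0.15 | 0.31 | 0.12 | 0.18 | 0.14 | 0.10 |
| Probs C | 0.22 | 0.24 | 0.08 | 0.22 | 0.19 | 0.06 |
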
  • Might be confirmation bias, but there seem to be explainable factors behind the matching.
  • If you're interested in hearing discussion on this from 2k-7k players, there's a section in the video linked at the top – I guess the answers are also there if you can't wait to find out lol
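
If it helps with guessing, the six class probabilities in each row above can be collapsed into a single radiant win chance by summing the three radiant columns. The sketch below also shows one way a finished game could be mapped to a class index; how games of exactly 35 or 50 minutes are binned is an assumption, and both function names are made up for illustration.

```python
# Column order follows the probability table above.
CLASSES = [
    "Radiant Win <35 min", "Radiant Win 35-50 min", "Radiant Win >50 min",
    "Dire Win <35 min", "Dire Win 35-50 min", "Dire Win >50 min",
]

def outcome_class(radiant_win, duration_min):
    """Map a finished game to an index into CLASSES."""
    if duration_min < 35:
        bucket = 0
    elif duration_min <= 50:  # assumption: boundary games count as 35-50
        bucket = 1
    else:
        bucket = 2
    return bucket if radiant_win else 3 + bucket

def radiant_win_probability(probs):
    """Sum the three radiant classes into one win probability."""
    return sum(probs[:3])

# Rows copied from the table; they are rounded, so a row may sum to ~1.01.
probs_a = [0.24, 0.24, 0.08, 0.18, 0.17, 0.10]
probs_b = [0.15, 0.31, 0.12, 0.18, 0.14, 0.10]
probs_c = [0.22, 0.24, 0.08, 0.22, 0.19, 0.06]

radiant_chances = {
    name: round(radiant_win_probability(row), 2)
    for name, row in [("A", probs_a), ("B", probs_b), ("C", probs_c)]
}
```

The same idea works for duration: adding the two "<35 min" columns gives the chance of a short game regardless of who wins.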

Learned Embeddings

Hero Value Embeddings

Here is a 2D visualization of the embeddings the model learned. If you are unfamiliar with embeddings, they are learned representations of the heroes, with the goal that similar heroes end up with embeddings that are close to each other. Some things that I immediately noticed:

  • Medusa and Spectre on bottom left – quintessential late game carries
  • Sven and Kunkka towards top right – melee cores with cleave and burst
  • A lot of supports grouped towards bottom and bottom right
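
"Close to each other" can be made concrete with a nearest-neighbor lookup over the embedding vectors. In the sketch below the 2D coordinates are made-up placeholders, chosen only to loosely mirror the layout described in the bullets above; they are not the model's learned embeddings, and `nearest_hero` is a hypothetical helper.

```python
import math

# Placeholder 2D points (NOT real model output), loosely following the
# description above: bottom left, top right, bottom right.
toy_embeddings = {
    "Medusa": (-0.9, -0.8),
    "Spectre": (-0.8, -0.9),
    "Sven": (0.8, 0.7),
    "Kunkka": (0.7, 0.8),
    "Lion": (0.6, -0.8),
}

def nearest_hero(name, embeddings):
    """Return the other hero whose embedding has the smallest Euclidean distance."""
    target = embeddings[name]
    others = (hero for hero in embeddings if hero != name)
    return min(others, key=lambda hero: math.dist(target, embeddings[hero]))

closest_to_medusa = nearest_hero("Medusa", toy_embeddings)
closest_to_sven = nearest_hero("Sven", toy_embeddings)
```

With real embeddings the vectors would have more than two dimensions, and the lookup would run on those rather than on the 2D projection used for plotting.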

If you see something interesting or weird, feel free to leave a comment!

Summary

This project was largely for fun – I definitely think the model was able to learn things about Dota and can spark some interesting discussion, but I wouldn't try to productionize/sell this model in its current state. I do think the potential is there though – with more and richer data, tinkering with the model architecture, and more effort spent tuning and evaluating, it would be possible to create a very high quality model.

Source: https://www.reddit.com/r/DotA2/comments/t6n52z/using_machine_learning_to_analyze_dota_heroes_and/
