Subreddit Data Analysis
Summary of Project
Captured all ~169k posts and 2.7 million comments from the /r/worldoftanks subreddit, from April 2011 through April 2020.
The subreddit data analysis will be broken into three separate postings which will cover:
This post covers the posts submitted to the /r/worldoftanks subreddit.
However, before I get into those details, I will first share how the data was obtained and processed.
I owe a huge debt of thanks to Jason Baumgartner (/u/Stuck_In_the_Matrix), who runs the
Essentially, he makes a copy of all Reddit posts and comments by querying Reddit's API, and makes that data available for download. He is the proverbial genius whose shoulders I have stood upon, and his colossal efforts make my modest data project possible. As a result, I support him on
I understand that many Redditors have privacy concerns about their data being stored/viewable on the internet, and some have taken active steps to remove their data (comments/posts) after a period of time. The /r/pushshift subreddit does have a
Note: Even if you delete your comments and posts periodically here on Reddit via a script/bot, your data will still be captured by pushshift because it captures the data almost as soon as it is created.
You may also DM me, and I'll delete your data from my data set too.
The Reddit data is stored here:
New months are added on no particular schedule.
The files are compressed in the bz2, xz, or zst format, and I used
Once decompressed, I needed to separate out the /r/worldoftanks data from the non-worldoftanks data. This is no small task when an 8.9 GB compressed file becomes a 97.6 GB uncompressed text file. Fortunately, I found a pretty amazing text editor named
Note: You'll need Excel 2007 or later to get past the 65,536-row limit of earlier versions.
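The splitting step can also be done by streaming the compressed file, so the uncompressed text never has to be opened in an editor or written to disk. The sketch below is an alternative to the text-editor workflow above, not the method I used. It assumes the dump is newline-delimited JSON (one record per line) with a `subreddit` field on each record, and `iter_subreddit_records` is a hypothetical name. It only handles `.bz2` and `.xz` with Python's standard library; `.zst` files would need the third-party `zstandard` package.

```python
import bz2
import json
import lzma
from typing import Iterator

# Standard-library openers for the two formats this sketch supports.
# (.zst would need the third-party "zstandard" package; not handled here.)
OPENERS = {".bz2": bz2.open, ".xz": lzma.open}


def iter_subreddit_records(path: str, subreddit: str = "worldoftanks") -> Iterator[dict]:
    """Yield records from a compressed NDJSON dump whose 'subreddit' matches.

    Assumes one JSON object per line with a 'subreddit' key. The file is
    decompressed as a stream, so the full uncompressed text is never held
    in memory or written to disk.
    """
    opener = next((fn for ext, fn in OPENERS.items() if path.endswith(ext)), None)
    if opener is None:
        raise ValueError(f"unsupported compression: {path}")
    target = subreddit.lower()
    with opener(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # Compare case-insensitively ("WorldofTanks" vs "worldoftanks").
            if str(record.get("subreddit", "")).lower() == target:
                yield record
```

Each yielded record can then be written out as one JSON line to a much smaller worldoftanks-only file before any spreadsheet work.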
Results – Posts
Distribution of Posts
In terms of frequency, here is the number of posts per year in the /r/worldoftanks subreddit:
*Note: The 2020 number only contains records through April 2020.
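Counts like these come from bucketing each post's creation time. A minimal sketch, assuming each record carries Reddit's `created_utc` field (Unix seconds); `posts_per_period` is a hypothetical helper. Passing `monthly=True` buckets by (year, month) instead, which is the granularity needed for a posts-over-time graph. Buckets are in UTC, so a post made late on the last evening of a month in US time can land in the following month.

```python
from collections import Counter
from datetime import datetime, timezone


def posts_per_period(posts, monthly=False):
    """Count posts per year, or per (year, month) when monthly=True.

    Assumes each post has a 'created_utc' value in Unix seconds. It is
    converted via float() so int, float, or numeric-string values all work.
    """
    counts = Counter()
    for p in posts:
        ts = datetime.fromtimestamp(int(float(p["created_utc"])), tz=timezone.utc)
        key = (ts.year, ts.month) if monthly else ts.year
        counts[key] += 1
    return dict(sorted(counts.items()))
```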
Number of Posts Over Time
Here is a graph showing the number of /r/worldoftanks posts over time:
Number of Posts Over Time – Analysis
/r/worldoftanks posting activity seems to be correlated with the academic schedules followed in the US and Europe.
For example, in 2013, 2014, 2015, and 2018, you can see a general increase in the late-spring/summer months (when US students are on summer break), a decrease from September through November (the US fall semester, which runs from mid-August through December), and then a peak in December, when US students are on Christmas break. 2016 and 2017 don't show this trend. 2019 has a spike in May and June, but the general trend is an increase from April to August.
Interestingly, during the US spring semester (mid-January through the end of April), posting activity increases from January through April in most years.
Academic schedules in EU countries are much more varied than in the US, but my research indicates that many EU winter school breaks run from mid-December to early January, and EU spring breaks run from late March to mid-April.
The observed data supports an EU winter break in December, and the EU spring break may help explain the increases frequently seen in March and the frequent peaks in April.
Types of Posts
There are two types of posts: text (self) posts, where you write the content in a text box, and link posts, where you provide a link to another website, image, video, etc.
Of the 169,737 posts in the /r/worldoftanks subreddit, 92,803 were self posts (54.7%), and 76,934 were link posts (45.3%). This designation is maintained even for deleted posts.
Reddit records the domain for each link post. Of the 76,934 link posts, the 25 most-linked-to domains accounted for 69,983 posts, or 91.0% of all link posts.
|Ranking||Domain||Posts||Cumulative # of Posts||% of Total Linked Posts||Cumulative %|
|5||v.redd.it (video saved to reddit)||2269||60,631||2.9%||78.8%|
|12||ritastatusreport.live (and earlier versions)||424||67,216||0.6%||87.4%|
|Total / % of Total||76,934||100.0%|
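The ranking and both cumulative columns can be produced in one pass over the link posts. A sketch, assuming each post record has Reddit's `is_self` flag and `domain` field; `top_domains` is a hypothetical name, and the percentages are shares of link posts, not of all posts.

```python
from collections import Counter


def top_domains(posts, n=25):
    """Rank link-post domains and add per-domain and cumulative shares.

    Assumes each post has Reddit's 'is_self' flag and a 'domain' field.
    Returns (rank, domain, posts, cumulative posts, % of link posts,
    cumulative %) tuples for the n most common domains.
    """
    # Self (text) posts are excluded; only link posts are counted by domain.
    domains = Counter(p["domain"] for p in posts if not p["is_self"])
    total = sum(domains.values())
    rows, running = [], 0
    for rank, (domain, count) in enumerate(domains.most_common(n), start=1):
        running += count
        rows.append((rank, domain, count, running,
                     round(100 * count / total, 1),
                     round(100 * running / total, 1)))
    return rows
```

Counting `is_self` the same way gives the self/link split (92,803 vs. 76,934 posts).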
Average Post Karma by Domain
For each domain, I summed the karma of all posts linking to that domain, then divided by the number of those posts to get the average karma per post for that domain.
|Ranking||Domain||# of Posts by Domain||Total Karma||Average Karma|
|2||v.redd.it (video saved to reddit)||2269||251,118||110.7|
|9||ritastatusreport.live (and earlier versions)||424||8,855||20.9|
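The per-domain average is a grouped sum divided by a grouped count. A minimal sketch, assuming each post has `domain` and `score` fields, where `score` is the (fuzzed) karma captured in the dump; `average_karma_by_domain` is a hypothetical name.

```python
from collections import defaultdict


def average_karma_by_domain(posts):
    """Return (domain, post count, total karma, average karma) rows,
    highest average first.

    Assumes each post has 'domain' and 'score' fields.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for p in posts:
        totals[p["domain"]] += p["score"]
        counts[p["domain"]] += 1
    rows = [(d, counts[d], totals[d], round(totals[d] / counts[d], 1)) for d in totals]
    return sorted(rows, key=lambda r: r[3], reverse=True)
```

For example, v.redd.it's 251,118 total karma over 2,269 posts gives 251,118 / 2,269 ≈ 110.7.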
Of the 169,737 posts in the subreddit, 12,620 were edited after being created (7.4%). The remaining 157,117 (92.6%) were not edited after creation.
It appears that the ability to award Gold to posts was added in September 2012. From September 2012 through April 2020, 112 different posts received 124 Gold awards in the /r/worldoftanks subreddit. Three gilded posts were from deleted users.
Users who have had more than one gold post
Of those 112 gilded posts, seven users had more than one gilded post:
|Author||Number of Gilded Posts|
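Because one post can receive several awards, gilded posts (112) and Gold awards (124) have to be counted separately. A sketch, assuming each post has an integer `gilded` field holding its award count and an `author` field set to `[deleted]` for deleted accounts; `gilding_summary` is a hypothetical helper.

```python
from collections import Counter


def gilding_summary(posts):
    """Return (gilded post count, total Gold awards, authors with >1 gilded post).

    Assumes an integer 'gilded' field (awards received) and an 'author'
    field, with '[deleted]' standing in for deleted accounts.
    """
    gilded = [p for p in posts if p.get("gilded", 0) > 0]
    awards = sum(p["gilded"] for p in gilded)
    # Count gilded posts per author; deleted authors can't be attributed.
    per_author = Counter(p["author"] for p in gilded if p["author"] != "[deleted]")
    repeat = {author: n for author, n in per_author.items() if n > 1}
    return len(gilded), awards, repeat
```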
Posts with more than one Gold
A very small number of /r/worldoftanks Reddit posts have received more than one gold award:
Posts with the Most Comments
A good post elicits a strong response in the form of comments. Not surprisingly, most of the posts with a high number of comments are giveaways:
Note: The current number of comments shown will not match these counts due to comment deletions that occurred after the post data was captured.
Posts with Highest Karma Scores
The karma numbers displayed on a post are not "real" numbers; they have been "fuzzed" to prevent spam bots from figuring out exactly how Reddit's karma system works. So, if you click on a link below, the karma score may not match, both because it was a fuzzed number when captured and because the post may have been upvoted or downvoted since.
Redditors with Highest Average Karma Scores
These are the Redditors who have earned the most karma per post. They are the most efficient post karma generators, not necessarily the most prolific posters.
|Ranking||Author||Average Karma||Post Count of author||Post Karma||Link||Domain|
Karma by Posters
By adding up the karma on each of a user's posts, I can come up with a total posting karma for that user, which can be used to rank who has generated the most posting karma in the subreddit.
The total karma earned by all non-deleted users is 4,355,083, while deleted users generated 93,917, for a grand total of 4,449,000 post karma.
Here are the top ten Karma-earning Posters in /r/worldoftanks:
|Rank||Redditor||Total Post Karma|
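The ranking and the deleted/non-deleted split can be computed together. A sketch, assuming `author` and `score` fields, with `[deleted]` marking posts whose author is gone; `karma_by_poster` is a hypothetical name.

```python
from collections import Counter


def karma_by_poster(posts, top_n=10):
    """Sum post karma per author.

    Returns (top_n (author, karma) pairs, karma of non-deleted users,
    karma of deleted users). Assumes 'author' and 'score' fields, with
    '[deleted]' marking posts whose author can't be determined.
    """
    per_author = Counter()
    deleted_karma = 0
    for p in posts:
        if p["author"] == "[deleted]":
            deleted_karma += p["score"]
        else:
            per_author[p["author"]] += p["score"]
    return per_author.most_common(top_n), sum(per_author.values()), deleted_karma
```

Adding the last two values gives the grand total (4,355,083 + 93,917 = 4,449,000).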
Average Post Karma vs Average # of Post Comments
I calculated the average post karma and the average number of comments per post, by year:
|Year||Average Karma||Avg # Comments|
The overall correlation between the two data sets was a weak negative correlation of -0.342489168.
However, when I graphed the data, something interesting was revealed.
The data had a moderate positive correlation from 2011 to the start of 2017 (0.69), but then became strongly negatively correlated (-0.99) as the average post karma increased substantially from 2017 through April 2020.
I considered the fact that it may be related to the subscriber count (more eyes = more upvotes), but the subscriber count increase seemed fairly linear (unlike the average post Karma value):
Note: This subscriber data was obtained from [subredditstats](
Statistics was a LONG time ago, so I'd welcome thoughts on what may be causing a higher average post karma.
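Assuming the coefficients above are Pearson correlations (the same quantity Excel's `CORREL` function returns), the split-period values just apply the same formula to a slice of each series. A minimal sketch; `pearson` is a hypothetical helper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances share the same 1/n factor, so it cancels out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

With hypothetical lists `avg_karma` and `avg_comments`, `pearson(avg_karma, avg_comments)` gives the overall value, and `pearson(avg_karma[:k], avg_comments[:k])` and `pearson(avg_karma[k:], avg_comments[k:])` give the two sub-period values, where `k` is the index at which 2017 starts.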
It appears that Reddit removed Downvotes and Upvotes from the API in December 2016 (and thereafter just provided the combined Score value). However, here are the 10 most downvoted posts prior to that date:
Through April 2020, there have been 29,270 flaired posts, distributed as follows:
|Flair||Flair Count||% of all Flairs||Cumulative %|
|Post Battle Result||1,476||5.0%||81.7%|
|(Fat-Fingered by Mods)||1,336||4.6%||86.3%|
Post Flair Karma
Did you want to farm karma here on the /r/worldoftanks subreddit? Here is how the different flaired post types have fared on an average karma basis:
|Flair||Post Count with that Flair||Average Karma|
|Post Battle Result||1476||32.1|
Apparently, you shouldn't waste your time making a helpful guide. Make a
Most active posters
In terms of the most active posters, as defined by the total number of posts made:
|Ranking||Author||# of Posts in /r/worldoftanks|
Count of Users with the Same Number of Posts
From April 2011 through April 2020, 169,737 posts were made in the /r/worldoftanks subreddit. 32,794 unique users submitted 146,643 of those posts. The balance, 23,094 posts, were deleted posts whose authors can't be determined (13.6%, or roughly 1 in 7 posts). Those deleted posts may have been made by still-existing users or by now-deleted users.
Note: 32,794 is NOT the total number of users in the sub. See the end of this post for the true number (which includes users that have commented).
Here is the breakdown by # of users and their # of posts:
|# of Posts||# of Users with that # of posts||% of users||Cumulative %|
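This breakdown is a histogram of a histogram: first count posts per user, then count how many users share each post count. A sketch, assuming an `author` field with `[deleted]` for posts whose author can't be determined (those are excluded, as in the 32,794-user figure); `post_count_distribution` is a hypothetical name.

```python
from collections import Counter


def post_count_distribution(posts):
    """Return (# of posts, # of users with that many posts, % of users,
    cumulative %) rows, ordered by post count.

    Assumes an 'author' field; '[deleted]' posts are left out because
    their authors can't be identified.
    """
    per_user = Counter(p["author"] for p in posts if p["author"] != "[deleted]")
    users_by_count = Counter(per_user.values())
    n_users = len(per_user)
    rows, cumulative = [], 0
    for n_posts in sorted(users_by_count):
        users = users_by_count[n_posts]
        cumulative += users
        rows.append((n_posts, users,
                     round(100 * users / n_users, 1),
                     round(100 * cumulative / n_users, 1)))
    return rows
```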
While Reddit does display the number of Redditors that have subscribed to the sub, we cannot tell how many of those subscribers have ever made a post (or comment). Thus, we can't determine an 'active' vs 'lurker' percentage.
The next post will be either an analysis of the 2,745,764 comments or an analysis of the 78,764 unique Reddit users who have posted or commented in the /r/worldoftanks subreddit.
I would also welcome requests and suggestions on different ways to analyze the data.
Edit: Header formats