I captured all ~169k posts and ~2.7 million comments from the /r/worldoftanks subreddit, covering April 2011 through April 2020.
The subreddit data analysis will be broken into three separate postings, covering posts, comments, and users.
This particular post will cover the Posts submitted to the /r/worldoftanks subreddit.
However, before I get into those details, I will first share how the data was obtained and processed.
I owe a huge debt of thanks to Jason Baumgartner (/u/Stuck_In_the_Matrix), who runs the pushshift.io website, which is a big-data storage and analytics project, and who moderates the /r/pushshift subreddit.
Essentially, he makes a copy of all Reddit posts and comments by querying Reddit's API, and makes that data available for download. He is the proverbial genius whose shoulders I have stood upon, and his colossal efforts make my modest data project possible. As a result, I support him on Patreon.
I understand that many Redditors have privacy concerns about their data being stored/viewable on the internet, and some have taken active steps to remove their data (comments/posts) after a period of time. The /r/pushshift subreddit does have a thread where you can request that your data be deleted and no longer saved in the future.
Note: Even if you delete your comments and posts periodically here on Reddit via a script/bot, your data will still be captured by pushshift because it captures the data almost as soon as it is created.
You may also DM me, and I'll delete your data from my data set too.
The Reddit data is stored here: http://files.pushshift.io/reddit/
The monthly submission files contain all posts made in ALL of the 2.2 million subreddits since Reddit's inception in January 2011, and through April 2020. The monthly submission files ranged in size from 0.15 MB to 9.1 GB.
New months are added on no particular schedule.
The files are compressed in the bz2, xz, or zst format, and I used Peazip (for zst) or Winzip (for bz2 and xz) to unzip the JSON data (and yes, I actually bought Winzip).
Once decompressed, I needed to separate the /r/worldoftanks data from everything else. This is no small task when an 8.9 GB compressed file becomes a 97.6 GB uncompressed text file. Fortunately, I found a pretty amazing text editor named EmEditor, which can open very large text files. It was also able to split the larger files into more 'manageable' 10 GB chunks that my laptop could process without choking (Note: Once I got my new PC with 32 GB of RAM, I no longer had to split the files). I used EmEditor's Find capability to search the files and extract each line that contained the /r/worldoftanks subreddit identifier (t5_2s113). For example, the March 2018 Reddit Submissions file contained 96,490,262 records, and only 1,856 (0.0019%) were from the /r/worldoftanks subreddit.
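For anyone who would rather script this step than use a text editor, here is a minimal Python sketch of the same filter. It is not what I actually ran; it assumes each line of the decompressed file is a single JSON object with a `subreddit_id` field, and the function name is made up for illustration.

```python
import json
from typing import Iterable, Iterator

WOT_SUBREDDIT_ID = "t5_2s113"  # /r/worldoftanks identifier quoted in the post


def filter_subreddit(lines: Iterable[str], subreddit_id: str = WOT_SUBREDDIT_ID) -> Iterator[dict]:
    """Yield parsed records whose subreddit_id matches, one line at a time."""
    for line in lines:
        # Cheap substring check first, so most lines are never parsed.
        if subreddit_id not in line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed or truncated lines
        # Confirm the match is in the right field, not just somewhere in the text.
        if record.get("subreddit_id") == subreddit_id:
            yield record


# Tiny in-memory sample standing in for a multi-gigabyte file.
sample = [
    '{"subreddit_id": "t5_2s113", "title": "tank post"}',
    '{"subreddit_id": "t5_other", "title": "mentions t5_2s113 in text"}',
    'not json at all t5_2s113',
]
matches = list(filter_subreddit(sample))
print(len(matches))
```

Because the function only needs an iterable of lines, an open file object can be passed in directly, so the whole file never has to fit in memory. The standard library's `bz2.open(path, "rt")` and `lzma.open(path, "rt")` should be able to stream the bz2 and xz files without unzipping them first; zst files would need a third-party package.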
Once extracted, I had to convert the JSON-formatted records into Excel. To do this, I used www.json-csv.com, which charges $10 per month if you want to convert over 1 MB/day.
Note: You'll need Excel 2007 or later to get past the 65,536-row limit of earlier versions.
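The JSON-to-spreadsheet conversion could also be done locally. The sketch below is an alternative to the website I used, not a description of it. The column list is my guess at a useful subset of fields; the post does not say which fields ended up in the spreadsheet.

```python
import csv
import io
import json

# Hypothetical column choice; adjust to whichever fields you want to keep.
FIELDS = ["id", "author", "created_utc", "score", "num_comments", "domain", "title"]


def records_to_csv(lines, out, fields=FIELDS):
    """Write one CSV row per JSON line, keeping only the chosen fields."""
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    count = 0
    for line in lines:
        record = json.loads(line)
        # Missing keys become empty cells instead of raising an error.
        writer.writerow({name: record.get(name, "") for name in fields})
        count += 1
    return count


sample = ['{"id": "abc", "author": "someone", "score": 5, "title": "Hello, tanks", "extra": 1}']
buffer = io.StringIO()
rows_written = records_to_csv(sample, buffer)
csv_text = buffer.getvalue()
```

Writing to a real file instead of the in-memory buffer would look like `open("posts.csv", "w", newline="", encoding="utf-8")`, and Excel can open the result directly.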
In terms of frequency, here are the number of posts per year in the /r/worldoftanks subreddit:
Year | /r/worldoftanks posts |
---|---|
2011 | 579 |
2012 | 7,378 |
2013 | 19,875 |
2014 | 17,904 |
2015 | 17,536 |
2016 | 19,803 |
2017 | 23,854 |
2018 | 21,044 |
2019 | 26,525 |
2020 | 15,239* |
Total | 169,737 |
*Note: The 2020 number only contains records through April 2020.
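A yearly tally like the one above could be reproduced from the raw records with a few lines of Python. This is a sketch, assuming each record carries a `created_utc` Unix timestamp and that years are counted in UTC (the spreadsheet I used may have bucketed a handful of New Year's posts differently).

```python
from collections import Counter
from datetime import datetime, timezone


def posts_per_year(records):
    """Count records by the UTC calendar year of their created_utc timestamp."""
    counts = Counter()
    for record in records:
        created = datetime.fromtimestamp(int(record["created_utc"]), tz=timezone.utc)
        counts[created.year] += 1
    return dict(sorted(counts.items()))


sample = [
    {"created_utc": 1302000000},    # April 2011
    {"created_utc": 1520000000},    # March 2018
    {"created_utc": "1520100000"},  # timestamps sometimes arrive as strings
]
yearly = posts_per_year(sample)
```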
Here is a graph showing the number of /r/worldoftanks posts over time:
/r/worldoftanks posting activity seems to be correlated with the academic schedules followed in the US and Europe.
For example, in 2013, 2014, 2015, and 2018, you can see a general increase in the late-spring/summer months (when US students are on summer break), then a decrease from September through November (corresponding to the US Fall semester, which runs mid-August through December), and then a peak in December, when US students are on Christmas break. 2016 and 2017 don't show this trend. 2019 has a spike in May and June, but the general trend is an increase from April to August.
Interestingly, during the US Spring semester (which runs mid-January through the end of April), posting activity increases from Jan through April in most years.
EU countries' academic schedules are much more varied than those in the US, but my research indicates that many EU fall school breaks run from mid-December to early January, and EU spring breaks run from late March to mid-April.
The observed data supports an EU fall break in December, and the EU Spring break may help to explain the increases frequently seen in March, and the frequent peaks seen in the month of April.
There are two types of posts: text posts, where you write the content in a text box, and link posts, where you provide a link to another website, image, video, etc.
Of the 169,737 posts in the /r/worldoftanks subreddit, 92,803 were self posts (54.7%), and 76,934 were link posts (45.3%). This designation is maintained even for deleted posts.
Reddit records the domain for each linked post. Of the 76,934 linked posts, the post count of the 25 most-linked-to domains totaled to 69,983 records, or 91.0% of all linked posts.
Ranking | Domain | Posts | Cumulative # of Posts | % of Total Linked Posts | Cumulative % |
---|---|---|---|---|---|
1 | imgur.com | 21,494 | 21,494 | 27.9% | 27.9% |
2 | i.redd.it/i.reddituploads.com (images) | 18,719 | 40,213 | 24.3% | 52.3% |
3 | youtube.com | 15,146 | 55,359 | 19.7% | 72.0% |
4 | gfycat.com | 3,003 | 58,362 | 3.9% | 75.9% |
5 | v.redd.it (video saved to reddit) | 2,269 | 60,631 | 2.9% | 78.8% |
6 | worldoftanks.com | 1,740 | 62,371 | 2.3% | 81.1% |
7 | twitch.tv | 1,508 | 63,879 | 2.0% | 83.0% |
8 | worldoftanks.eu | 1,318 | 65,197 | 1.7% | 84.7% |
9 | ftr.wot-news.com | 646 | 65,843 | 0.8% | 85.6% |
10 | wotreplays.eu | 495 | 66,338 | 0.6% | 86.2% |
11 | forum.worldoftanks.com | 454 | 66,792 | 0.6% | 86.8% |
12 | ritastatusreport.live (and earlier versions) | 424 | 67,216 | 0.6% | 87.4% |
13 | reddit.com | 325 | 67,541 | 0.4% | 87.8% |
14 | eu.wargaming.net | 303 | 67,844 | 0.4% | 88.2% |
15 | thearmoredpatrol.com | 272 | 68,116 | 0.4% | 88.5% |
16 | streamable.com | 258 | 68,374 | 0.3% | 88.9% |
17 | thedailybounce.net | 236 | 68,610 | 0.3% | 89.2% |
18 | forum.worldoftanks.eu | 222 | 68,832 | 0.3% | 89.5% |
19 | worldoftanks.asia | 200 | 69,032 | 0.3% | 89.7% |
20 | na.wargaming.net | 192 | 69,224 | 0.2% | 90.0% |
21 | gyazo.com | 188 | 69,412 | 0.2% | 90.2% |
22 | puu.sh | 180 | 69,592 | 0.2% | 90.5% |
23 | strawpoll.me | 143 | 69,735 | 0.2% | 90.6% |
24 | facebook.com | 127 | 69,862 | 0.2% | 90.8% |
25 | twitter.com | 121 | 69,983 | 0.2% | 91.0% |
>25 | (other domains) | 6,951 | 76,934 | 9.0% | 100.0% |
Total / % of Total | | 76,934 | | 100.0% | |
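The ranking and cumulative columns in a table like this are simple to compute once you have the `domain` value of every link post. Here is a rough sketch; the function name is invented, and it does not merge related domains (such as i.redd.it and i.reddituploads.com) the way I did by hand.

```python
from collections import Counter


def domain_table(domains, top_n=25):
    """Return rows of (rank, domain, posts, cumulative posts, % of total, cumulative %)."""
    counts = Counter(domains)
    total = sum(counts.values())
    rows, running = [], 0
    for rank, (domain, posts) in enumerate(counts.most_common(top_n), start=1):
        running += posts
        rows.append((rank, domain, posts, running,
                     round(100 * posts / total, 1),
                     round(100 * running / total, 1)))
    return rows


sample = ["imgur.com"] * 3 + ["youtube.com"] * 2 + ["gfycat.com"]
table = domain_table(sample)
for row in table:
    print(row)
```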
On a domain basis, I calculated the sum of the karma for all posts for that domain, then divided by the number of posts for that domain, in order to calculate the average karma for posts from that domain.
Ranking | Domain | # of Posts by Domain | Total Karma | Average Karma |
---|---|---|---|---|
1 | i.redd.it/i.reddituploads.com (images) | 18,719 | 2,136,606 | 114.1 |
2 | v.redd.it (video saved to reddit) | 2,269 | 251,118 | 110.7 |
3 | gfycat.com | 3,003 | 218,116 | 72.6 |
4 | imgur.com | 21,494 | 747,048 | 34.8 |
5 | streamable.com | 258 | 8,710 | 33.8 |
6 | puu.sh | 180 | 5,984 | 33.2 |
7 | thedailybounce.net | 236 | 6,326 | 26.8 |
8 | ftr.wot-news.com | 646 | 15,168 | 23.5 |
9 | ritastatusreport.live (and earlier versions) | 424 | 8,855 | 20.9 |
10 | twitch.tv | 1,508 | 30,705 | 20.4 |
11 | thearmoredpatrol.com | 272 | 5,004 | 18.4 |
12 | forum.worldoftanks.eu | 222 | 3,866 | 17.4 |
13 | worldoftanks.com | 1,740 | 26,670 | 15.3 |
14 | worldoftanks.eu | 1,318 | 18,897 | 14.3 |
15 | twitter.com | 121 | 1,651 | 13.6 |
16 | facebook.com | 127 | 1,714 | 13.5 |
17 | worldoftanks.asia | 200 | 2,547 | 12.7 |
18 | gyazo.com | 188 | 1,959 | 10.4 |
19 | eu.wargaming.net | 303 | 3,041 | 10.0 |
20 | na.wargaming.net | 192 | 1,804 | 9.4 |
21 | forum.worldoftanks.com | 454 | 3,803 | 8.4 |
22 | reddit.com | 325 | 2,277 | 7.0 |
23 | youtube.com | 15,146 | 95,143 | 6.3 |
24 | strawpoll.me | 143 | 232 | 1.6 |
25 | wotreplays.eu | 495 | 79 | 0.2 |
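The average-karma calculation described above (total karma divided by number of posts, per domain) can be sketched as follows. I did this in Excel, so treat the code as an illustration only; it assumes each record has a `score` field for karma, and the grouping key is passed in so the same helper could be reused for other columns.

```python
from collections import defaultdict


def average_karma_by(records, key):
    """Group records by `key`; return (name, total karma, posts, average) sorted by average."""
    totals = defaultdict(lambda: [0, 0])  # name -> [karma sum, post count]
    for record in records:
        bucket = totals[record[key]]
        bucket[0] += record["score"]
        bucket[1] += 1
    ranked = [(name, karma, posts, round(karma / posts, 1))
              for name, (karma, posts) in totals.items()]
    return sorted(ranked, key=lambda row: row[3], reverse=True)


sample = [
    {"domain": "i.redd.it", "score": 100},
    {"domain": "youtube.com", "score": 6},
    {"domain": "i.redd.it", "score": 50},
]
by_domain = average_karma_by(sample, "domain")
```

As a spot check against the table, 2,136,606 total karma over 18,719 posts works out to 114.1 for the top row.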
Of the 169,737 posts in the subreddit, 12,620 were edited after being created (7.4%). The remaining 157,117 (92.6%) were not edited after creation.
It appears that the ability to award Gold to posts was added in September 2012. From September 2012 through April 2020, 112 different posts received 124 Gold awards in the /r/worldoftanks subreddit. Three gilded posts were from deleted users.
Of those 112 gilded posts, seven users had more than one gilded post:
Author | Number of Gilded Posts |
---|---|
/u/IveBeenBaguetted | 3 |
/u/TollhouseFrank | 3 |
/u/MrUltraGumby | 2 |
/u/TRU_voodoo | 2 |
/u/Penultimatum | 2 |
/u/Jozef_de_Burdi | 2 |
/u/StranaMechty | 2 |
A very small number of /r/worldoftanks Reddit posts have received more than one gold award:
A good post elicits a strong response in the form of comments. Not surprisingly, most of the posts with a high number of comments are giveaways:
Note: The current number of comments shown will not match these counts due to comment deletions that occurred after the post data was captured.
The karma numbers displayed on a post are not "real" numbers; they have been "fuzzed" to prevent spam bots from figuring out exactly how Reddit's karma system works. So, if you click on a link below, the karma score may not match (because it was a fuzzed number when captured, and the post could have been upvoted/downvoted after it was captured).
These are the Redditors who have earned the most karma on a per-post basis. They are the most efficient post karma generators, not necessarily the most prolific posters.
Ranking | Author | Average Karma | Post Count of author | Post Karma | Link | Domain |
---|---|---|---|---|---|---|
1 | szymon7410 | 2,771 | 1 | 2,771 | Wow the new wheeled hetzer looks amazing! Will be well balanced for sure | i.redd.it |
2 | GangTank | 2,202 | 1 | 2,202 | Tier 10 tanks hand drawing by country | i.redd.it |
3 | RaidS016 | 2,184 | 1 | 2,184 | I was not expecting that | i.redd.it |
4 | Irbis_02 | 2,154 | 1 | 2,154 | We need to be quick! | i.redd.it |
5 | MaxBattleLizard | 2,148 | 1 | 2,148 | Enjoy my homebrew meme. | i.redd.it |
6 | Roxorium | 2,102 | 1 | 2,102 | Enemy team: gains advantage and is about to win; SPG players: | v.redd.it |
7 | SuperAarukka | 1,896 | 2 | 1,943 | I came across this on the main page today | v.redd.it |
7 | SuperAarukka | 1,896 | 2 | 1,848 | Have we really gone this far down… | i.redd.it |
8 | Fookoffyetwot | 1,818 | 1 | 1,818 | Sooo this is true | i.redd.it |
9 | PepeHandsDerp | 1,816 | 1 | 1,816 | Most toxic arty shot ever | v.redd.it |
10 | Sarhan556 | 1,764 | 1 | 1,764 | Stay safe and buy large med kit guys | i.redd.it |
If I take all of a user's posts and add up the karma on each one, I get a total post karma number that can be used to rank users by how much post karma they have generated in the subreddit.
The total amount of Karma earned by all non-deleted users is 4,355,083, while deleted users generated 93,917 karma, for a grand Karma total of 4,449,000 post Karma.
Here are the top ten Karma-earning Posters in /r/worldoftanks:
Rank | Redditor | Total Post Karma |
---|---|---|
1 | /u/IveBeenBaguetted | 73,045 |
2 | /u/Alegende | 34,085 |
3 | /u/saldytuwas | 30,639 |
4 | /u/enginerdz | 22,068 |
5 | /u/Jozef_de_Burdi | 16,426 |
6 | /u/larsvdmeyde | 15,453 |
7 | /u/TChen114 | 15,283 |
8 | /u/mfumukoskoldpadda | 15,205 |
9 | /u/LEONAPROFI | 14,667 |
10 | /u/asparagustasty | 14,499 |
I calculated the average post karma and the average # of comments, over time:
Year | Average Karma | Avg # Comments |
---|---|---|
2011 | 8.2 | 11.6 |
2012 | 11.9 | 14.6 |
2013 | 13.2 | 15.6 |
2014 | 12.8 | 19.6 |
2015 | 13.8 | 20.3 |
2016 | 13.8 | 17.5 |
2017 | 17.6 | 18.1 |
2018 | 27.7 | 17.2 |
2019 | 49.7 | 15.2 |
2020 | 67.4 | 13.0 |
The overall correlation between the two data sets was weakly negative (-0.34).
However, when I graphed the data, something interesting was revealed.
Average post Karma/average # of comments: Image, PDF
The data had a moderate positive correlation between 2011 and the start of 2017 (0.69), but then became strongly negatively correlated (-0.99) as the average post karma increased substantially from 2017 through April 2020.
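If you want to check these correlations yourself, here is a small sketch that computes a Pearson correlation on the rounded yearly averages from the table above. The exact split point is an assumption on my part for this example: it treats 2011-2017 as the early period and 2017-2020 as the late one, with 2017 counted in both.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Rounded yearly averages copied from the table above (2011 through 2020).
karma = [8.2, 11.9, 13.2, 12.8, 13.8, 13.8, 17.6, 27.7, 49.7, 67.4]
comments = [11.6, 14.6, 15.6, 19.6, 20.3, 17.5, 18.1, 17.2, 15.2, 13.0]

overall = pearson(karma, comments)
early = pearson(karma[:7], comments[:7])  # 2011-2017 (assumed split)
late = pearson(karma[6:], comments[6:])   # 2017-2020 (assumed split)
print(round(overall, 2), round(early, 2), round(late, 2))
```

Because the inputs are rounded to one decimal place, the results should land close to the figures quoted above but will not match them digit for digit. Keep in mind that four data points is a very small sample for the late period.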
I considered the fact that it may be related to the subscriber count (more eyes = more upvotes), but the subscriber count increase seemed fairly linear (unlike the average post Karma value):
Note: This subscriber data was obtained from [subredditstats](www.subredditstats.com/r/worldoftanks).
Statistics was a LONG time ago, so I'd welcome thoughts on what may be causing a higher average post karma.
It appears that Reddit removed Downvotes and Upvotes from the API in December 2016 (and thereafter just provided the combined Score value). However, here are the 10 most downvoted posts prior to that date:
In June 2015, our moderation overlords rolled out a system that allowed redditors to choose a flair for their posts (e.g., Arty, Survey, Giveaway, etc.). In April 2019, they released a subreddit redesign that allowed redditors to filter by post flair, essentially letting them choose which types of posts they would or wouldn't see. The moderators can also type in a free-form flair value (fat-fingering it, so to speak), and have done so with values such as :D, Answered, Rant, Abusive Post, SirFoch Drama, etc.
To date, there have been 29,270 flaired posts, and they have been distributed as follows:
Flair | Flair Count | % of all Flairs | Cumulative % |
---|---|---|---|
Discussion | 5,632 | 19.2% | 19.2% |
Question | 5,639 | 19.3% | 38.5% |
Meme | 5,315 | 18.2% | 56.7% |
Picture | 2,441 | 8.3% | 65.0% |
Video | 1,834 | 6.3% | 71.3% |
Shitpost | 1,575 | 5.4% | 76.7% |
Post Battle Result | 1,476 | 5.0% | 81.7% |
(Fat-Fingered by Mods) | 1,336 | 4.6% | 86.3% |
News | 809 | 2.8% | 89.1% |
PSA | 602 | 2.1% | 91.1% |
Fan Made | 505 | 1.7% | 92.8% |
Arty | 488 | 1.7% | 94.5% |
Gif | 310 | 1.1% | 95.6% |
Wargaming News | 207 | 0.7% | 96.3% |
History | 207 | 0.7% | 97.0% |
Guide | 192 | 0.7% | 97.6% |
Console | 171 | 0.6% | 98.2% |
Giveaway | 157 | 0.5% | 98.8% |
Survey | 152 | 0.5% | 99.3% |
Stream | 117 | 0.4% | 99.7% |
Clan | 105 | 0.4% | 100.0% |
Totals | 29,270 | 100% |
Do you want to farm karma here on the /r/worldoftanks subreddit? Here is how the different flaired post types have fared on an average karma basis:
Flair | Post Count with that Flair | Average Karma |
---|---|---|
Meme | 5,315 | 212.5 |
Fan Made | 505 | 125.8 |
Gif | 310 | 122.2 |
History | 207 | 97.0 |
Picture | 2,441 | 90.8 |
Arty | 488 | 73.3 |
Shitpost | 1,575 | 69.7 |
Video | 1,834 | 54.7 |
News | 809 | 34.3 |
PSA | 602 | 32.7 |
Post Battle Result | 1,476 | 32.1 |
Guide | 192 | 28.0 |
Wargaming News | 207 | 22.5 |
Stream | 117 | 17.6 |
Console | 171 | 15.2 |
Giveaway | 157 | 14.6 |
Discussion | 5,632 | 13.8 |
Survey | 152 | 10.9 |
Question | 5,639 | 5.2 |
Clan | 105 | 4.0 |
Apparently, you shouldn't waste your time making a helpful guide. Make a meme instead.
In terms of the most active posters, as defined by the total number of posts made:
Ranking | Author | # of Posts in /r/worldoftanks |
---|---|---|
1 | /u/kurg79 | 1,583 |
2 | /u/DatBoyGuru | 308 |
3 | /u/Masauwu | 281 |
4 | /u/chort0 | 250 |
5 | /u/enginerdz | 250 |
6 | /u/Fuckanator | 247 |
7 | /u/IveBeenBaguetted | 247 |
8 | /u/try2tame | 241 |
9 | /u/TollhouseFrank | 226 |
10 | /u/garganchua | 219 |
11 | /u/StranaMechty | 213 |
12 | /u/remount_kings_troop_ | 212 |
13 | /u/_taugrim_ | 208 |
14 | /u/memyselfandlapin | 207 |
15 | /u/saldytuwas | 201 |
16 | /u/RagingRaptor177 | 200 |
17 | /u/V_Epsilon | 198 |
18 | /u/Elmalab | 186 |
19 | /u/JacquesCouSTO | 186 |
20 | /u/milkym4n | 182 |
From April 2011 through April 2020, 169,737 posts were made in the /r/worldoftanks subreddit. 32,794 unique users submitted 146,643 of those posts. The remaining 23,094 were deleted posts whose authors can't be determined (that is, 13.6%, or roughly 1 in 7 posts, have been deleted). Those deleted posts may have been made by still-existing users or by now-deleted users.
Note: 32,794 is NOT the total number of users in the sub. See the end of this post for the true number (which includes users that have commented).
Here is the breakdown by # of users and their # of posts:
# of Posts | # of Users with that # of posts | % of users | Cumulative % |
---|---|---|---|
1 | 17,066 | 52.0% | 52.0% |
2 | 4,958 | 15.1% | 67.2% |
3 | 2,602 | 7.9% | 75.1% |
4 | 1,604 | 4.9% | 80.0% |
5 | 1,121 | 3.4% | 83.4% |
6 | 851 | 2.6% | 86.0% |
7 | 647 | 2.0% | 88.0% |
8 | 492 | 1.5% | 89.5% |
9 | 381 | 1.2% | 90.6% |
10 | 337 | 1.0% | 91.7% |
11-99 | 2,635 | 8.0% | 99.7% |
>100 | 100 | 0.3% | 100.0% |
Total | 32,794 |
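A breakdown like this can be built by counting posts per author and then counting authors per bucket. The sketch below assumes deleted posts show up with the author `[deleted]` and that the top bucket starts at exactly 100 posts; both the bucket boundaries and the function names are illustrative.

```python
from collections import Counter


def bucket_label(post_count):
    """Map a user's post count to a table row: 1..10 individually, then two ranges."""
    if post_count <= 10:
        return str(post_count)
    return "11-99" if post_count < 100 else "100+"


def posts_per_user_distribution(authors):
    """Return {bucket: (user count, % of users)}, ignoring posts by [deleted]."""
    per_user = Counter(a for a in authors if a != "[deleted]")
    buckets = Counter(bucket_label(n) for n in per_user.values())
    total_users = len(per_user)
    return {label: (users, round(100 * users / total_users, 1))
            for label, users in buckets.items()}


sample_authors = ["a"] + ["b"] * 2 + ["c"] * 12 + ["d"] * 100 + ["[deleted]"] * 5
distribution = posts_per_user_distribution(sample_authors)
```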
While Reddit does display the number of Redditors that have subscribed to the sub, we cannot tell how many of those subscribers have ever made a post (or comment). Thus, we can't determine an 'active' vs 'lurker' percentage.
The next post will cover either an analysis of the 2,745,764 comments or the 78,764 unique Reddit users who have posted or commented in the /r/worldoftanks subreddit.
I would also welcome requests and suggestions on different ways to analyze the data.
Edit: Header formats