Subreddit Data Analysis – Part One

Summary of Project

I captured all ~169k posts and ~2.7 million comments made in the /r/worldoftanks subreddit from April 2011 through April 2020.

Analysis Releases

The subreddit data analysis will be broken into three separate postings which will cover:

  • Posts

  • Comments

  • Users

This particular post will cover the Posts submitted to the /r/worldoftanks subreddit.

However, before I get into those details, I will first share how the data was obtained and processed.

Data Source

I owe a huge debt of thanks to Jason Baumgartner (/u/Stuck_In_the_Matrix), who runs the pushshift.io website, which is a big-data storage and analytics project, and who moderates the /r/pushshift subreddit.

Essentially, he makes a copy of all Reddit posts and comments by querying Reddit's API, and makes that data available for download. He is the proverbial genius whose shoulders I have stood upon, and his colossal efforts make my modest data project possible. As a result, I support him on Patreon.

Data Privacy

I understand that many Redditors have privacy concerns about their data being stored/viewable on the internet, and some have taken active steps to remove their data (comments/posts) after a period of time. The /r/pushshift subreddit does have a thread where you can request that your data be deleted and no longer saved in the future.

Note: Even if you delete your comments and posts periodically here on Reddit via a script/bot, your data will still be captured by pushshift because it captures the data almost as soon as it is created.

You may also DM me, and I'll delete your data from my data set too.

The Data

The Reddit data is stored here: http://files.pushshift.io/reddit/

The monthly submission files contain all posts made in ALL of Reddit's 2.2 million subreddits, from January 2011 through April 2020. The monthly files ranged in size from 0.15 MB to 9.1 GB.

New months are added on no particular schedule.

The files are compressed in the bz2, xz, or zst format. I used PeaZip (for zst) or WinZip (for bz2 and xz) to decompress the JSON data (and yes, I actually bought WinZip).

Once decompressed, I needed to separate the /r/worldoftanks data from everything else. This is no small task when an 8.9 GB compressed file becomes a 97.6 GB uncompressed text file. Fortunately, I found a pretty amazing text editor named EmEditor, which can open very large text files. It was also able to split the larger files into more 'manageable' 10 GB chunks that my laptop could process without choking (Note: Once I got my new PC with 32 GB of RAM, I no longer had to split the files).

I used EmEditor's Find capability to search the files and its Extract feature to pull out each line that contained the /r/worldoftanks subreddit identifier (t5_2s113). For example, the March 2018 Reddit Submissions file contained 96,490,262 records, and only 1,856 (0.0019%) were from the /r/worldoftanks subreddit.
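For anyone who would rather script the extraction than use a text editor, here is a minimal Python sketch of the same filtering step. It is not what I ran. It assumes the dump holds one JSON record per line and that each record carries a `subreddit_id` field. The function names and file handling are my own illustration, and .zst files would need the third-party `zstandard` package, which is not shown here.

```python
import bz2
import json
import lzma
from pathlib import Path
from typing import Iterator

WOT_SUBREDDIT_ID = "t5_2s113"  # /r/worldoftanks identifier quoted above


def open_dump(path: Path):
    """Open a dump file as text; .bz2 and .xz are handled by the standard library."""
    if path.suffix == ".bz2":
        return bz2.open(path, "rt", encoding="utf-8")
    if path.suffix == ".xz":
        return lzma.open(path, "rt", encoding="utf-8")
    # Assumption: any other extension is an already-decompressed text file.
    return open(path, "rt", encoding="utf-8")


def iter_subreddit_records(path: Path, subreddit_id: str = WOT_SUBREDDIT_ID) -> Iterator[dict]:
    """Stream the file line by line and yield only records from one subreddit."""
    with open_dump(path) as handle:
        for line in handle:
            # Cheap substring check first, so most lines are never parsed as JSON.
            if subreddit_id not in line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip a malformed line rather than abort the whole run
            # Confirm the match is in the subreddit field, not in a title or body.
            if record.get("subreddit_id") == subreddit_id:
                yield record
```

Because the file is streamed one line at a time, memory use stays small no matter how large the decompressed dump is, so no splitting into chunks is needed.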

Once extracted, I had to convert the JSON-formatted records into Excel. To do this, I used www.json-csv.com, which charges $10 per month if you want to convert over 1 MB/day.
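If you don't want to pay for a converter, the same JSON-to-spreadsheet step can be sketched with Python's standard library. This is only an illustration: the field list below is my assumption about which columns are useful (the records carry many more), and the function name is hypothetical.

```python
import csv
import json
from typing import Iterable, Sequence

# Assumption: a handful of fields that are useful for post analysis.
FIELDS = ["id", "author", "created_utc", "title", "domain",
          "score", "num_comments", "is_self"]


def records_to_csv(lines: Iterable[str], out_file, fields: Sequence[str] = FIELDS) -> int:
    """Write one CSV row per JSON line and return the number of rows written."""
    writer = csv.DictWriter(out_file, fieldnames=list(fields))
    writer.writeheader()
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        # Missing fields become empty cells; fields not listed are dropped.
        writer.writerow({name: record.get(name, "") for name in fields})
        count += 1
    return count


# Example usage (file names are placeholders):
# with open("wot_posts.json", encoding="utf-8") as src, \
#         open("wot_posts.csv", "w", newline="", encoding="utf-8") as dst:
#     records_to_csv(src, dst)
```

The resulting CSV opens directly in Excel.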

Note: You'll need Excel 2007 or later to get past the 65,536 row limit that exists in earlier versions.

Results – Posts

Distribution of Posts

In terms of frequency, here is the number of posts per year in the /r/worldoftanks subreddit:

| Year | /r/worldoftanks posts |
|---|---|
| 2011 | 579 |
| 2012 | 7,378 |
| 2013 | 19,875 |
| 2014 | 17,904 |
| 2015 | 17,536 |
| 2016 | 19,803 |
| 2017 | 23,854 |
| 2018 | 21,044 |
| 2019 | 26,525 |
| 2020 | 15,239* |
| Total | 169,737 |

*Note: The 2020 number only contains records through April 2020.
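For reference, a yearly count like this can be produced with a few lines of Python. This is a sketch rather than my actual workflow (I used Excel). It assumes each record has a `created_utc` field holding a Unix timestamp in seconds, and the function name is my own.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable


def posts_per_year(records: Iterable[dict]) -> Counter:
    """Count posts by the UTC calendar year in which they were created."""
    counts = Counter()
    for record in records:
        # created_utc may arrive as an int or a numeric string, so normalise it.
        created = datetime.fromtimestamp(int(record["created_utc"]), tz=timezone.utc)
        counts[created.year] += 1
    return counts
```

Grouping on `(created.year, created.month)` instead of `created.year` would give monthly counts for a posts-over-time graph.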

 

Number of Posts Over Time

Here is a graph showing the number of /r/worldoftanks posts over time:

Image / PDF

Number of Posts Over Time – Analysis

/r/worldoftanks posting activity seems to be correlated with the academic schedules followed in the US and Europe.

For example, in 2013, 2014, 2015, and 2018, you can see a general increase in the late-spring/summer months (when US students are on summer break), then a decrease from September through November (corresponding to the US Fall semester, which runs mid-August through December), and then a peak in December when US students are on Christmas break. 2016 and 2017 don't show this trend. 2019 has a spike in May and June, but the general trend is an increase from April to August.

Interestingly, during the US Spring semester (which runs mid-January through the end of April), posting activity increases from January through April in most years.

Academic schedules in EU countries are much more varied than in the US, but my research indicates that many EU Fall school breaks run from mid-December to early January, and EU Spring breaks run from late March to mid-April.

The observed data supports an EU fall break in December, and the EU Spring break may help to explain the increases frequently seen in March, and the frequent peaks seen in the month of April.

Types of Posts

There are two types of posts: text posts, where you write the content in a text box, and link posts, where you provide a link to another website, image, video, etc.

Of the 169,737 posts in the /r/worldoftanks subreddit, 92,803 were text (self) posts (54.7%), and 76,934 were link posts (45.3%). This designation is maintained even for deleted posts.

Link Posts

Reddit records the domain for each link post. Of the 76,934 link posts, the 25 most-linked-to domains account for 69,983 posts, or 91.0% of all link posts.

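| Ranking | Domain | Posts | Cumulative # of Posts | % of Total Link Posts | Cumulative % |
|---|---|---|---|---|---|
| 1 | imgur.com | 21,494 | 21,494 | 27.9% | 27.9% |
| 2 | i.redd.it/i.reddituploads.com (images) | 18,719 | 40,213 | 24.3% | 52.3% |
| 3 | youtube.com | 15,146 | 55,359 | 19.7% | 72.0% |
| 4 | gfycat.com | 3,003 | 58,362 | 3.9% | 75.9% |
| 5 | v.redd.it (video saved to reddit) | 2,269 | 60,631 | 2.9% | 78.8% |
| 6 | worldoftanks.com | 1,740 | 62,371 | 2.3% | 81.1% |
| 7 | twitch.tv | 1,508 | 63,879 | 2.0% | 83.0% |
| 8 | worldoftanks.eu | 1,318 | 65,197 | 1.7% | 84.7% |
| 9 | ftr.wot-news.com | 646 | 65,843 | 0.8% | 85.6% |
| 10 | wotreplays.eu | 495 | 66,338 | 0.6% | 86.2% |
| 11 | forum.worldoftanks.com | 454 | 66,792 | 0.6% | 86.8% |
| 12 | ritastatusreport.live (and earlier versions) | 424 | 67,216 | 0.6% | 87.4% |
| 13 | reddit.com | 325 | 67,541 | 0.4% | 87.8% |
| 14 | eu.wargaming.net | 303 | 67,844 | 0.4% | 88.2% |
| 15 | thearmoredpatrol.com | 272 | 68,116 | 0.4% | 88.5% |
| 16 | streamable.com | 258 | 68,374 | 0.3% | 88.9% |
| 17 | thedailybounce.net | 236 | 68,610 | 0.3% | 89.2% |
| 18 | forum.worldoftanks.eu | 222 | 68,832 | 0.3% | 89.5% |
| 19 | worldoftanks.asia | 200 | 69,032 | 0.3% | 89.7% |
| 20 | na.wargaming.net | 192 | 69,224 | 0.2% | 90.0% |
| 21 | gyazo.com | 188 | 69,412 | 0.2% | 90.2% |
| 22 | puu.sh | 180 | 69,592 | 0.2% | 90.5% |
| 23 | strawpoll.me | 143 | 69,735 | 0.2% | 90.6% |
| 24 | facebook.com | 127 | 69,862 | 0.2% | 90.8% |
| 25 | twitter.com | 121 | 69,983 | 0.2% | 91.0% |
| >25 | (other domains) | 6,951 | 76,934 | 9.0% | 100.0% |
| | Total / % of Total | 76,934 | | 100.0% | |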

Average Post Karma by Domain

On a domain basis, I calculated the sum of the karma for all posts for that domain, then divided by the number of posts for that domain, in order to calculate the average karma for posts from that domain.

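| Ranking | Domain | # of Posts by Domain | Total Karma | Average Karma |
|---|---|---|---|---|
| 1 | i.redd.it/i.reddituploads.com (images) | 18,719 | 2,136,606 | 114.1 |
| 2 | v.redd.it (video saved to reddit) | 2,269 | 251,118 | 110.7 |
| 3 | gfycat.com | 3,003 | 218,116 | 72.6 |
| 4 | imgur.com | 21,494 | 747,048 | 34.8 |
| 5 | streamable.com | 258 | 8,710 | 33.8 |
| 6 | puu.sh | 180 | 5,984 | 33.2 |
| 7 | thedailybounce.net | 236 | 6,326 | 26.8 |
| 8 | ftr.wot-news.com | 646 | 15,168 | 23.5 |
| 9 | ritastatusreport.live (and earlier versions) | 424 | 8,855 | 20.9 |
| 10 | twitch.tv | 1,508 | 30,705 | 20.4 |
| 11 | thearmoredpatrol.com | 272 | 5,004 | 18.4 |
| 12 | forum.worldoftanks.eu | 222 | 3,866 | 17.4 |
| 13 | worldoftanks.com | 1,740 | 26,670 | 15.3 |
| 14 | worldoftanks.eu | 1,318 | 18,897 | 14.3 |
| 15 | twitter.com | 121 | 1,651 | 13.6 |
| 16 | facebook.com | 127 | 1,714 | 13.5 |
| 17 | worldoftanks.asia | 200 | 2,547 | 12.7 |
| 18 | gyazo.com | 188 | 1,959 | 10.4 |
| 19 | eu.wargaming.net | 303 | 3,041 | 10.0 |
| 20 | na.wargaming.net | 192 | 1,804 | 9.4 |
| 21 | forum.worldoftanks.com | 454 | 3,803 | 8.4 |
| 22 | reddit.com | 325 | 2,277 | 7.0 |
| 23 | youtube.com | 15,146 | 95,143 | 6.3 |
| 24 | strawpoll.me | 143 | 232 | 1.6 |
| 25 | wotreplays.eu | 495 | 79 | 0.2 |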
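The per-domain average described above (total karma for a domain divided by its post count) can be sketched in Python as follows. I did this in Excel, so treat the code as an illustration only. It assumes each record has `domain`, `score`, and `is_self` fields, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def average_karma_by_domain(records: Iterable[dict]) -> List[Tuple[str, int, int, float]]:
    """Return (domain, post count, total karma, average karma), best average first."""
    totals = defaultdict(lambda: [0, 0])  # domain -> [post count, total karma]
    for record in records:
        if record.get("is_self"):
            continue  # text posts do not link to an external domain
        entry = totals[record["domain"]]
        entry[0] += 1
        entry[1] += record["score"]
    rows = [(domain, n, karma, karma / n) for domain, (n, karma) in totals.items()]
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows
```

Note that an average like this is sensitive to a few very high-scoring posts, which matters most for domains with a small post count.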

Edited Posts

Of the 169,737 posts in the subreddit, 12,620 were edited after being created (7.4%). The remaining 157,117 (92.6%) were not edited after creation.

Gilded Posts

It appears that the ability to award Gold to posts was added in September 2012. From September 2012 through April 2020, 112 different posts received 124 Gold awards in the /r/worldoftanks subreddit. Three gilded posts were from deleted users.

Users who have had more than one gold post

Among the authors of those 112 gilded posts, seven users had more than one gilded post:

| Author | Number of Gilded Posts |
|---|---|
| /u/IveBeenBaguetted | 3 |
| /u/TollhouseFrank | 3 |
| /u/MrUltraGumby | 2 |
| /u/TRU_voodoo | 2 |
| /u/Penultimatum | 2 |
| /u/Jozef_de_Burdi | 2 |
| /u/StranaMechty | 2 |

Posts with more than one Gold

A very small number of /r/worldoftanks Reddit posts have received more than one gold award:

| Reddit User | Post | Number of Gold Awards on Post |
|---|---|---|
| /u/Penultimatum | [NA] Scavenger Hunt Info and Codes for May (with link to official WG post and frequent updates) | 4 |
| /u/twofingersofredrum | This is going to take a while to grind | 3 |
| /u/MrUltraGumby | My Side of the Wargaming America Visit with Victor | 2 |
| /u/assassinator | Clearing up some things. | 2 |
| /u/Wakka_bot | Project Poverty – a Free-to-play experiment – Part 0+1 | 2 |
| /u/Penultimatum | [NA] Scavenger Hunt Info and Codes for January and February (with link to official WG post and frequent updates) | 2 |

Posts with the Most Comments

A good post elicits a strong response in the form of comments. Not surprisingly, most of the posts with a high number of comments are giveaways:

Note: The current number of comments shown will not match these counts due to comment deletions that occurred after the post data was captured.

| Ranking | Author | Post Title | # of Comments |
|---|---|---|---|
| 1 | /u/Canteen_CA | Black Market Megathread | 1,745 |
| 2 | /u/sheepcat87 | [Giveaway] $108 worth of stuff via Intel bundle | 1,342 |
| 3 | /u/Canteen_CA | Christmas Lootboxes Megathread | 932 |
| 4 | /u/Ectar_ | We are Wargaming! AMA about update 9.0! | 870 |
| 5 | /u/TollhouseFrank | Private "Santa Clause" Public's Christmas Give-away | 813 |
| 6 | /u/Hypnotik-WG | [NA] Tank Hollow Contest | 812 |
| 7 | /u/Bonkeyz | [Giveaway] $108 worth of in-game content | 762 |
| 8 | /u/shiftyjamo | Giving away a code for the NA server | 713 |
| 9 | /u/Methos_ | Your Unpopular World of Tanks Opinions | 711 |
| 10 | /u/MPorts | Free World of Tanks Currency Giveaway! | 663 |
| 11 | /u/DamienJaxx | No more RDDT social clans | 657 |
| 12 | /u/brawnhoeffer | What has happened, RDDT? | 631 |
| 13 | /u/jphero | Bonus Code Raffle! | 620 |
| 14 | /u/Deutschbagger | Share your Loot Box stories here! | 616 |
| 15 | /u/ahenkel | I bought an Intel CPU it has World of Tank codes. I don't play WOT. I'm giving them away. | 615 |
| 16 | /u/Axertin | Why all the extreme artillery hate? | 606 |
| 17 | /u/Remount_Kings_Troop_ | Remount's Yule Log Contest – Win 25,000 gold | 590 |
| 18 | /u/Arcaniusu | Free wargaming code | 588 |
| 19 | /u/Laera_wg | Hi, We’re Wargaming America – Ask Us Anything! | 579 |
| 20 | /u/MiiLee94 | [Giveaway][NA only] World Of Tanks Expert Pack | 574 |

Posts with Highest Karma Scores

The karma numbers displayed on a post are not "real" numbers. They have been "fuzzed" to prevent spam bots from figuring out exactly how Reddit's karma system works. So, if you look up one of the posts below, the karma score may not match, both because it was a fuzzed number when captured and because the post could have been upvoted/downvoted after it was captured.

| Reddit User | Post | Karma Score |
|---|---|---|
| /u/mitsakos23 | Sometimes i miss the "All" chat.. | 3,378 |
| /u/Uolak | Japanese TD models leaked | 3,074 |
| /u/UnnecessaryAmmoRack | Wish allied arta couldn't stun teammates | 2,983 |
| /u/Scro11Lock | Totally not a cakeday bait | 2,966 |
| /u/szymon7410 | Wow the new wheeled hetzer looks amazing! Will be well balanced for sure | 2,771 |
| /u/infuriatesloth | I’m gonna miss all chat | 2,725 |
| /u/strobika | Every god damn time… | 2,684 |
| /u/DD-Amin | WoT playerbase, 2020, colourised. | 2,636 |
| /u/rickyfort5 | British and American heavy tanks, explained by biology | 2,629 |
| /u/Zenzetaa | In the end it's the result that matters, right? | 2,560 |

Redditors with Highest Average Karma Scores

These are the Redditors who have gotten the most Karma on a per post basis. They are the most efficient post karma generators–not necessarily the most prolific in their postings.

| Ranking | Author | Average Karma | Post Count of Author | Post Karma | Post Title | Domain |
|---|---|---|---|---|---|---|
| 1 | szymon7410 | 2,771 | 1 | 2,771 | Wow the new wheeled hetzer looks amazing! Will be well balanced for sure | i.redd.it |
| 2 | GangTank | 2,202 | 1 | 2,202 | Tier 10 tanks hand drawing by country | i.redd.it |
| 3 | RaidS016 | 2,184 | 1 | 2,184 | I was not expecting that | i.redd.it |
| 4 | Irbis_02 | 2,154 | 1 | 2,154 | We need to be quick! | i.redd.it |
| 5 | MaxBattleLizard | 2,148 | 1 | 2,148 | Enjoy my homebrew meme. | i.redd.it |
| 6 | Roxorium | 2,102 | 1 | 2,102 | Enemy team: gains advantage and is about to win; SPG players: | v.redd.it |
| 7 | SuperAarukka | 1,896 | 2 | 1,943 | I came across this on the main page today | v.redd.it |
| | | | | 1,848 | Have we really gone this far down… | i.redd.it |
| 8 | Fookoffyetwot | 1,818 | 1 | 1,818 | Sooo this is true | i.redd.it |
| 9 | PepeHandsDerp | 1,816 | 1 | 1,816 | Most toxic arty shot ever | v.redd.it |
| 10 | Sarhan556 | 1,764 | 1 | 1,764 | Stay safe and buy a large med kit guys | i.redd.it |

Karma by Posters

If I take all of a user's posts and add up the karma on each one, I get a total post karma number for that user. Ranking users by that total shows who has generated the largest amount of post karma in the subreddit.

The total amount of Karma earned by all non-deleted users is 4,355,083, while deleted users generated 93,917 karma, for a grand Karma total of 4,449,000 post Karma.

Here are the top ten Karma-earning Posters in /r/worldoftanks:

| Rank | Redditor | Total Post Karma |
|---|---|---|
| 1 | /u/IveBeenBaguetted | 73,045 |
| 2 | /u/Alegende | 34,085 |
| 3 | /u/saldytuwas | 30,639 |
| 4 | /u/enginerdz | 22,068 |
| 5 | /u/Jozef_de_Burdi | 16,426 |
| 6 | /u/larsvdmeyde | 15,453 |
| 7 | /u/TChen114 | 15,283 |
| 8 | /u/mfumukoskoldpadda | 15,205 |
| 9 | /u/LEONAPROFI | 14,667 |
| 10 | /u/asparagustasty | 14,499 |

Average Post Karma vs Average # of Post Comments

I calculated the average post karma and the average # of comments, over time:

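| Year | Average Karma | Avg # Comments |
|---|---|---|
| 2011 | 8.2 | 11.6 |
| 2012 | 11.9 | 14.6 |
| 2013 | 13.2 | 15.6 |
| 2014 | 12.8 | 19.6 |
| 2015 | 13.8 | 20.3 |
| 2016 | 13.8 | 17.5 |
| 2017 | 17.6 | 18.1 |
| 2018 | 27.7 | 17.2 |
| 2019 | 49.7 | 15.2 |
| 2020 | 67.4 | 13.0 |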

Over the whole period, the correlation between the two data sets was weakly negative, at -0.342.

 

However, when I graphed the data, something interesting was revealed.

Average post Karma/average # of comments: Image, PDF

 

The data had a moderate positive correlation between 2011 and the start of 2017 (0.69), but then became strongly negatively correlated (-0.99) as the average post Karma increased substantially from 2017 into April 2020.
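For anyone who wants to check these correlations, here is a small Python sketch that computes Pearson's r from the rounded yearly averages in the table above. Because the inputs are rounded, the results should only land close to the figures quoted. The sub-period boundaries (2011 to 2017 and 2017 to 2020, both inclusive) are my assumption about where to split, and the helper names are hypothetical.

```python
from math import sqrt

YEARS = [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
AVG_KARMA = [8.2, 11.9, 13.2, 12.8, 13.8, 13.8, 17.6, 27.7, 49.7, 67.4]
AVG_COMMENTS = [11.6, 14.6, 15.6, 19.6, 20.3, 17.5, 18.1, 17.2, 15.2, 13.0]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


def pearson_for_years(first, last):
    """Correlation restricted to the years first..last, inclusive."""
    picked = [i for i, year in enumerate(YEARS) if first <= year <= last]
    return pearson([AVG_KARMA[i] for i in picked],
                   [AVG_COMMENTS[i] for i in picked])


print("2011-2020:", round(pearson_for_years(2011, 2020), 3))
print("2011-2017:", round(pearson_for_years(2011, 2017), 3))
print("2017-2020:", round(pearson_for_years(2017, 2020), 3))
```

Keep in mind that the later sub-period has only four data points, so a coefficient near -1 there says little on its own about statistical significance.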

I considered the fact that it may be related to the subscriber count (more eyes = more upvotes), but the subscriber count increase seemed fairly linear (unlike the average post Karma value):

Subscriber stats: Image, PDF

Note: This subscriber data was obtained from [subredditstats](www.subredditstats.com/r/worldoftanks).

Statistics was a LONG time ago, so I'd welcome thoughts on what may be causing a higher average post karma.

Downvoted Posts

It appears that Reddit removed Downvotes and Upvotes from the API in December 2016 (and thereafter just provided the combined Score value). However, here are the 10 most downvoted posts prior to that date:

| Reddit User | Post | Number of Downvotes |
|---|---|---|
| /u/alphahomersimpson | Formal apology to RDDT members. | 47 |
| /u/teser1 | How I feel every time I see a SS of the victory screen on /r/worldoftanks | 45 |
| [deleted] | 100 upvotes and i give this paysafecard to a lucky man | 37 |
| /u/Dvinci | Playing a French tank is like raping someone. | 36 |
| /u/Westy543 | A friend of mine started tonight; for the last battle of the night, he was pulled into a tier 10 battle for laughs. He found a Maus. | 32 |
| /u/AlertAtheist | My keys made an interesting shadow last night,i've been playing too much WoT to see this… | 32 |
| /u/antricfer | Karma machine in a TD. | 31 |
| /u/flyingbird0026 | How I see Russian mediums as an E-100 player. | 28 |
| [deleted] | reach for water. grab fan. | 27 |
| /u/hotibomba | Inventor of this awkward map | 27 |

Post flairs

In June 2015, our moderation overlords rolled out a system that allowed redditors to choose a flair for their posts (e.g., Arty, Survey, Giveaway, etc.). In April 2019, they released a subreddit redesign that allowed redditors to filter by post flair, essentially letting them choose which types of posts they see. The moderators can also fat-finger a free-form flair value, and have done so with comments such as :D, Answered, Rant, Abusive Post, SirFoch Drama, etc.

To date, there have been 29,270 flaired posts, and they have been distributed as follows:

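| Flair | Flair Count | % of all Flairs | Cumulative % |
|---|---|---|---|
| Discussion | 5,632 | 19.2% | 19.2% |
| Question | 5,639 | 19.3% | 38.5% |
| Meme | 5,315 | 18.2% | 56.7% |
| Picture | 2,441 | 8.3% | 65.0% |
| Video | 1,834 | 6.3% | 71.3% |
| Shitpost | 1,575 | 5.4% | 76.7% |
| Post Battle Result | 1,476 | 5.0% | 81.7% |
| (Fat-Fingered by Mods) | 1,336 | 4.6% | 86.3% |
| News | 809 | 2.8% | 89.1% |
| PSA | 602 | 2.1% | 91.1% |
| Fan Made | 505 | 1.7% | 92.8% |
| Arty | 488 | 1.7% | 94.5% |
| Gif | 310 | 1.1% | 95.6% |
| Wargaming News | 207 | 0.7% | 96.3% |
| History | 207 | 0.7% | 97.0% |
| Guide | 192 | 0.7% | 97.6% |
| Console | 171 | 0.6% | 98.2% |
| Giveaway | 157 | 0.5% | 98.8% |
| Survey | 152 | 0.5% | 99.3% |
| Stream | 117 | 0.4% | 99.7% |
| Clan | 105 | 0.4% | 100.0% |
| Totals | 29,270 | 100% | |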

Post Flair Karma

Want to farm Karma here on the /r/worldoftanks subreddit? Here is how the different flaired post types have fared on an average karma basis:

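| Flair | Post Count with that Flair | Average Karma |
|---|---|---|
| Meme | 5,315 | 212.5 |
| Fan Made | 505 | 125.8 |
| Gif | 310 | 122.2 |
| History | 207 | 97.0 |
| Picture | 2,441 | 90.8 |
| Arty | 488 | 73.3 |
| Shitpost | 1,575 | 69.7 |
| Video | 1,834 | 54.7 |
| News | 809 | 34.3 |
| PSA | 602 | 32.7 |
| Post Battle Result | 1,476 | 32.1 |
| Guide | 192 | 28.0 |
| Wargaming News | 207 | 22.5 |
| Stream | 117 | 17.6 |
| Console | 171 | 15.2 |
| Giveaway | 157 | 14.6 |
| Discussion | 5,632 | 13.8 |
| Survey | 152 | 10.9 |
| Question | 5,639 | 5.2 |
| Clan | 105 | 4.0 |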

Apparently, you shouldn't waste your time making a helpful guide. Make a meme instead.

Most active posters

Here are the most active posters, as measured by the total number of posts made:

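| Ranking | Author | # of Posts in /r/worldoftanks |
|---|---|---|
| 1 | /u/kurg79 | 1,583 |
| 2 | /u/DatBoyGuru | 308 |
| 3 | /u/Masauwu | 281 |
| 4 | /u/chort0 | 250 |
| 5 | /u/enginerdz | 250 |
| 6 | /u/Fuckanator | 247 |
| 7 | /u/IveBeenBaguetted | 247 |
| 8 | /u/try2tame | 241 |
| 9 | /u/TollhouseFrank | 226 |
| 10 | /u/garganchua | 219 |
| 11 | /u/StranaMechty | 213 |
| 12 | /u/remount_kings_troop_ | 212 |
| 13 | /u/_taugrim_ | 208 |
| 14 | /u/memyselfandlapin | 207 |
| 15 | /u/saldytuwas | 201 |
| 16 | /u/RagingRaptor177 | 200 |
| 17 | /u/V_Epsilon | 198 |
| 18 | /u/Elmalab | 186 |
| 19 | /u/JacquesCouSTO | 186 |
| 20 | /u/milkym4n | 182 |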

Count of Users with the Same Number of Posts

From April 2011 through April 2020, 169,737 posts were made in the /r/worldoftanks subreddit. 32,794 unique users submitted 146,643 of those posts. The balance (23,094) were deleted posts whose authors can't be determined; that is, 13.6%, or roughly 1 in 7 posts, have been deleted. Those deleted posts may have been made by still-existing users or by now-deleted users.

Note: 32,794 is NOT the total number of users in the sub. See the end of this post for the true number (which includes users that have commented).

Here is the breakdown by # of users and their # of posts:

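| # of Posts | # of Users with that # of Posts | % of Users | Cumulative % |
|---|---|---|---|
| 1 | 17,066 | 52.0% | 52.0% |
| 2 | 4,958 | 15.1% | 67.2% |
| 3 | 2,602 | 7.9% | 75.1% |
| 4 | 1,604 | 4.9% | 80.0% |
| 5 | 1,121 | 3.4% | 83.4% |
| 6 | 851 | 2.6% | 86.0% |
| 7 | 647 | 2.0% | 88.0% |
| 8 | 492 | 1.5% | 89.5% |
| 9 | 381 | 1.2% | 90.6% |
| 10 | 337 | 1.0% | 91.7% |
| 11-99 | 2,736 | 8.0% | 99.7% |
| >100 | 100 | 0.3% | 100.0% |
| Total | 32,794 | | |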
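A breakdown like this is a "count of counts": first count posts per author, then count how many authors share each post count. Here is a minimal Python sketch of that idea, assuming the author name of a deleted post appears as the literal string `[deleted]`. The function names are hypothetical and the bucketing into ranges such as 11-99 is left out.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def posts_per_user_distribution(authors: Iterable[str]):
    """Return (posts per author, number of authors per post count)."""
    # Assumption: deleted posts carry the placeholder author "[deleted]".
    posts_by_author = Counter(a for a in authors if a != "[deleted]")
    users_by_post_count = Counter(posts_by_author.values())
    return posts_by_author, users_by_post_count


def distribution_rows(users_by_post_count: Counter) -> List[Tuple[int, int, float, float]]:
    """Rows of (# of posts, # of users, % of users, cumulative %)."""
    total_users = sum(users_by_post_count.values())
    rows = []
    cumulative = 0
    for n_posts in sorted(users_by_post_count):
        n_users = users_by_post_count[n_posts]
        cumulative += n_users
        rows.append((n_posts, n_users,
                     100 * n_users / total_users,
                     100 * cumulative / total_users))
    return rows
```

The first return value of `posts_per_user_distribution` also gives the most active posters directly via `posts_by_author.most_common(20)`.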

While Reddit does display the number of Redditors that have subscribed to the sub, we cannot tell how many of those subscribers have ever made a post (or comment). Thus, we can't determine an 'active' vs 'lurker' percentage.

Next Steps

Next up will be either an analysis of the 2,745,764 comments, or an analysis of the 78,764 unique Reddit users that have posted or commented in the /r/worldoftanks subreddit.

I would also welcome requests and suggestions on different ways to analyze the data.

Edit: Header formats

Source: https://www.reddit.com/r/WorldofTanks/comments/jva629/subreddit_data_analysis_part_one/
