I made a comment about how the wealth of knowledge available on Reddit is what makes it so useful. Google’s cached pages and the Wayback Machine exist as a fallback (though I’ve found the Wayback Machine doesn’t have a copy of a lot of the pages I want to view), but I have some fear of these disappearing eventually. On top of that, people are going back and scrubbing their old comments and posts in an effort to remove their content from Reddit and, I suppose, devalue the platform, since the information stored there is pretty useful.
I came across this dump of Reddit submissions and comments from 2005-2022 for the top 20K subs: https://academictorrents.com/details/c398a571976c78d346c325bd75c47b82edf6124e
It says it’s about 1.66TB. I haven’t downloaded it to have a look at it because I have no space (lol), but I plan to so I can hopefully preserve it and make use of it. When I have time I might write something to index the data so I can search it for what I need.
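For anyone curious what that indexing could look like, here is a rough sketch rather than anything tested against the real dump. Since I haven’t opened the files, everything about the format is an assumption: I’m guessing each record is one JSON object per line, with comments carrying a `body` field and submissions carrying `title` and `selftext`, plus a `subreddit` field. The function names `build_index` and `search` are made up for illustration. It uses only Python’s built-in `sqlite3` module to keep a simple word-to-post lookup table.

```python
import json
import re
import sqlite3

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and return its set of distinct word tokens."""
    return set(TOKEN_RE.findall(text.lower()))


def build_index(lines, db_path=":memory:"):
    """Index an iterable of JSON lines into SQLite (hypothetical helper).

    Assumed fields (not verified against the dump): subreddit, body,
    title, selftext.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs "
        "(id INTEGER PRIMARY KEY, subreddit TEXT, body TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS postings (token TEXT, doc_id INTEGER)"
    )
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(rec, dict):
            continue
        # Join whichever text fields are present and non-empty.
        parts = [str(rec[k]) for k in ("title", "selftext", "body") if rec.get(k)]
        text = " ".join(parts)
        if not text:
            continue
        cur = conn.execute(
            "INSERT INTO docs (subreddit, body) VALUES (?, ?)",
            (str(rec.get("subreddit", "")), text),
        )
        conn.executemany(
            "INSERT INTO postings (token, doc_id) VALUES (?, ?)",
            [(tok, cur.lastrowid) for tok in tokenize(text)],
        )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_token ON postings (token)")
    conn.commit()
    return conn


def search(conn, query):
    """Return (subreddit, text) rows containing every word in the query."""
    tokens = list(tokenize(query))
    if not tokens:
        return []
    placeholders = ",".join("?" * len(tokens))
    sql = (
        "SELECT d.subreddit, d.body FROM postings p "
        "JOIN docs d ON d.id = p.doc_id "
        f"WHERE p.token IN ({placeholders}) "
        "GROUP BY d.id HAVING COUNT(DISTINCT p.token) = ? "
        "ORDER BY d.id"
    )
    return conn.execute(sql, tokens + [len(tokens)]).fetchall()
```

You would pass `build_index` an open file (or any iterable of lines) and a path on disk instead of `:memory:`. If the files turn out to be compressed, they would need to be decompressed and streamed line by line first, and at 1.66TB a real index would probably want to cover one subreddit at a time rather than everything at once.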
Just thought I’d share the dump anyway for anyone with similar concerns.