Does somebody just need to buy a lot of hard drives and data tapes, and program a bunch of Raspberry Pis to download everything they can find?

Edit: What I’m specifically asking about is the feature Reddit had for searching the site itself. Obviously this process is much simpler for Reddit, since they’re just searching their own database.