Web archiving

Websites are ephemeral. Apps disappear, services shut down, and while new ones replace them, the content that was once available at a given URL is lost. I think personal web archiving is the only long-term solution.


For my personal toolkit, I try to rely on existing tools to keep the required maintenance low. It's mainly built around monolith, so all assets are embedded into single, self-contained HTML documents. This makes the files larger, but so far self-contained files have worked better. I also tend to archive external resources: Are.na channels, videos, and mentioned articles. Sometimes I fall back to wget, though.

Alongside a few helper scripts, I use the following tools:

  • natto - to crawl sites
  • monolith - to save pages as single self-contained HTML files
  • wget - to save pages when monolith doesn't work
  • you-get - to save media
  • gzip - to compress the archived files
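As a rough sketch of how these pieces fit together, the snippet below archives a single page: a small helper derives a filesystem-safe name from the URL, monolith inlines all assets into one HTML file, and gzip compresses the result. The helper names and the naming scheme are my own illustration here, not part of any of these tools.

```shell
#!/bin/sh
# Hypothetical helper: turn a URL into a filesystem-safe archive name,
# e.g. https://example.com/post -> example.com_post.html
archive_name() {
  printf '%s.html' "$(printf '%s' "$1" | sed -e 's|^https\{0,1\}://||' -e 's|[/:]|_|g')"
}

# Save one page as a self-contained, compressed HTML file.
archive_page() {
  url="$1"
  out="$(archive_name "$url")"
  monolith "$url" -o "$out" && gzip "$out"
}
```

A crawler like natto can then feed `archive_page` one URL at a time, and wget or you-get slot into the same loop for pages monolith can't handle or for media.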