An open source self-hosted web archiving solution for creating local, browsable copies of websites and collecting media assets.
ArchiveBox is an open source, self-hosted web archiving tool that lets anyone collect content from the internet and build a personal web archive.
Users submit URLs, and ArchiveBox then fetches each page, extracts its assets, renders snapshots, and archives the resulting data. The archived content can include the original HTML, PNG/JPEG snapshots, assets like JS/CSS/images, extracted text/hyperlinks, bookmarks, metadata, and more.
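As a rough illustration of the "extract assets and hyperlinks" step, the sketch below pulls link and asset URLs out of a page's HTML using only the Python standard library. It is not ArchiveBox's own code: the class `AssetLinkParser` and the function `extract_links_and_assets` are hypothetical names made up for this example, and a real archiver handles many more cases (srcset, CSS imports, iframes, and so on).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetLinkParser(HTMLParser):
    """Hypothetical helper: collects hyperlinks and asset URLs from HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []   # targets of <a href="...">
        self.assets = []  # images, scripts, and stylesheets

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag in ("img", "script") and attrs.get("src"):
            self.assets.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            # Only treat <link> as an asset when it is a stylesheet.
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.assets.append(urljoin(self.base_url, attrs["href"]))


def extract_links_and_assets(html, base_url):
    """Return absolute hyperlink and asset URLs found in the given HTML."""
    parser = AssetLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return {"links": parser.links, "assets": parser.assets}
```

Relative URLs are resolved against the page's own address, so the collected list can be fetched later without the original page context.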
Once archived, all the information for a given site is organized into a folder that contains plain TXT/HTML renders of the page content, all extracted assets, a screenshot, metadata such as headers and cookies, and an index file with metadata and bookmarks. This format makes it easy to view your archive offline while retaining much of the original context.
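The folder-per-site layout described above can be sketched in a few lines of Python. This is only an illustration under assumed names: the function `write_snapshot` and the file names `page.html`, `page.txt`, and `index.json` are hypothetical and are not taken from ArchiveBox's actual output format.

```python
import json
from pathlib import Path


def write_snapshot(root, snapshot_id, url, html, text, headers):
    """Write one archived page into its own folder (hypothetical layout)."""
    folder = Path(root) / snapshot_id
    folder.mkdir(parents=True, exist_ok=True)

    # Plain HTML and text renders of the page content.
    (folder / "page.html").write_text(html, encoding="utf-8")
    (folder / "page.txt").write_text(text, encoding="utf-8")

    # Index file tying the folder together: source URL, response
    # headers, and the list of files stored alongside it.
    index = {
        "url": url,
        "headers": headers,
        "files": ["page.html", "page.txt"],
    }
    (folder / "index.json").write_text(
        json.dumps(index, indent=2), encoding="utf-8"
    )
    return folder
```

Because everything is stored as ordinary files, a folder written this way can be opened in a browser or text editor, copied, or backed up without any special tooling.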
ArchiveBox focuses on being easy for individuals to self-host and aims to be a one-click install with easy backups and exports, while still offering configurability and a comprehensive feature set. It was designed for long-term preservation, prioritizing readability and standards compliance over storing full interactive sites.
Here are some alternatives to ArchiveBox: