Let's create an archive utility!

File formats: .zip, .7z, .warc, .gz, bzip2 § Format
Compression codecs: DEFLATE, LZMA, Brotli, bzip2 § Codec
Hashes and checksums: SHA-256, CRC-32
HTTP: Parsing HTTP, HTTP redirects, Content-Encoding, Transfer-Encoding § chunked
Data economics: Indexes, Reducing seeks, Replication, Storage reliability

Archive files contain items. For example, you might have:

Sometimes no compression is involved, and each item exists as a substring of the archive file; that is, the archive file is the concatenation of each item's content along with fragments of metadata.

TODO: example diagram, with more details as you hover/select parts of it

E.g. you can read foo.png's bytes simply by reading TODO bytes from offset TODO of that archive file.
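As a rough sketch of the uncompressed case, the snippet below builds a toy archive by concatenating items and then reads one item back by offset and length. The header layout (`name length\n`), the in-memory index, and the names `build_archive` and `read_item` are assumptions made up for illustration; they do not correspond to any real archive format.

```python
import io


def build_archive(items):
    """Concatenate items into one byte string.

    Returns (archive_bytes, index), where index maps each item name to the
    (offset, length) of its content inside the archive.
    The "name length\\n" header is a hypothetical metadata fragment.
    """
    buf = io.BytesIO()
    index = {}
    for name, content in items.items():
        header = f"{name} {len(content)}\n".encode("utf-8")
        buf.write(header)
        # The content starts right after its header.
        index[name] = (buf.tell(), len(content))
        buf.write(content)
    return buf.getvalue(), index


def read_item(archive_file, index, name):
    """Read one item by seeking to its offset and reading `length` bytes."""
    offset, length = index[name]
    archive_file.seek(offset)
    return archive_file.read(length)


# Made-up example items.
items = {
    "foo.png": b"\x89PNG fake image bytes",
    "notes.txt": b"hello archive",
}
archive_bytes, index = build_archive(items)
foo_bytes = read_item(io.BytesIO(archive_bytes), index, "foo.png")
```

Because the content is stored verbatim, the same bytes can also be recovered with a plain slice, `archive_bytes[offset:offset + length]`, without parsing anything else in the file.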

Sometimes each item in an archive is compressed individually:

TODO: example diagram, with more details as you hover/select parts of it
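A minimal sketch of per-item compression, assuming each item is compressed on its own with zlib (DEFLATE) and the results are concatenated. The index layout and the names `build_compressed_archive` and `read_compressed_item` are hypothetical. Reading an item still needs only that item's byte range, but the bytes must now be decompressed.

```python
import zlib


def build_compressed_archive(items):
    """Compress each item separately and concatenate the compressed blobs.

    Returns (archive_bytes, index), where index maps each item name to the
    (offset, compressed_length) of its blob. This layout is illustrative only.
    """
    parts = []
    index = {}
    offset = 0
    for name, content in items.items():
        blob = zlib.compress(content)
        index[name] = (offset, len(blob))
        parts.append(blob)
        offset += len(blob)
    return b"".join(parts), index


def read_compressed_item(archive_bytes, index, name):
    """Slice out one item's compressed blob and decompress only that blob."""
    offset, length = index[name]
    return zlib.decompress(archive_bytes[offset:offset + length])


# Made-up example items.
sample_items = {
    "a.txt": b"alpha " * 100,
    "b.txt": b"beta " * 100,
}
compressed_archive, compressed_index = build_compressed_archive(sample_items)
b_bytes = read_compressed_item(compressed_archive, compressed_index, "b.txt")
```

With this layout, random access to a single item stays cheap, since the other items never need to be decompressed.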

Sometimes the items are stored without compression, and the archive file as a whole is then compressed (e.g. .zip.bz2 or .warc.gz). This can lead to a smaller file, especially if the items in the archive have lots of substrings in common, because the codec can refer back to data it has already seen in earlier items.
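To make the size difference concrete, here is a small experiment on contrived data: 20 items that all share the same 256-byte prefix. It compares compressing each item separately against compressing the concatenation once. zlib stands in for whatever codec is actually used, and the data and variable names are assumptions chosen to make the effect visible, not a benchmark.

```python
import zlib

# A shared prefix with no internal repetition, so compressing a single item
# on its own gains very little.
shared_prefix = bytes(range(256))
similar_items = [shared_prefix + f"item {i}".encode("utf-8") for i in range(20)]

# Strategy 1: compress every item individually and add up the sizes.
per_item_size = sum(len(zlib.compress(item)) for item in similar_items)

# Strategy 2: concatenate the items, then compress the whole thing once.
whole_archive = b"".join(similar_items)
whole_size = len(zlib.compress(whole_archive))

print("per-item total:", per_item_size)
print("whole archive: ", whole_size)
```

The trade-off is that reading one item from a whole-archive stream generally means decompressing everything before it, whereas per-item compression keeps each item independently readable.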

Let's create some tooling to help with:

Prepare:

Implement: