DWARFS

From Just Solve the File Format Problem
(Added link for magic bytes, only for FormatInfo.)
(Added choice of {compressor/hash}, along with remarks and references.)

Revision as of 09:28, 17 August 2025

File Format
Name DWARFS
Ontology
Compression lossless, optional
Magic Bytes 44 57 41 52 46 53
Released 2020 [1]

Deduplicating Warp-speed Advanced Read-only File System (DWARFS) is a read-only file system in which both compression and deduplication are optional: compression can be disabled (for instance via --compress-level=0 or --compression=null), and so can deduplication (via --file-hash=none). It is developed by Marcus Holland-Moritz. Compared to SquashFS, DWARFS also offers a choice of hashing algorithms,[2] as well as a tool, dwarfsck, for checking DWARFS images. At the maximum LZMA compression level (--compression=lzma:level=9:extreme), DWARFS can produce smaller images than SquashFS with its rough equivalent, XZ compression (-comp xz).

See also Squashfs.

Discussion

The aim of this software project is to create a compressed, deduplicated, read-only file system. While every feature except read-only operation can be disabled, disabling compression and deduplication defeats the purpose.

The DWARFS project also intends to compete with SquashFS on file system creation performance. With DWARFS, a given directory, folder, or path is scanned, hashed, and optionally categorized before the contents are compressed, so that only unique copies are added. SquashFS tends to hash and add files as it creates the file system, similar to how 7-Zip or ZIP archives are created. Unlike those archivers, both SquashFS and DWARFS can detect and skip duplicate files during creation, but the SquashFS approach is generally considered slower because it amounts to double-handling, whereas DWARFS does this work during the scanning phase.
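The scan-then-hash idea can be illustrated with a minimal shell sketch. It is not how DWARFS itself is implemented: the function name unique_files is hypothetical, sha256sum stands in for whichever hash is selected with --file-hash, and file names are assumed to contain no newlines or backslashes.

```shell
# unique_files DIR: print one path for each distinct file content under DIR.
# Hypothetical helper for illustration only; sha256sum is a stand-in hash.
unique_files() {
    # 1. Scan DIR and hash every regular file.
    # 2. Sort so that identical digests become adjacent.
    # 3. Keep the first path seen per digest. A sha256sum line is a
    #    64-character digest plus a 2-character separator, so the path
    #    starts at column 67.
    find "$1" -type f -exec sha256sum {} + |
        sort |
        awk '!seen[$1]++ { print substr($0, 67) }'
}
```

Calling unique_files on a directory that holds several copies of the same file lists only one of them, which mirrors the "adding only the unique copies" step described above.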

The developer's original motivation was having "several hundred different versions of Perl that were taking up something around 30 gigabytes of disk space" while he "was unwilling to spend more than 10% of my hard drive keeping them around for when I happened to need them."[3] This arguably puts the project in competition with git, a version control system (VCS) that records changes to a project at a more atomic level and compresses them with zlib at each commit. Unlike git, which has a rather steep learning curve, adding several versions of the same software project to an "archive" is arguably more trivial. Ultimately, DWARFS, like SquashFS, has many potential use cases beyond those it was originally designed for. This makes it a somewhat popular alternative to other archivers and software/file distribution methods, without requiring the bespoke compression software that ranks near the top of Matt Mahoney's data compression benchmarks.

Identification

DWARFS files begin with the hexadecimal bytes 44 57 41 52 46 53, which spell "DWARFS" in ASCII.[4]
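As a rough illustration, the magic bytes above can be checked with standard shell tools. The function name is_dwarfs is hypothetical, and the sketch assumes the signature sits at the very start of the file, as stated here.

```shell
# is_dwarfs FILE: succeed if FILE starts with 44 57 41 52 46 53 ("DWARFS").
# Hypothetical helper; assumes the magic is at offset 0.
is_dwarfs() {
    # Dump the first 6 bytes as hex and strip the spaces and newline od adds.
    magic=$(head -c 6 -- "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "445741524653" ]
}

# Example use:
#   if is_dwarfs example.dwarfs; then echo "looks like a DWARFS image"; fi
```

Comparing hex digits rather than the raw bytes avoids problems with binary data in shell variables.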

Examples

Create a maximally compressed DWARFS image of the current directory as example.dwarfs, with no history, all contents owned by root, the sha3-512 hash algorithm, and compression running at reduced priority (niceness 10):

 mkdwarfs --input . --output=example.dwarfs --block-size-bits=26 --compression=lzma:level=9:extreme --compress-niceness=10 --schema-compression=lzma:level=9:extreme \
 --metadata-compression=lzma:level=9:extreme --no-history --pack-metadata=all,force --file-hash=sha3-512 --set-owner=0 --set-group=0 --no-history-timestamp \
 --no-create-timestamp --no-history-command-line

Available compressors

As of version 0.12.4, the following compressors are available:

Switch                 Comments
--compression=null     No compression.[5]
--compression=lzma     liblzma compression;[6] on modern Linux systems, the liblzma from XZ is used.
--compression=zstd     libzstd compression.[7]
--compression=lz4      libLZ4 compression;[8] either LZ4 or LZ4HC can be selected.
--compression=brotli   Brotli compression.[9]
--compression=flac     FLAC compression.[10]
--compression=ricepp   RICEPP compression, likely a form of Golomb coding.[11][12]
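To compare compressors on the same input, one image can be built per switch. The sketch below only prints the command lines for review instead of running them. The function name print_mkdwarfs_commands and the example-NAME.dwarfs output names are hypothetical, and flac and ricepp are left out on the assumption that they target specific kinds of data rather than arbitrary files.

```shell
# print_mkdwarfs_commands INPUT_DIR: print one mkdwarfs command line per
# general-purpose compressor listed in the table above.
# Hypothetical helper; it prints the commands and does not run mkdwarfs.
print_mkdwarfs_commands() {
    for c in null lzma zstd lz4 brotli; do
        printf 'mkdwarfs --input %s --output=example-%s.dwarfs --compression=%s\n' \
            "$1" "$c" "$c"
    done
}
```

The printed lines use only the --input, --output and --compression switches shown elsewhere on this page.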

Available hashes for duplicate detection

Alternative hash algorithms were introduced in 0.7.0-RC1.[13] As of version 0.12.4, the following are available:

Switch                    Remarks
--file-hash=none          Disables file deduplication checks.[14]
--file-hash=blake2b512
--file-hash=blake2s256
--file-hash=md5
--file-hash=md5-sha1
--file-hash=ripemd160     Appears in a checksum test added in git commit de5ec99.[15]
--file-hash=sha1          The hash algorithm formerly used.[16]
--file-hash=sha224
--file-hash=sha256
--file-hash=sha384
--file-hash=sha512
--file-hash=sha3-224
--file-hash=sha3-256
--file-hash=sha3-384
--file-hash=sha3-512
--file-hash=sha512-224
--file-hash=sha512-256
--file-hash=shake128      Disabled by the author in git commit afbd85e.[17]
--file-hash=shake256      Disabled by the author in git commit afbd85e.[18]
--file-hash=sm3
--file-hash=xxh3-64       Added by the author in git commit 7ded26d.[19]
--file-hash=xxh3-128      Default when no hash is specified.[20]

Links

References

  1. Release 0.1.0 - GitHub
  2. Choice of hash for duplicate detection #92 - GitHub
  3. History section - DWARFS - GitHub
  4. 0000449: Add magic for the DWARFS compressed file system format - bugs.astron.com
  5. null.cpp (line 110) - DWARFS - GitHub
  6. lzma.cpp (line 413) - DWARFS - GitHub
  7. zstd.cpp (line 179) - DWARFS - GitHub
  8. lz4.cpp (line 186) - DWARFS - GitHub
  9. brotli.cpp (line 169) - DWARFS - GitHub
  10. flac.cpp (line 496) - DWARFS - GitHub
  11. rice.hpp (lines 7-9) - compression-algorithms - GitHub
  12. ricepp.cpp (line 260) - DWARFS - GitHub
  13. Choice of hash for duplicate detection #92, post #5 - DWARFS - GitHub
  14. Choice of hash for duplicate detection #92, post #5 - DWARFS - GitHub
  15. Commit de5ec99 - test/checksum_test.cpp (line 55) - test(checksum): add checksum tests - DWARFS - GitHub
  16. Choice of hash for duplicate detection #92, post #1 - DWARFS - GitHub
  17. Commit afbd85e - fix(checksum): disable extended output algorithms (e.g. shake(128|256)) - DWARFS - GitHub
  18. Commit afbd85e - fix(checksum): disable extended output algorithms (e.g. shake(128|256)) - DWARFS - GitHub
  19. Commit 7ded26d - feat(checksum): add hexdigest() method - DWARFS - GitHub
  20. v0.7.0-RC1 - DWARFS releases by tag - GitHub