DWARFS
Latest revision as of 23:44, 17 August 2025
Deduplicating Warp-speed Advanced Read-only File System (DWARFS) is a read-only filesystem with compression and deduplication, both of which can optionally be disabled: compression via --compress-level=0 or --compression=null, for instance, and deduplication via --file-hash=none.
It is developed by Marcus Holland-Moritz and was first released in 2020.[1] Compared to SquashFS, DWARFS also offers a choice of hashing algorithms,[2] as well as a tool, dwarfsck, for checking DWARFS images.
At maximum compression levels using LZMA, DWARFS (using --compression=lzma:level=9:extreme) can produce a smaller file than SquashFS using its rough equivalent, XZ (-comp xz).
The software is currently available for both Unix-like systems (generally Linux) and Microsoft Windows, with the caveat that "Support for the Windows operating system is currently experimental. Having worked pretty much exclusively in a Unix world for the past two decades, my experience with Windows development is rather limited and I'd expect there to definitely be bugs and rough edges in the Windows code."[3] There is also a Homebrew formula for macOS.
See also Squashfs.
Discussion
The aim of this software project is ultimately to create a compressed, deduplicated, read-only file system. While it is possible to disable everything except the read-only property, doing so defeats the purpose.
The DWARFS project also intends to compete, performance-wise, with SquashFS at file system creation. DWARFS scans, hashes, and (optionally) categorizes a given directory, folder, or path before the contents are compressed, so that only unique copies are added. SquashFS tends to hash and add files as it creates the file system, similar to how 7-Zip or ZIP files, for instance, are created. The apparent ability to detect and avoid adding duplicate files during creation is unique to the likes of SquashFS and DWARFS, but the SquashFS approach is considered slower, as it generally involves double-handling, whereas DWARFS does this work during the scanning phase.
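As a rough illustration of the scan-and-hash idea described above, the following Python sketch walks a directory, hashes every regular file, and groups paths by digest, so that only one copy per group would need to be stored. It is not DWARFS's actual implementation: the function names file_digest and scan_unique are hypothetical, and Python's hashlib name sha3_512 is assumed here merely as a stand-in for the --file-hash=sha3-512 choice.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, algorithm="sha3_512", chunk_size=1 << 20):
    """Hash a file's contents in chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def scan_unique(root, algorithm="sha3_512"):
    """Walk root, hash every regular file, and group paths by content digest.

    Returns a dict mapping digest -> sorted list of paths with that content.
    Only one path per group would need to be stored; the remaining paths
    could be recorded as references to it. (Illustrative only.)
    """
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files; only regular files are hashed.
            if os.path.isfile(path) and not os.path.islink(path):
                groups[file_digest(path, algorithm)].append(path)
    return {digest: sorted(paths) for digest, paths in groups.items()}
```

Calling scan_unique(".") returns one entry per distinct file content; any entry listing more than one path is a set of duplicates.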
The developer's original motivation was having "several hundred different versions of Perl that were taking up something around 30 gigabytes of disk space" and being "unwilling to spend more than 10% of my hard drive keeping them around for when I happened to need them."[4] This may put the project in competition with git, a version control system (VCS) used to record changes to a given project at a more atomic level, compressing changes (using zlib) with each commit. However, git has a rather steep learning curve, whereas adding several versions of the same software project to an "archive" is arguably more trivial. Ultimately, DWARFS, like SquashFS, has many potential use cases beyond those it was intentionally designed for, making it a somewhat popular choice among archivers and software/file distribution methods, without requiring the bespoke compression software that ranks among the top in Matt Mahoney's data compression benchmarks.
Identification
DWARFS files begin with the hexadecimal bytes 44 57 41 52 46 53, which translate to "DWARFS" in ASCII.[5]
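The signature check described above can be sketched in a few lines of Python. This is a minimal illustration that only compares the first six bytes of a file; the names DWARFS_MAGIC and looks_like_dwarfs are hypothetical, and a match does not prove the rest of the file is a valid image (dwarfsck is the tool for that).

```python
# The six magic bytes 44 57 41 52 46 53, i.e. b"DWARFS" in ASCII.
DWARFS_MAGIC = bytes.fromhex("44 57 41 52 46 53")


def looks_like_dwarfs(path):
    """Return True if the file at path starts with the DWARFS magic bytes."""
    with open(path, "rb") as f:
        return f.read(len(DWARFS_MAGIC)) == DWARFS_MAGIC
```

For example, looks_like_dwarfs("example.dwarfs") would be expected to return True for an image created by mkdwarfs, under the assumption that the image begins with the magic bytes, as stated above.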
Examples
Create an extremely compressed DWARFS image from the present directory as example.dwarfs, without history, with all contents owned by root, using the sha3-512 hash algorithm, and with compression running at reduced priority (niceness 10):
 mkdwarfs --input . --output=example.dwarfs --block-size-bits=26 --compression=lzma:level=9:extreme --compress-niceness=10 --schema-compression=lzma:level=9:extreme \
     --metadata-compression=lzma:level=9:extreme --no-history --pack-metadata=all,force --file-hash=sha3-512 --set-owner=0 --set-group=0 --no-history-timestamp \
     --no-create-timestamp --no-history-command-line
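When the same options need to be reused with different compressors or hashes, the command line above can be assembled programmatically. The following Python sketch only builds the argument list from the flags shown in the example; the helper name build_mkdwarfs_command and its parameters are hypothetical, and nothing is executed unless the list is passed to subprocess on a system where mkdwarfs is installed.

```python
import shlex


def build_mkdwarfs_command(input_dir, output_file,
                           compression="lzma:level=9:extreme",
                           file_hash="sha3-512"):
    """Assemble the mkdwarfs argument list used in the example above."""
    return [
        "mkdwarfs",
        "--input", input_dir,
        f"--output={output_file}",
        "--block-size-bits=26",
        f"--compression={compression}",
        "--compress-niceness=10",
        f"--schema-compression={compression}",
        f"--metadata-compression={compression}",
        "--no-history",
        "--pack-metadata=all,force",
        f"--file-hash={file_hash}",
        "--set-owner=0",
        "--set-group=0",
        "--no-history-timestamp",
        "--no-create-timestamp",
        "--no-history-command-line",
    ]


cmd = build_mkdwarfs_command(".", "example.dwarfs")
# Show the command as a single shell-quoted string for inspection.
print(shlex.join(cmd))
# To run it (assumes mkdwarfs is on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.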
mkdwarfs-only features
The following features can only be selected when creating the DWARFS file. Once the DWARFS file has been created, the only way to change them is to use --recompress=all.
Choice of available compression
As of version 0.12.4, the following compressors are available:
Switch | Comments |
--compression=null | No compression.[6] |
--compression=lzma | liblzma compression;[7] in modern Linux environments, XZ's liblzma is used. |
--compression=zstd | libzstd compression[8] |
--compression=lz4 | liblz4 compression;[9] either LZ4HC or LZ4 compression can be chosen. |
--compression=brotli | Brotli compressor[10] |
--compression=flac | FLAC compression[11] |
--compression=ricepp | RICEPP compression, likely Golomb coding[12][13] |
Choice of available hash for duplicate detection
Version 0.7.0-RC1 introduced alternatives;[14] as of version 0.12.4, the following hash algorithms are available:
Switch | Remarks |
--file-hash=none | Disable file deduplication checks.[15] |
--file-hash=blake2b512 | |
--file-hash=blake2s256 | |
--file-hash=md5 | "As of 2019, MD5 continues to be widely used, despite its well-documented weaknesses and deprecation by security experts."[16] Vulnerabilities include collision and preimage attacks. |
--file-hash=md5-sha1 | |
--file-hash=ripemd160 | Appeared as a checksum test in git commit de5ec99.[17] |
--file-hash=sha1 | Formerly used hash algorithm.[18] Vulnerable to collision attacks.[19] |
--file-hash=sha224 | |
--file-hash=sha256 | |
--file-hash=sha384 | |
--file-hash=sha512 | |
--file-hash=sha3-224 | |
--file-hash=sha3-256 | |
--file-hash=sha3-384 | |
--file-hash=sha3-512 | |
--file-hash=sha512-224 | |
--file-hash=sha512-256 | |
--file-hash=shake128 | Disabled by author in git commit afbd85e.[20] |
--file-hash=shake256 | Disabled by author in git commit afbd85e.[21] |
--file-hash=sm3 | |
--file-hash=xxh3-64 | Added by author in git commit 7ded26d.[22] |
--file-hash=xxh3-128 | Current default choice when not specified.[23] |
Links
- DWARFS project page (GitHub): https://github.com/mhx/dwarfs
References
1. Release 0.1.0 - GitHub
2. Choice of hash for duplicate detection #92 - GitHub
3. Windows Support section of README.md - DWARFS - GitHub
4. History section - DWARFS - GitHub
5. 0000449: Add magic for the DWARFS compressed file system format - bugs.astron.com
6. null.cpp (line 110) - DWARFS - GitHub
7. lzma.cpp (line 413) - DWARFS - GitHub
8. zstd.cpp (line 179) - DWARFS - GitHub
9. lz4.cpp (line 186) - DWARFS - GitHub
10. brotli.cpp (line 169) - DWARFS - GitHub
11. flac.cpp (line 496) - DWARFS - GitHub
12. rice.hpp (lines 7-9) - compression-algorithms - GitHub
13. ricepp.cpp (line 260) - DWARFS - GitHub
14. Choice of hash for duplicate detection #92, post #5 - DWARFS - GitHub
15. Choice of hash for duplicate detection #92, post #5 - DWARFS - GitHub
16. Security section of MD5 - Wikipedia
17. Commit de5ec99 - test/checksum_test.cpp (line 55) - test(checksum): add checksum tests - DWARFS - GitHub
18. Choice of hash for duplicate detection #92, post #1 - DWARFS - GitHub
19. Attacks section of SHA-1 - Wikipedia
20. Commit afbd85e - fix(checksum): disable extended output algorithms (e.g. shake(128|256)) - DWARFS - GitHub
21. Commit afbd85e - fix(checksum): disable extended output algorithms (e.g. shake(128|256)) - DWARFS - GitHub
22. Commit 7ded26d - feat(checksum): add hexdigest() method - DWARFS - GitHub
23. v0.7.0-RC1 - DWARFS releases by tag - GitHub