DWARFS
Format type: electronic
Subcategory: Filesystem
Compression: lossless, optional
Magic: 44 57 41 52 46 53
Released: 2020 (release 0.1.0)[1]
Revision as of 09:31, 16 August 2025
Deduplicating Warp-speed Advanced Read-only File System (DWARFS) is a read-only filesystem in which both compression and deduplication are optional: compression can be turned off (for instance via --compress-level=0 or --compression=null), and deduplication via --file-hash none. It is developed by Marcus Holland-Moritz. Compared to SquashFS, DWARFS also offers a choice of hashing algorithms for duplicate detection,[2] as well as dwarfsck, a tool for checking DWARFS images. At maximum LZMA compression (--compression=lzma:level=9:extreme), DWARFS can produce smaller images than SquashFS does with its rough equivalent, XZ compression (-comp xz).
See also Squashfs.
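The LZMA versus XZ size comparison can be tried on one's own data. The Bash sketch below is only an illustration under stated assumptions: it assumes mkdwarfs and mksquashfs are both installed, the function name compare_image_sizes and the output file names are hypothetical, and which image comes out smaller depends entirely on the input.

```shell
#!/usr/bin/env bash
# Sketch: build a DWARFS image and a SquashFS image of the same directory
# and print both sizes. Assumes mkdwarfs and mksquashfs are on PATH.
# compare_image_sizes and the file names test.dwarfs/test.sqfs are hypothetical.

compare_image_sizes() {
  local src=$1 out
  if ! command -v mkdwarfs >/dev/null 2>&1 || ! command -v mksquashfs >/dev/null 2>&1; then
    echo "mkdwarfs and/or mksquashfs not found; skipping" >&2
    return 1
  fi
  # Fresh temporary directory so existing images are never overwritten.
  out=$(mktemp -d) || return 1
  # DWARFS side: maximum LZMA compression, as in the text above.
  mkdwarfs --input "$src" --output "$out/test.dwarfs" \
    --compression=lzma:level=9:extreme || return 1
  # SquashFS side: XZ compression.
  mksquashfs "$src" "$out/test.sqfs" -comp xz -noappend || return 1
  # wc -c gives the byte size portably; tr strips padding some wc versions add.
  printf 'dwarfs:   %s bytes\n' "$(wc -c < "$out/test.dwarfs" | tr -d ' ')"
  printf 'squashfs: %s bytes\n' "$(wc -c < "$out/test.sqfs" | tr -d ' ')"
  echo "images kept in $out"
}
```

For example, compare_image_sizes ./some-directory (a placeholder path) prints the two sizes and the temporary directory holding both images.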
Discussion
The project's ultimate aim is a compressed, deduplicated, read-only file system. Compression and deduplication can both be disabled, leaving only the read-only property, but doing so defeats the purpose.
The DWARFS project also aims to compete with SquashFS on file system creation performance. DWARFS first scans the given directory, folder, or path, hashes the files and (optionally) categorizes them, and only then compresses the contents, adding only the unique copies. SquashFS, by contrast, tends to hash and add files as it builds the file system, much as 7-Zip or ZIP archives are created. It apparently can still detect and skip duplicate files during creation, an ability that sets SquashFS and DWARFS apart from such archivers, but this is generally considered slower because the data ends up being handled twice, whereas DWARFS does this work during its scanning phase.
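The "scan and hash first, then store only unique copies" idea can be illustrated with a small Bash sketch. It is not the DWARFS implementation: it assumes GNU coreutils (sha256sum, xargs -r), uses SHA-256 instead of any of the hashes DWARFS lets one choose, treats equal hashes as equal contents, and only prints which files would be stored and which would be skipped as duplicates. The function name plan_dedup is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a scan-then-deduplicate plan (not DWARFS's actual code).
# Assumes GNU sha256sum and xargs; plan_dedup is a hypothetical name.
# Limitation: file names containing newlines or backslashes are not handled,
# because GNU sha256sum escapes such names in its output.

plan_dedup() {
  local dir=$1
  # Scan phase: hash every regular file (NUL-delimited paths for safety),
  # then sort so identical hashes end up on adjacent lines.
  find "$dir" -type f -print0 \
    | xargs -0 -r sha256sum \
    | sort \
    | awk '{
        hash = $1
        # sha256sum prints "<64 hex chars><2 chars><path>", so the path starts at column 67.
        path = substr($0, 67)
        if (hash in seen) {
          print "dup   " path " -> " seen[hash]   # duplicate: would not be stored again
        } else {
          seen[hash] = path
          print "store " path                     # first copy of this content
        }
      }'
}
```

Running plan_dedup on a directory holding two identical files and one different file lists two "store" lines and one "dup" line pointing back at the stored copy.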
The developer's original motivation was having "several hundred different versions of Perl that were taking up something around 30 gigabytes of disk space"; he wrote that he "was unwilling to spend more than 10% of [his] hard drive keeping them around for when [he] happened to need them."[3] This arguably puts the project in competition with git, a version control system (VCS) that records changes to a project at a finer-grained level and stores the content of each commit as zlib-compressed objects. Unlike git, however, which has a rather steep learning curve, adding several versions of the same software project to an "archive" is arguably simpler. Ultimately, DWARFS, like SquashFS, has many potential uses beyond what it was originally designed for. This makes it a somewhat popular alternative to other archivers and software/file distribution methods, without requiring bespoke compression software of the kind ranked near the top of Matt Mahoney's data compression benchmarks.
Identification
DWARFS files begin with the hexadecimal bytes 44 57 41 52 46 53, which spell "DWARFS" in ASCII.[4]
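A check for this signature can be sketched in Bash using only head, od and tr. It assumes, as stated above, that the magic sits at offset 0; an image with other data prepended before it would not match. The function name is_dwarfs is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: test whether a file starts with the DWARFS magic
# 44 57 41 52 46 53 ("DWARFS" in ASCII) at offset 0.
# is_dwarfs is a hypothetical helper name.

is_dwarfs() {
  local magic
  # Read the first 6 bytes and render them as lowercase hex without spaces.
  magic=$(head -c 6 -- "$1" | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "445741524653" ]
}
```

Usage: is_dwarfs example.dwarfs && echo "looks like a DWARFS image"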
Examples
Create a maximally compressed DWARFS image, example.dwarfs, from the current directory, using 64 MiB (2^26-byte) blocks, SHA3-512 as the file hash algorithm, all contents owned by root, no history or creation timestamps, and compression running at low (idle) priority:
mkdwarfs --input . --output=example.dwarfs --block-size-bits=26 \
  --compression=lzma:level=9:extreme --compress-niceness=10 \
  --schema-compression=lzma:level=9:extreme \
  --metadata-compression=lzma:level=9:extreme --no-history \
  --pack-metadata=all,force --file-hash=sha3-512 \
  --set-owner=0 --set-group=0 --no-history-timestamp \
  --no-create-timestamp --no-history-command-line