Parity Volume Set

From Just Solve the File Format Problem
File Format
Name Parity Volume Set
Ontology
Extension(s) .par, .pxx, .par2, .pa3
Released 2001[1]

Parity Volume Set (also known as parity archive or parchive) is a file format for storing redundant data for one or more input files. This data can be used to repair the input files if they become damaged. The error correction is based on the Reed-Solomon algorithm. Three versions of the format exist: Par1, Par2 and Par3. The Par3 format is in "near-final form"[2] and is used by an old version of the MultiPar tool[3] as well as by par3cmdline.[4]

Discussion

Historically, these files were distributed as multi-part archives on Usenet (a.k.a. "network news"), but they can still be used to prevent complete data loss during transit or storage. Parchive is like RAID for files instead of for a whole file system.

The technology is based on a Reed-Solomon code implementation that allows any X missing or damaged data blocks to be recovered as long as X parity blocks are present. (A data block here is either a whole file or a much smaller virtual slice of a file.)[5]
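
The "X parity blocks recover X data blocks" idea is easiest to see in its smallest case, X = 1. The sketch below is not Reed-Solomon and is not how any Par tool is implemented. It uses plain XOR parity, the same idea as single-parity RAID, and the helper name xor_blocks is hypothetical. Reed-Solomon generalizes this so that several parity blocks can repair several missing data blocks.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three equal-sized data blocks and one parity block computed from them.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the second data block: XOR-ing the surviving data
# blocks with the parity block yields the missing block again.
survivors = [data[0], data[2], parity]
recovered = xor_blocks(survivors)
assert recovered == data[1]
```

With only one XOR parity block, a second lost block cannot be repaired. That limitation is what a Reed-Solomon code with more parity blocks removes.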

Modern Par2 software can take advantage of a GPU to speed up the creation of recovery files.[6][7]

While Par3 has yet to be finalized as of writing in 2025, the "2022-01-28 ALPHA DRAFT" specification addresses notable flaws that have existed since the format's conception:

 Major differences from Parchive 2.0 are:
 ...(redacted for brevity)
 * replace MD5 hash (It is both slow and less secure.)
 ...(redacted for brevity)
 
 Part of "support any linear code" is to fix the major bug in Parchive 2.0. Parchive 2.0 did not do Reed-Solomon encoding as it promised. There was a major mistake in the paper that Parchive 2.0 relied on. 
 The problem manifested as a bug in Parchive 1.0 and, while Parchive 2.0 reduced its occurrence, it did not fix the problem. Parchive 2.0 did not use an always invertible matrix; it essentially used a random
 matrix, which (luckily) is invertible with high probability. Parchive 3.0 fixes that bug.
 
 The other part of "support any linear code" is supporting codes beside Reed-Solomon. Reed-Solomon has excellent data protection, but is slow to compute. LDPC and sparse random matrices will speed things 
 up dramatically, with a slight increase in errors that cannot be recovered from.

[8]

Identification

A Par1 file starts with the following byte sequence:

50 41 52 00 00 00 00 00

This corresponds to the ASCII text string PAR, followed by 5 null bytes.

A Par2 file starts with the bytes:

50 41 52 32 00 50 4B 54

This corresponds to ASCII text string PAR2, followed by a null byte and the text string PKT.

Finally, a Par3 file can be identified by the following 4-byte sequence:

50 41 33 00

This corresponds to the text string PA3, followed by a null byte.
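
The three signatures above can be checked with a few lines of Python. This is a minimal sketch rather than a full parser: the function names identify_par and identify_par_file are hypothetical, and it assumes that the Par3 signature, like the other two, sits at the very start of the file.

```python
# Signatures copied from the byte sequences listed above.
PAR_MAGICS = [
    (bytes.fromhex("50 41 52 32 00 50 4B 54"), "Par2"),  # "PAR2\0PKT"
    (bytes.fromhex("50 41 52 00 00 00 00 00"), "Par1"),  # "PAR" + 5 null bytes
    (bytes.fromhex("50 41 33 00"), "Par3"),              # "PA3\0"
]

def identify_par(header):
    """Return 'Par1', 'Par2' or 'Par3' for a matching header, else None."""
    for magic, version in PAR_MAGICS:
        if header.startswith(magic):
            return version
    return None

def identify_par_file(path):
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return identify_par(handle.read(8))
```

For example, identify_par(b"PAR2\x00PKT") returns "Par2", and a header that matches none of the signatures returns None.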

Specifications

Specification version                     SourceForge/Internet Archive link   GitHub link
Parity Volume Set Specification v1.0      SourceForge                         GitHub
Parity Volume Set Specification 2.0       SourceForge                         GitHub
proposal for Parchive Specification 3.0   hp.vector.co.jp IA mirror           GitHub

par2 Examples

Create recovery files of uniform size with 100% redundancy for example.dwarfs:

 par2 create -u -r100 example.dwarfs

Uniform recovery file sizes make the set behave more like Par1.[9]

Software

Sample files

Par1 sample files

See "Search results with par extensions" on Discmaster.textfiles.com for sample Par1 files.

Par1 files are usually distributed as a set consisting of <name>.par and <name>.p<num>, where <name> is typically the name of the file being protected and <num> is a number that starts at 01 and increments for each additional Par1 file in the set.[10]
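
The naming pattern described above can be matched with a regular expression. The following is a sketch only: the function name parse_par1_name is hypothetical, and the choices to require at least two digits after the "p" and to ignore letter case are assumptions, not rules taken from the specification.

```python
import re

# <name>.par for the index file, <name>.p<num> for the numbered volumes.
PAR1_NAME = re.compile(r"^(?P<name>.+)\.(?:par|p(?P<num>\d{2,}))$", re.IGNORECASE)

def parse_par1_name(filename):
    """Return (name, volume_number) for a Par1-style file name, else None.

    volume_number is None for the .par file itself.
    """
    match = PAR1_NAME.match(filename)
    if match is None:
        return None
    num = match.group("num")
    return match.group("name"), (int(num) if num is not None else None)
```

A name match is only a hint, since other formats also use the .par extension. The signature bytes of the file remain the more reliable check.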

See Also:

Par2 sample files

See "Search results with par2 extensions and are likely parity archive" on Discmaster.textfiles.com for sample Par2 files.

These files are usually distributed as a set consisting of <name>.par2 and <name>.vol<numA>+<numB>.par2, where <name> is typically the name of the file being protected, and <numA> and <numB> are numbers that change from one file in the set to the next; <numA> often starts at 0.[11]

Additionally, Par2 files bear the .par2 extension, which makes identification easier and less ambiguous than with Par1, whose .par extension can be confused with other extensions that also begin with .par.
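
The Par2 naming pattern described above can be split into its parts with a regular expression. This is a sketch under stated assumptions: the function name parse_par2_name is hypothetical, matching is case-insensitive by choice, and the two numbers are returned as plain integers without interpreting what they mean.

```python
import re

# <name>.par2 for the index file, <name>.vol<numA>+<numB>.par2 for volumes.
PAR2_NAME = re.compile(
    r"^(?P<name>.+?)(?:\.vol(?P<num_a>\d+)\+(?P<num_b>\d+))?\.par2$",
    re.IGNORECASE,
)

def parse_par2_name(filename):
    """Return (name, num_a, num_b) for a Par2-style file name, else None.

    num_a and num_b are None for the plain <name>.par2 file.
    """
    match = PAR2_NAME.match(filename)
    if match is None:
        return None
    num_a, num_b = match.group("num_a"), match.group("num_b")
    if num_a is None:
        return match.group("name"), None, None
    return match.group("name"), int(num_a), int(num_b)
```

Grouping the files in a directory by the returned name is one way to collect all members of a single recovery set.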

Links

References

  1. parchive Files - SourceForge.net
  2. Commit 4c1b780 - 2022-01-29 - par3cmdline - GitHub
  3. Par3 support? #46 - MultiPar - GitHub
  4. par3cmdline - GitHub
  5. Parchive: Parity Archive Tool - SourceForge.net
  6. GPU Acceleration via par2j64.exe??? Is it possible? How do I do it? #40 - MultiPar - GitHub
  7. Added support for GPU acceleration (CUDA) on recovery file creation. #176 - par2cmdline - GitHub
  8. Parity Volume Set Specification 3.0 (2022-01-28 ALPHA DRAFT) - GitHub
  9. Why is PAR 2.0 better than PAR 1.0? - par2cmdline - GitHub
  10. Par2 Files Explained in Plain English - Internet Archive copy
  11. Par2 Files Explained in Plain English - Internet Archive copy