FASTA and FASTQ
From Just Solve the File Format Problem
				
								
Page created by Dan Tobias.
Latest revision as of 00:17, 11 June 2024
FASTA and FASTQ are text-based formats used in biology for representing nucleotide (DNA or RNA) or peptide sequences. FASTA represents each element of a sequence with a letter: the standard C, G, T, and A for DNA nucleotides, other letters for special uses, and a separate set of letters for the amino acids in peptides. FASTQ additionally encodes quality scores for the sequence data.
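As an illustration of how the two formats differ, the following Python sketch reads records from each. It assumes the commonly used layouts, which this page does not spell out: a FASTA record is a header line starting with ">" followed by one or more sequence lines, and a FASTQ record is four lines (an "@" identifier, the sequence, a "+" separator, and a quality string as long as the sequence). The function names read_fasta and read_fastq are hypothetical, and multi-line FASTQ records are not handled.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Assumes each record starts with a '>' header line and that the
    sequence may be wrapped over several lines.
    """
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, sequence, quality) from four-line FASTQ records.

    Assumes the simple four-line layout; the quality string is returned
    as raw characters, without decoding it into numeric scores.
    """
    it = iter(lines)
    for name in it:
        name = name.strip()
        if not name:
            continue  # skip blank lines between records
        seq = next(it, "").strip()
        plus = next(it, "").strip()
        qual = next(it, "").rstrip("\r\n")
        if not name.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {name!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {name!r}")
        yield name[1:], seq, qual
```

Both functions accept any iterable of lines, so an open file handle can be passed directly. For real work, an established library such as Biopython's SeqIO or BioPerl's Bio::SeqIO is likely a better choice than a hand-written parser.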
File extensions
Several extensions are in use, and they are not fully standardized.
- .fasta, .fas, .fa, .seq, .fsa: Generic FASTA
- .fna: FASTA nucleic acids
- .ffn: FASTA nucleotide coding regions for a genome
- .faa: FASTA amino acids
- .mpfa: FASTA amino acids in multiple proteins
- .frn: FASTA non-coding RNA
- .fastq, .fq: FASTQ
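
The extension list above can be turned into a small lookup table. The sketch below is only illustrative: the names EXTENSION_KINDS and guess_kind are hypothetical, and because the extensions are not fully standardized, the result should be treated as a hint rather than a reliable identification.

```python
from pathlib import Path

# Mapping taken from the extension list on this page.
EXTENSION_KINDS = {
    ".fasta": "generic FASTA",
    ".fas": "generic FASTA",
    ".fa": "generic FASTA",
    ".seq": "generic FASTA",
    ".fsa": "generic FASTA",
    ".fna": "FASTA nucleic acids",
    ".ffn": "FASTA nucleotide coding regions",
    ".faa": "FASTA amino acids",
    ".mpfa": "FASTA amino acids in multiple proteins",
    ".frn": "FASTA non-coding RNA",
    ".fastq": "FASTQ",
    ".fq": "FASTQ",
}


def guess_kind(filename: str) -> str:
    """Guess the kind of sequence file from its last extension.

    The comparison is case-insensitive; unrecognized extensions
    give "unknown".
    """
    suffix = Path(filename).suffix.lower()
    return EXTENSION_KINDS.get(suffix, "unknown")
```

For example, guess_kind("genome.fna") returns "FASTA nucleic acids", while a file named notes.txt is reported as "unknown".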
 
Files may also be distributed in compressed form, which adds a second extension, as in .fastq.gz.
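A reader that wants to handle both plain and compressed files can check for the second extension before opening. The following sketch assumes gzip is the only compression in use (the page gives .gz only as an example) and uses Python's standard gzip module; the helper names format_extension and open_sequence_text are hypothetical.

```python
import gzip
from pathlib import Path
from typing import IO


def format_extension(path: str) -> str:
    """Return the extension that names the format, ignoring a trailing .gz."""
    suffixes = [s.lower() for s in Path(path).suffixes]
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]  # drop the compression extension
    return suffixes[-1] if suffixes else ""


def open_sequence_text(path: str) -> IO[str]:
    """Open a sequence file as text, decompressing if the name ends in .gz.

    Assumption: the compression type is decided from the file name alone,
    without inspecting the file contents.
    """
    if Path(path).suffix.lower() == ".gz":
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

With these helpers, a file named sample.fastq.gz is opened through gzip, and format_extension reports ".fastq" for it, so the same parsing code can be used for compressed and uncompressed input.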
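Samples
- Human Genome in FASTA and other formats: https://www.ncbi.nlm.nih.gov/datasets/taxonomy/9606/
- FASTQ File Samples: https://github.com/hartwigmedical/testdata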
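Links
- Format FASTQ sequences and barcode data: http://qiime.org/scripts/extract_barcodes.html
- FASTA format converter: http://www.bioinformaticsbox.com/tools/sequence_format_converter.php
- Bio::SeqIO (Perl): http://search.cpan.org/dist/BioPerl-1.6.901/Bio/SeqIO.pm (more info: http://www.bioperl.org/wiki/Module:Bio::SeqIO; connected with Bio::Seq: http://search.cpan.org/~cjfields/BioPerl-1.6.924/Bio/Seq.pm)
- SeqIO (Python): http://biopython.org/wiki/SeqIO (connected with Seq: http://biopython.org/wiki/Seq; more info: http://biopython.org/DIST/docs/api/Bio.Seq.Seq-class.html)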