Scientific Data formats
From Just Solve the File Format Problem
(Difference between revisions)
Crusbridge (Added more science data formats from sources including DataONE and UKDA)
Crusbridge (Added a swag of Biological formats)
Line 11: | Line 11:
  * [[hdf]] (Hierarchical Data Format, from NASA)
  * [[NetCDF]] (Network Common Data Format)
+ * [[SDXF]] (Structured Data Exchange Format)
  * [[XDF]] (eXtensible Data Format)
  * [[XSIL]] (Extensible Scientific Interchange Language)
Line 17: | Line 18:
  * [[FITS]] (Flexible Image Transport System)
  * [[PDS/ODL]] (Planetary Data System)
+
+ == Biological ==
+
+ * [[AB1]] (Chromatogram files used by DNA sequencing instruments from Applied Biosystems)
+ * [[ACE]] (Sequence assembly format)
+ * [[BAM]] (Binary compressed SAM format)
+ * [[BED]] (Browser extensible display format describing genes and other features of DNA sequences)
+ * [[CAF]] (Common Assembly Format for sequence assembly)
+ * [[EMBL]] (Flatfile format used by the EMBL for nucleotide and peptide sequences)
+ * [[FASTA and FASTQ]] (File format for sequence data, FASTQ with quality).
+ * [[GenBank]] (Flatfile format used by NCBI for nucleotide and peptide sequences)
+ * [[GFF]] (General feature format for describing genes and other features of DNA, RNA and protein sequences)
+ * [[GTF]] (Gene transfer format holds information about gene structure)
+ * [[NEXUS]] (Encodes mixed information about genetic sequence data in a block structured format)
+ * [[PDB]] (Structures of biomolecules deposited in Protein Data Bank)
+ * [[PHD]] (Output from the basecalling software Phred)
+ * [[SAM]] (Sequence Alignment/Map format)
+ * [[SCF]] (Staden chromatogram files used to store data from DNA sequencing)
+ * [[SBML]] (Systems Biology Markup Language used to store biochemical network computational models)
+ * [[Stockholm]] (Representing multiple sequence alignments)
+ * [[Swiss-Prot]] (Flatfile format used for protein sequences from the Swiss-Prot database)
+ * [[VCF]] (Variant Call Format)
  == Chemical ==
Line 51: | Line 74:
  * [[DICOM]]
− == Oceanographic
+ == Oceanographic, Atmospheric and Meteorological ==
  * [[GRIB]] (Grid in Binary)
  * [[BUFR]] (Binary Universal Format Representation)
  * [[IOAPI]] (netCDF augmented with metadata from the I/O API)
+ * [[PP]] (UK Met Office format for weather model data)
  == Physics ==
+ * [[CGNS]] (Computational Fluid Dynamics General Notation System)
  * [[NeXuS]] (Common data format for neutron, x-ray and muon science)
  * [[QCDml]] (Lattice QCD gauge configuration markup language)
Revision as of 02:50, 3 November 2012
General
- cdf (Common Data Format)
- hdf (Hierarchical Data Format, originally developed at NCSA and widely used by NASA)
- NetCDF (Network Common Data Form)
- SDXF (Structured Data Exchange Format)
- XDF (eXtensible Data Format)
- XSIL (Extensible Scientific Interchange Language)
Astronomical and Space
Biological
- AB1 (Chromatogram files used by DNA sequencing instruments from Applied Biosystems)
- ACE (Sequence assembly format)
- BAM (Binary compressed SAM format)
- BED (Browser Extensible Data format describing genes and other features of DNA sequences)
- CAF (Common Assembly Format for sequence assembly)
- EMBL (Flatfile format used by the EMBL for nucleotide and peptide sequences)
- FASTA and FASTQ (File formats for sequence data; FASTQ adds quality scores)
- GenBank (Flatfile format used by NCBI for nucleotide and peptide sequences)
- GFF (General feature format for describing genes and other features of DNA, RNA and protein sequences)
- GTF (Gene Transfer Format, which holds information about gene structure)
- NEXUS (Encodes mixed information about genetic sequence data in a block structured format)
- PDB (Structures of biomolecules deposited in Protein Data Bank)
- PHD (Output from the basecalling software Phred)
- SAM (Sequence Alignment/Map format)
- SCF (Staden chromatogram files used to store data from DNA sequencing)
- SBML (Systems Biology Markup Language used to store biochemical network computational models)
- Stockholm (Representing multiple sequence alignments)
- Swiss-Prot (Flatfile format used for protein sequences from the Swiss-Prot database)
- VCF (Variant Call Format)
Chemical
Ecological
- Darwin Core (Standard for sharing information about biological diversity)
- EML (Ecological Metadata Language)
Geographic and Geospatial
See also Geospatial
- DEM (Digital Elevation Model)
- DOQ (Digital Orthophotos)
- e00 (ESRI ArcInfo Interchange File)
- FGDC (Content Standard for Digital Geospatial Metadata)
- GeoTIFF (Geospatial extensions to TIFF)
- GML (Geography Markup Language)
- HDFEOS, HD2, HD4 (Hierarchical Data Format-Earth Observing System)
- KML (formerly Keyhole Markup Language; version 2.2)
- NDF (National Landsat Archive Production System (NLAPS) Data Format)
- SAIF (Spatial Archive and Interchange Format, Canadian)
- SDTS (Spatial Data Transfer Standard)
- shp and shx (Required components of an ESRI Shapefile; other optional components exist, see entry)
- SID (MrSID, Multi-resolution Seamless Image Database)
- TAB (MapInfo dataset format, required component)
Mathematical
Medical Imaging
Oceanographic, Atmospheric and Meteorological
- GRIB (Gridded Binary)
- BUFR (Binary Universal Form for the Representation of meteorological data)
- IOAPI (netCDF augmented with metadata from the I/O API)
- PP (UK Met Office format for weather model data)
Physics
- CGNS (Computational Fluid Dynamics General Notation System)
- NeXuS (Common data format for neutron, x-ray and muon science)
- QCDml (Lattice QCD gauge configuration markup language)