Scientific Data formats

From Just Solve the File Format Problem
(Difference between revisions)
(Added more science data formats from sources including DataONE and UKDA)
(Added a swag of Biological formats)
Line 11: Line 11:
  * [[hdf]] (Hierarchical Data Format, from NASA)
  * [[NetCDF]] (Network Common Data Format)
+ * [[SDXF]] (Structured Data Exchange Format)
  * [[XDF]] (eXtensible Data Format)
  * [[XSIL]] (Extensible Scientific Interchange Language)
Line 17: Line 18:
  * [[FITS]] (Flexible Image Transport System)
  * [[PDS/ODL]] (Planetary Data System)
+
+ == Biological ==
+
+ * [[AB1]] (Chromatogram files used by DNA sequencing instruments from Applied Biosystems)
+ * [[ACE]] (Sequence assembly format)
+ * [[BAM]] (Binary compressed SAM format)
+ * [[BED]] (Browser extensible display format describing genes and other features of DNA sequences)
+ * [[CAF]] (Common Assembly Format for sequence assembly)
+ * [[EMBL]] (Flatfile format used by the EMBL for nucleotide and peptide sequences)
+ * [[FASTA and FASTQ]] (File format for sequence data, FASTQ with quality).
+ * [[GenBank]] (Flatfile format used by NCBI for nucleotide and peptide sequences)
+ * [[GFF]] (General feature format for describing genes and other features of DNA, RNA and protein sequences)
+ * [[GTF]] (Gene transfer format holds information about gene structure)
+ * [[NEXUS]] (Encodes mixed information about genetic sequence data in a block structured format)
+ * [[PDB]] (Structures of biomolecules deposited in Protein Data Bank)
+ * [[PHD]] (Output from the basecalling software Phred)
+ * [[SAM]] (Sequence Alignment/Map format)
+ * [[SCF]] (Staden chromatogram files used to store data from DNA sequencing)
+ * [[SBML]] (Systems Biology Markup Language used to store biochemical network computational models)
+ * [[Stockholm]] (Representing multiple sequence alignments)
+ * [[Swiss-Prot]] (Flatfile format used for protein sequences from the Swiss-Prot database)
+ * [[VCF]] (Variant Call Format)

  == Chemical ==
Line 51: Line 74:
  * [[DICOM]]

- == Oceanographic and Atmospheric ==
+ == Oceanographic, Atmospheric and Meteorological ==

  * [[GRIB]] (Grid in Binary)
  * [[BUFR]] (Binary Universal Format Representation)
  * [[IOAPI]] (netCDF augmented with metadata from the I/O API)
+ * [[PP]] (UK Met Office format for weather model data)

  == Physics ==

+ * [[CGNS]] (Computational Fluid Dynamics General Notation System)
  * [[NeXuS]] (Common data format for neutron, x-ray and muon science)
  * [[QCDml]] (Lattice QCD gauge configuration markup language)

Revision as of 02:50, 3 November 2012



General

  • cdf (Common Data Format)
  • hdf (Hierarchical Data Format, from NASA)
  • NetCDF (Network Common Data Form)
  • SDXF (Structured Data Exchange Format)
  • XDF (eXtensible Data Format)
  • XSIL (Extensible Scientific Interchange Language)

Astronomical and Space

  • FITS (Flexible Image Transport System)
  • PDS/ODL (Planetary Data System)

Biological

  • AB1 (Chromatogram files used by DNA sequencing instruments from Applied Biosystems)
  • ACE (Sequence assembly format)
  • BAM (Binary compressed SAM format)
  • BED (Browser Extensible Data format describing genes and other features of DNA sequences)
  • CAF (Common Assembly Format for sequence assembly)
  • EMBL (Flatfile format used by the EMBL for nucleotide and peptide sequences)
  • FASTA and FASTQ (File formats for sequence data; FASTQ also stores quality scores)
  • GenBank (Flatfile format used by NCBI for nucleotide and peptide sequences)
  • GFF (General feature format for describing genes and other features of DNA, RNA and protein sequences)
  • GTF (Gene Transfer Format, holding information about gene structure)
  • NEXUS (Encodes mixed information about genetic sequence data in a block structured format)
  • PDB (Structures of biomolecules deposited in Protein Data Bank)
  • PHD (Output from the basecalling software Phred)
  • SAM (Sequence Alignment/Map format)
  • SCF (Staden chromatogram files used to store data from DNA sequencing)
  • SBML (Systems Biology Markup Language used to store biochemical network computational models)
  • Stockholm (Representing multiple sequence alignments)
  • Swiss-Prot (Flatfile format used for protein sequences from the Swiss-Prot database)
  • VCF (Variant Call Format)

Chemical

Ecological

  • Darwin Core (Standard for sharing information about biological diversity)
  • EML (Ecological Metadata Language)

Geographic and Geospatial

See also Geospatial

  • DEM (Digital Elevation Model)
  • DOQ (Digital Orthophotos)
  • e00 (ESRI ArcInfo Interchange File)
  • FGDC (Content Standard for Digital Geospatial Metadata)
  • GeoTIFF (Geospatial extensions to TIFF)
  • GML (Geography Markup Language)
  • HDFEOS, HD2, HD4 (Hierarchical Data Format-Earth Observing System)
  • KML (formerly Keyhole Markup Language; Version 2.2)
  • NDF (National Landsat Archive Production System (NLAPS) Data Format)
  • SAIF (Spatial Archive and Interchange Format, Canadian)
  • SDTS (Spatial Data Transfer Standard)
  • shp and shx (mandatory components of an ESRI Shapefile; other optional components exist as well, see entry)
  • SID (MrSID, Multi-resolution Seamless Image Database)
  • TAB (MapInfo dataset format, mandatory component)

Mathematical

Medical Imaging

  • DICOM

Oceanographic, Atmospheric and Meteorological

  • GRIB (Grid in Binary)
  • BUFR (Binary Universal Form for the Representation of meteorological data)
  • IOAPI (netCDF augmented with metadata from the I/O API)
  • PP (UK Met Office format for weather model data)

Physics

  • CGNS (Computational Fluid Dynamics General Notation System)
  • NeXuS (Common data format for neutron, x-ray and muon science)
  • QCDml (Lattice QCD gauge configuration markup language)

Social Sciences

  • DDI (Data Documentation Initiative)
  • SAS (Statistical package)
  • SPSS (Statistical package)