Scientific Data formats

From Just Solve the File Format Problem

Revision as of 00:26, 8 November 2012

General

  • cdf (Common Data Format)
  • EAS3 (binary file format for structured data)
  • hdf (Hierarchical Data Format, originally developed at NCSA)
  • NetCDF (Network Common Data Format)
  • There are several formats abbreviated as SDF, including:
    • Simple Data Format (SDF), by George H. Fisher, Space Sciences Lab, UC Berkeley (a platform-independent, precision-preserving binary data I/O format capable of handling large, multi-dimensional arrays)
    • Simple Data Format-DPT (a format from the Data Protocols Team for publishing and sharing data)
    • Standard Delay Format (a standard data structure for timing data)
    • Structure Data File (a file format for chemical table files)
  • SDXF (Structured Data Exchange Format)
  • Silo (a storage format for visualization developed at Lawrence Livermore National Laboratory)
  • XDF (eXtensible Data Format)
  • XSIL (Extensible Scientific Interchange Language)

Astronomical and Space

  • FITS (Flexible Image Transport System)
  • PDS/ODL (Planetary Data System)
  • VOTable (IVOA standard table format)
  • SDF (Starlink Data Format) and NDF (Starlink's Extensible N-Dimensional Data Format)

Biological

  • Clustered Data Table Format
  • DAS (Distributed Sequence Annotation System)
  • DBN (Dot Bracket Notation, Vienna format)
  • DELTA (DEscription Language for TAxonomy)
  • EMBL (Flatfile format used by the EMBL for nucleotide and peptide sequences)
  • ENCODE (peak information format)
  • FASTA and FASTQ (File formats for sequence data; FASTQ adds quality scores)
  • FuGEFlow
  • FuGE-ML (Functional Genomics Experiment Markup Language)
  • Gating-ML
  • GCDML (Genomic Contextual Data Markup Language)
  • GelML (Gel electrophoresis Markup Language)
  • GenBank (Flatfile format used by NCBI for nucleotide and peptide sequences)
  • Gene Feature File (versions 1 and 3)
  • GFF (General feature format for describing genes and other features of DNA, RNA and protein sequences)
  • Gene Prediction File Format
  • GenePattern GeneSet Table Format
  • Genome Annotation File (versions 1 and 2)
  • GTF (Gene transfer format, which holds information about gene structure)
  • HMMER
  • ICB (ICM binary file format)
  • imzML (imaging mz Markup Language)
  • ISA-Tab (Investigation Study Assay Tabular)
  • ISND sequence record XML
  • KGML (KEGG Markup Language)
  • MAGE-Tab (MicroArray Gene Expression Tabular)
  • MCL (Microbiological Common Language)
  • MIARE-TAB (Minimum Information About an RNAi Experiment Tabular)
  • Microarray track data Browser Extensible Data format
  • MINiML (MIAME Notation in Markup Language)
  • Mini Protein Data Bank format
  • MIQAS-TAB (Minimal Information for QTLs and Association Studies Tabular)
  • MITAB
  • mmCIF (macromolecular Crystallographic Information File)
  • Multiple Alignment Format
  • mzData (deprecated)
  • mzIdentML
  • mzML
  • mzQuantML
  • NCD (Natural Collections Descriptions)
  • NDTF (Neurophysiology Data Translation Format)
  • Net alignment annotation format
  • NeuroML (Neuroscience eXtensible Markup Language)
  • New Hampshire eXtended format
  • Newick tree format
  • NEXUS (Encodes mixed information about genetic sequence data in a block-structured format)
  • Nimblegen Design File Format
  • Nimblegen Gene Data Format
  • NMR-STAR (NMR Self-defining Text Archive and Retrieval format)
  • Nucleotide information binary format
  • ODM (Operational Data Model)
  • Open Biomedical Ontology Flat File Format
  • PDB (Structures of biomolecules deposited in the Protein Data Bank)
  • Personal Genome SNP format
  • PHD (Output from the basecalling software Phred)
  • phyloXML (XML for evolutionary biology and comparative genomics)
  • Pre-Clustering File Format
  • Protein Information Resource format
  • PRM (Protocol Representation Model, for medical research)
  • PSI-MI XML
  • PSI-PAR
  • RDML (Real-time PCR Data Markup Language)
  • SAM (Sequence Alignment/Map format)
  • SBGN (Systems Biology Graphical Notation)
  • SBML (Systems Biology Markup Language, used to store computational models of biochemical networks)
  • SBRML (Systems Biology Results Markup Language)
  • SCF (Staden chromatogram files used to store data from DNA sequencing)
  • SDD (Structured Descriptive Data)
  • SED-ML (Simulation Experiment Description Markup Language)
  • Sequence Alignment Map Format
  • SOFT (Simple Omnibus Format in Text)
  • spML (Separation Markup Language)
  • SRA-XML (Short Read Archive eXtensible Markup Language)
  • Standard Flowgram Format
  • Stockholm Multiple Alignment Format (Representing multiple sequence alignments)
  • Swiss-Prot (Flatfile format used for protein sequences from the Swiss-Prot database)
  • TAIR annotation data format
  • TAPIR (TDWG Access Protocol for Information Retrieval)
  • TCS (Taxonomic Concept transfer Schema)
  • TraML (Transition Markup Language)
  • UniProtKB XML format
  • VCF (Variant Call Format)
  • Wiggle format

Biomedical signals (time series)

  • ACQ (AcqKnowledge)
  • BCI2000 (The BCI2000 project)
  • BioSemi (BDF) data format
  • BKR (EEG data format)
  • CFWB (Chart Data File Format)
  • DICOM-Waveform (an extension of DICOM for storing waveform data)
  • ecgML (a markup language for electrocardiogram data acquisition and analysis)
  • EDF/EDF+ (European Data Format)
  • FEF (File Exchange Format for Vital signs, CEN TS 14271)
  • GDF v1.x (General Data Format for biomedical signals, version 1.x)
  • GDF v2.x (General Data Format for biomedical signals, version 2.x)
  • HL7aECG (Health Level 7 v3 annotated ECG)
  • OpenXDF (Open Exchange Data Format)
  • SCP-ECG (Standard Communication Protocol for Computer-assisted electrocardiography)
  • SIGIF (a digital SIGnal Interchange Format)
  • WFDB (Format of Physiobank)

Chemical

  • CCP4 (X-ray crystallography voxels, i.e. electron density)
  • CHM (ChemDraw file format)
  • CIF (Crystallographic Information File, standardised by IUCr)
  • CML (Chemical Markup Language)
  • CTab (Chemical table file; .mol, .sd, .sdf)
  • HITRAN (spectroscopic data with one optical/infrared transition per line in an ASCII file, .hit)
  • JCAMP (Joint Committee on Atomic and Molecular Physical Data, .dx, .jdx)
  • MOL (MDL Molfile)
  • MOP (MOPAC format)
  • MRC (voxels in cryo-electron microscopy)
  • PDB (Protein Data Bank)
  • SMILES (Simplified molecular input line entry specification, .smi)
  • SPC (spectroscopic data)
  • Structure Data File (SDF)

Chemical data files may be distinguished in various ways, including by Chemical MIME types.

Ecological

  • Darwin Core (Standard for sharing information about biological diversity)
  • EML (Ecological Metadata Language)

Geographic and Geospatial

See also Geospatial

  • DEM (Digital Elevation Model)
  • DOQ (Digital Orthophotos)
  • e00 (ESRI ArcInfo Interchange File)
  • FGDC (Federal Geographic Data Committee Content Standard for Digital Geospatial Metadata)
  • GeoTIFF (Geospatial extensions to TIFF)
  • GML (Geography Markup Language)
  • HDFEOS, HD2, HD4 (Hierarchical Data Format-Earth Observing System)
  • KML (formerly Keyhole Markup Language; version 2.2)
  • NDF (National Landsat Archive Production System (NLAPS) Data Format)
  • SAIF (Spatial Archive and Interchange Format, Canadian)
  • SDTS (Spatial Data Transfer Standard)
  • shp and shx (required components of an ESRI shapefile; other, optional components exist as well, see entry)
  • SID (MrSID, Multi-resolution Seamless Image Database)
  • TAB (MapInfo dataset format; a required component)

Mathematical

  • graph6, sparse6 (ASCII encodings of adjacency matrices, .g6 and .s6)
  • M (Mathematica package file)
  • MAT (MATLAB matrix data format)
  • MathML
  • WP2 (WinPlot)

Medical Imaging

  • AFNI (data and metadata, .BRIK and .HEAD)
  • MGH (uncompressed)
  • MGZ (gzip-compressed MGH)
  • Analyze (data and metadata, .img and .hdr)
  • DICOM (Digital Imaging and Communications in Medicine, .dcm)
  • MINC (Medical Imaging NetCDF format; since version 2.0, based on HDF5, .mnc)
  • OME-TIFF (Open Microscopy Environment imaging format)
  • OME-XML (Open Microscopy Environment imaging format)
  • OST (Open Spatio-Temporal; an extensible, open alternative for microscope images)
  • nii (Neuroimaging Informatics Technology Initiative (NIfTI) single-file style, with combined data and metadata)
  • gii (NIfTI offspring for brain surface data, single-file style with combined data and metadata)
  • .img,.hdr (NIfTI dual-file style, with data and metadata in separate files, respectively)
  • SDM (Signed Differential Mapping brain maps, .sdm)

Oceanographic, Atmospheric and Meteorological

  • GRIB (Grid in Binary)
  • BUFR (Binary Universal Form for the Representation of meteorological data)
  • IOAPI (netCDF augmented with metadata from the I/O API)
  • PP (UK Met Office format for weather model data)

Physics

  • CGNS (Computational Fluid Dynamics General Notation System)
  • NeXus (Common data format for neutron, X-ray and muon science)
  • QCDml (Lattice QCD gauge configuration markup language)

Scientific Signal data

  • ACQ (AcqKnowledge File Format for Windows)
  • BioSemi (BDF) data format
  • BKR (EEG data format)
  • CFWB (Chart Data File Format)
  • EDF (European Data Format)
  • FEF (File Exchange Format for Vital signs)
  • GDF (General Data Format for biomedical signals)
  • GMS (Gesture And Motion Signal format)
  • IROCK (intelliRock Sensor Data File Format)
  • MFER (Medical waveform Format Encoding Rules)
  • REC (ATI Vision recorder file)
  • SCP-ECG (Standard Communication Protocol for Computer assisted electrocardiography)
  • SEG Y (Reflection seismology data format)
  • SIGIF (SIGnal Interchange Format)

Social Sciences

  • DDI (Data Documentation Initiative)
  • SAS (Statistical package)
  • SPSS (Statistical package)
  • Stata (Statistical package)