Scientific Data formats
From Just Solve the File Format Problem
Revision as of 13:01, 22 January 2013 by Dan Tobias
General
- cdf (Common Data Format)
- EAS3 (binary file format for structured data)
- hdf (Hierarchical Data Format, originally from NCSA, now maintained by The HDF Group)
- NRRD (Nearly Raw Raster Data -- a simple format for n-dimensional raster data)
- NetCDF (Network Common Data Form)
- There are several formats abbreviated as SDF, including:
  - Simple Data Format (SDF), by George H. Fisher, Space Sciences Lab, UC Berkeley (a platform-independent, precision-preserving binary data I/O format capable of handling large, multi-dimensional arrays)
  - Simple Data Format-DPT (a new format from the Data Protocols Team for publishing and sharing data)
  - Standard Delay Format (a standard data structure for timing data)
  - Structure Data File (a file format for a chemical table file)
- SDXF (Structured Data Exchange Format)
- Silo (a storage format for visualization developed at Lawrence Livermore National Laboratory)
- XDF (eXtensible Data Format)
- XSIL (Extensible Scientific Interchange Language)
Astronomical and Space
- Flexible Image Transport System (FITS)
- PDS/ODL (Planetary Data System)
- VOTable (IVOA standard table format)
- SDF (Starlink Data Format) and NDF (Starlink's Extensible N-Dimensional Data Format)
Biological
- AB1 (Chromatogram files used by DNA sequencing instruments from Applied Biosystems)
- ABCD (Access to Biological Collection Data)
- ABCDDNA (Access to Biological Collection Data DNA extension)
- ABCDEFG (Access to Biological Collection Data Extension For Geosciences)
- ACE (Sequence assembly format)
- Affymetrix Raw Intensity Format
- ARLEQUIN Project Format
- Axt Alignment Format
- BAM (Binary compressed SAM format)
- BED (Browser Extensible Data format describing genes and other features of DNA sequences)
- BEDgraph
- Big Browser Extensible Data Format
- Big Wiggle Format
- Binary Alignment Map Format
- Binary Probe Map Format
- Binary sequence information Format
- Biological Pathway eXchange
- BLAT alignment Format
- BRIX generated O Format
- CAF (Common Assembly Format for sequence assembly)
- CellML
- CHADO XML interchange Format
- Chain Format for pairwise alignment
- CHARMM Card File Format
- CLUSTAL-W Alignment Format
- CLUSTAL-W Dendrogram Guide File Format
- Clustered Data Table Format
- DELTA (DEscription Language for TAxonomy)
- DAS (Distributed Sequence Annotation System)
- DBN (Dot Bracket Notation, Vienna Format)
- EMBL (Flatfile format used by the EMBL for nucleotide and peptide sequences)
- EML (Environmental Markup Language), not to be confused with EML (Ecological Metadata Language)
- ENCODE (Peak information Format)
- FASTA and FASTQ (file formats for sequence data; FASTQ also stores quality scores)
- FuGEFlow
- FuGE-ML (Functional Genomics Experiment Markup Language)
- Gating-ML
- GCDML (Genomic Contextual Data Markup Language)
- GelML (Gel electrophoresis Markup Language)
- GenBank (Flatfile format used by NCBI for nucleotide and peptide sequences)
- Gene Feature File (Versions 1 and 3)
- GFF (General feature format for describing genes and other features of DNA, RNA and protein sequences)
- Gene Prediction File Format
- GenePattern GeneSet Table Format
- Genome Annotation File (versions 1 and 2)
- GTF (Gene transfer format holds information about gene structure)
- HMMER
- ICB (ICM binary file Format)
- imzML (imaging mz Markup Language)
- ISA-Tab (Investigation Study Assay Tabular)
- INSD sequence record XML
- KGML (KEGG Mark-up Language)
- MAGE-Tab (MicroArray Gene Expression Tabular)
- MCL (Microbiological Common Language)
- MIARE-TAB (Minimum Information About an RNAi Experiment Tabular)
- microarray track data Browser Extensible Data Format
- MINiML (MIAME Notation in Markup Language)
- mini Protein Data Bank Format
- MIQAS-TAB (Minimal Information for QTLs and Association Studies Tabular)
- MITAB
- mmCIF (macromolecular Crystallographic Information File)
- Multiple Alignment Format
- mzData (deprecated)
- mzIdentML
- mzML
- mzQuantML
- mzXML (deprecated)
- NCD (Natural Collections Descriptions)
- NDTF (Neurophysiology Data Translation Format)
- net alignment annotation Format
- NeuroML (Neuroscience eXtensible Markup Language)
- New Hampshire eXtended Format
- Newick tree Format
- NEXUS (Encodes mixed information about genetic sequence data in a block structured format)
- Nimblegen Design File Format
- Nimblegen Gene Data Format
- NMR-STAR (NMR Self-defining Text Archive and Retrieval format)
- nucleotide information binary Format
- ODM (Operational Data Model)
- Open Biomedical Ontology Flat File Format
- PDB (Structures of biomolecules deposited in Protein Data Bank)
- Personal Genome SNP Format
- PHD (Output from the basecalling software Phred)
- phyloXML (XML for evolutionary biology and comparative genomics)
- Pre-Clustering File Format
- Protein Information Resource Format
- PRM (Protocol Representation Model (Medical Research))
- PSI-MI XML
- PSI-PAR
- RDML (Real-time PCR Data Markup Language)
- SAM (Sequence Alignment/Map format)
- SCF (Staden chromatogram files used to store data from DNA sequencing)
- SBML (Systems Biology Markup Language used to store biochemical network computational models)
- SDD (Structured Descriptive Data)
- SED-ML (Simulation Experiment Description Markup Language)
- Sequence Alignment Map Format
- SOFT (Simple Omnibus Format in Text)
- spML (Separation Markup Language)
- SRA-XML (Short Read Archive eXtensible Markup Language)
- Standard Flowgram Format
- Stockholm Multiple Alignment Format (Representing multiple sequence alignments)
- SBGN (Systems Biology Graphical Notation)
- SBRML (Systems Biology Results Markup Language)
- Swiss-Prot (Flatfile format used for protein sequences from the Swiss-Prot database)
- TAIR annotation data Format
- TAPIR (TDWG Access Protocol for Information Retrieval)
- TCS (Taxonomic Concept transfer Schema)
- TraML (Transition Markup Language)
- UniProtKB XML Format
- VCF (Variant Call Format)
- Wiggle Format
Biomedical signals (time series)
- ACQ (AcqKnowledge)
- BCI2000 (The BCI2000 project)
- BioSemi (BDF) data format
- BKR (EEG data format)
- CFWB (Chart Data File Format)
- DICOM-Waveform (An extension of Dicom for storing waveform data)
- ecgML (A markup language for electrocardiogram data acquisition and analysis)
- EDF/EDF+ (European Data Format)
- FEF (File Exchange Format for Vital signs, CEN TS 14271)
- GDF v1.x (General Data Format for biomedical signals - Version 1.x)
- GDF v2.x (The General Data Format for biomedical signals - Version 2.x)
- HL7aECG (Health Level 7 v3 annotated ECG)
- OpenXDF (Open Exchange Data Format)
- SCP-ECG (Standard Communication Protocol for Computer assisted electrocardiography)
- SIGIF (A digital SIGnal Interchange Format)
- WFDB (Format of Physiobank)
Chemical
- CCP4 (X-ray crystallography voxels (electron density))
- CDX (ChemDraw file format)
- CDXML (ChemDraw file format)
- CHM (ChemDraw file format)
- CIF (Crystallographic Information File, standardised by IUCr)
- CML (Chemical markup language)
- CTab (Chemical table file .mol, .sd, .sdf)
- HITRAN (spectroscopic data with one optical/infrared transition per line in the ASCII file (.hit))
- JCAMP (Joint Committee on Atomic and Molecular Physical Data, .dx, .jdx)
- MOL (MDL Molfile)
- MOP (MOPAC format)
- MRC (voxels in cryo-electron microscopy)
- MST (ACD/ChemSketch v1 file format)
- PDB (Protein Data Bank)
- RPT (ACD/ChemSketch v1 file format)
- RXN (Reaction file format)
- SK2 (ACD/ChemSketch v2 file format)
- SKC (ISIS/Draw file format)
- SMILES (Simplified molecular input line entry specification, .smi)
- SPC (spectroscopic data)
- Structure Data File (SDF)
- TGF (ISIS/Draw reaction file format)
Chemical data may be distinguished in various ways, including Chemical MIME types.
Ecological
- Darwin Core (Standard for sharing information about biological diversity)
- EML (Ecological Metadata Language), not to be confused with EML (Environmental Markup Language)
Geographic and Geospatial
See also Geospatial
- DEM (Digital Elevation Model)
- DOQ (Digital Orthophotos)
- e00 (ESRI ArcInfo Interchange File)
- FGDC (Content Standard for Digital Geospatial Metadata)
- GeoTIFF (Geospatial extensions to TIFF)
- GML (Geography Markup Language)
- HDFEOS, HD2, HD4 (Hierarchical Data Format-Earth Observing System)
- KML (formerly Keyhole Markup Language; version 2.2)
- NDF (National Landsat Archive Production System (NLAPS) Data Format)
- SAIF (Spatial Archive and Interchange Format, Canadian)
- SDTS (Spatial Data Transfer Standard)
- shp and shx (required components of an ESRI Shapefile; other, optional components also exist, see entry)
- MrSID (Multi-resolution Seamless Image Database)
- TAB (MapInfo dataset format; required component)
Mathematical
- graph6, sparse6 (ASCII encoding of adjacency matrices, .g6, .s6)
- graphML (Graph Markup Language)
- m (MATLAB script file)
- M (Mathematica package file)
- MAT (MATLAB matrix data format)
- MathML
- OPJ (Origin data format)
- Statistica
- WP2 (WinPlot)
Medical Imaging
- BRIK/HEAD (Voxel data from AFNI programs, dual-file (data and metadata, respectively))
- MGH (uncompressed)
- MGZ (gzip-compressed MGH)
- DICOM (Digital Imaging and Communications in Medicine (.dcm))
- MINC (Medical Imaging NetCDF format; since version 2.0, based on HDF5 (.mnc))
- OME-TIFF (Open Microscopy Imaging format)
- OME-XML (Open Microscopy Imaging format)
- OST (Open Spatio-Temporal) (extensible, open alternative for microscope images)
- NII (Neuroimaging Informatics Technology Initiative (NIfTI) voxel data, single-file (combined data and metadata))
- IMG/HDR (ANALYZE or NIfTI voxel data, dual-file (separate data and metadata, respectively))
- gii (NIfTI offspring for brain surface data, single-file (combined data and metadata) style)
- TRK (Vector data describing tracts of neurons, used by TrackVis)
- SDM (Signed Differential Mapping brain maps, .sdm)
Oceanographic, Atmospheric and Meteorological
- GRIB (Grid in Binary)
- BUFR (Binary Universal Form for the Representation of meteorological data)
- IOAPI (netCDF augmented with metadata from the I/O API)
- PP (UK Met Office format for weather model data)
Physics
- CGNS (Computational Fluid Dynamics General Notation System)
- NeXus (Common data format for neutron, x-ray and muon science)
- QCDml (Lattice QCD gauge configuration markup language)
Scientific Signal data
- ACQ (AcqKnowledge File Format for Windows)
- BioSemi (BDF) data format
- BKR (EEG data format)
- CFWB (Chart Data File Format)
- EDF (European data format)
- FEF (File Exchange Format for Vital signs)
- GDF (General data formats for biomedical signals)
- GMS (Gesture And Motion Signal format)
- IROCK (intelliRock Sensor Data File Format)
- MFER (Medical waveform Format Encoding Rules)
- REC (ATI Vision recorder file)
- SCP-ECG (Standard Communication Protocol for Computer assisted electrocardiography)
- SEG Y (Reflection seismology data format)
- SIGIF (SIGnal Interchange Format)
Social Sciences
- Atlas.ti (Computer-assisted qualitative data analysis package)
- DDI (Data Documentation Initiative)
- DO ("DO file" command script for the Stata Statistical package)
- DTA (Binary data file for the Stata Statistical package)
- NVivo (Computer-assisted qualitative data analysis package)
- R (Statistical package)
- SAS (Statistical package)
- SAV (Binary "SPSS data format" for the SPSS Statistical package)
- SPO (Output file for the SPSS Statistical package - version 14)
- SPS ("Syntax file" (plain text command script) for the SPSS Statistical package)
- SPV (Output file for the SPSS Statistical package - version 17 and later)
- Transana (Computer-assisted qualitative data analysis package)