Scientific Data formats
From Just Solve the File Format Problem
				
								
Latest revision as of 12:14, 22 October 2025
See also Health and Medicine for medical/biomedical data formats, and also see Engineering.
General
- Common Data Format (CDF)
- EAS3 (binary file format for structured data)
- HDF (Hierarchical Data Format, originally from NCSA, now maintained by The HDF Group)
  - HDF4
  - HDF5
- IGOR (.ibw)
- NRRD (Nearly Raw Raster Data -- a simple format for n-dimensional raster data)
- NetCDF (Network Common Data Form)
- ROOT (CERN data-analysis package and related formats, used in their Open Data initiative)
- SDXF (Structured Data Exchange Format)
- Silo (a storage format for visualization developed at Lawrence Livermore National Laboratory)
- Simple Data format (SDF; by George H. Fisher, Space Sciences Lab, UC Berkeley; a platform-independent, precision-preserving binary data I/O format capable of handling large, multi-dimensional arrays)
- Standard Delay Format (SDF; a standard data structure for timing data)
- XDF (Extensible Data Format) [https://en.wikipedia.org/wiki/Extensible_Data_Format]
- XSIL (Extensible Scientific Interchange Language)
Astronomical and Space
- Advanced Scientific Data Format
- ARN (Astronomical Research Network)
- CPA (PRISM)
- Flexible Image Transport System (FITS)
  - PSRFITS (Pulsar data storage standard)
- ICER
- NASA Raster Metafile
- ODL (NASA Object Description Language)
- PDS (Planetary Data System)
- PDS4
- VOTable (IVOA standard table format)
- SBIG CCDOPS image
- Standard Archive Format (used for USAF missile data)
- SDF (Starlink Data Format) and NDF (Starlink's Extensible N-Dimensional Data Format).
- VICAR
- WinMiPS
Biological
- 23andMe
- AB1 (Chromatogram files used by DNA sequencing instruments from Applied Biosystems)
- ABCD (Access to Biological Collection Data)
- ABCDDNA (Access to Biological Collection Data DNA extension)
- ABCDEFG (Access to Biological Collection Data Extension For Geosciences)
- ACE (Sequence assembly format)
- Affymetrix Raw Intensity Format
- AnnData Object (.h5ad)
- ARF (Axon Raw Format)
- ARLEQUIN Project Format
- Axt Alignment Format
- BAM (Binary compressed SAM format)
- BED (Browser Extensible Data format describing genes and other features of DNA sequences)
- BEDgraph
- Big Browser Extensible Data Format
- Big Wiggle Format
- Binary Alignment Map Format
- Binary Probe Map Format
- Binary sequence information Format
- Biological Pathway eXchange
- BLAT alignment Format
- BRIX generated O Format
- CAF (Common Assembly Format for sequence assembly)
- CASTEP
- CellML
- CHADO XML interchange Format
- Chain Format for pairwise alignment
- CHARMM Card File Format
- CLUSTAL-W Alignment Format
- CLUSTAL-W Dendrogram Guide File Format
- Clustered Data Table Format
- Complete Genomics
- CRAM
- DELTA (DEscription Language for TAxonomy)
- DAS (Distributed Sequence Annotation System)
- DBN (Dot Bracket Notation, Vienna format)
- EMBL (Flatfile format used by the EMBL for nucleotide and peptide sequences)
- EML (Environmental Markup Language), not to be confused with EML (Ecological Metadata Language)
- ENCODE (Peak information Format)
- FASTA and FASTQ (File format for sequence data, FASTQ with quality)
- FAST5 (.fast5)
- FuGEFlow
- FuGE-ML (Functional Genomics Experiment Markup Language)
- Gating-ML
- GCDML (Genomic Contextual Data Markup Language)
- GelML (Gel electrophoresis Markup Language)
- GenBank (Flatfile format used by NCBI for nucleotide and peptide sequences)
- Gene Feature File (Versions 1 and 3)
- Gene Prediction File Format
- GenePattern GeneSet Table Format
- Genome Annotation File (version 1 and 2)
- Genozip
- GFF (General feature format for describing genes and other features of DNA, RNA and protein sequences)
- GTF (Gene transfer format holds information about gene structure)
- HMMER
- ICB (ICM binary file Format)
- Image Cytometry Experiment (ICE)
- Image Cytometry Standard (ICS)
- imzML (imaging mz Markup Language)
- ISA-Tab (Investigation Study Assay Tabular)
- ISND sequence record XML
- KGML (KEGG Mark-up Language)
- MAGE-Tab (MicroArray Gene Expression Tabular)
- MCL (Microbiological Common Language)
- MIARE-TAB (Minimum Information About a RNAi Experiment Tabular)
- microarray track data Browser Extensible Data Format
- MINiML (MIAME Notation in Markup Language)
- mini Protein Data Bank Format
- MIQAS-TAB (Minimal Information for QTLs and Association Studies Tabular)
- MITAB
- mmCIF (macromolecular Crystallographic Information File)
- Multiple Alignment Format
- mzData (deprecated)
- mzIdentML
- mzML
- mzQuantML
- mzXML (deprecated)
- NCD (Natural Collections Descriptions)
- NDTF (Neurophysiology Data Translation Format)
- net alignment annotation Format
- NeuroML (Neuroscience eXtensible Markup Language)
- New Hampshire eXtended Format
- Newick tree Format
- NEXUS (Encodes mixed information about genetic sequence data in a block structured format)
- Nimblegen Design File Format
- Nimblegen Gene Data Format
- NMR-STAR (NMR Self-defining Text Archive and Retrieval format)
- nucleotide information binary Format
- ODM (Operational Data Model)
- Open Biomedical Ontology Flat File Format
- Personal Genome SNP Format
- PHD (Output from the basecalling software Phred)
- phyloXML (XML for evolutionary biology and comparative genomics)
- Pre-Clustering File Format
- Protein Data Bank (PDB; Structures of biomolecules deposited in Protein Data Bank)
- Protein Information Resource Format
- PRM (Protocol Representation Model (Medical Research))
- PSI-MI XML
- PSI-PAR
- RDML (Real-time PCR Data Markup Language)
- SAM (Sequence Alignment/Map format)
- SCF (Staden chromatogram files used to store data from DNA sequencing)
- SBML (Systems Biology Markup Language used to store biochemical network computational models)
- SDD (Structured Descriptive Data)
- SED-ML (Simulation Experiment Description Markup Language)
- SOFT (Simple Omnibus Format in Text)
- spML (Separation Markup Language)
- SRA-XML (Short Read Archive eXtensible Markup Language)
- Standard Flowgram Format
- Stockholm Multiple Alignment Format (Representing multiple sequence alignments)
- SBGN (Systems Biology Graphical Notation)
- SBRML (Systems Biology Results Markup Language)
- Swiss-Prot (Flatfile format used for protein sequences from the Swiss-Prot database)
- TAIR annotation data Format
- TAPIR (TDWG Access Protocol for Information Retrieval)
- TCS (Taxonomic Concept transfer Schema)
- TraML (Transition Markup Language)
- UniProtKB XML Format
- VCF (Variant Call Format)
- Wiggle Format
Chemical
- CCP4 (X-ray crystallography voxels (electron density))
- CDX (ChemDraw file format)
- CDXML (ChemDraw file format)
- CHM (ChemDraw file format)
- CIF (Crystallographic Information File, standardised by IUCr)
- CML (Chemical markup language)
- CTab (Chemical table file .mol, .sd, .sdf)
- HITRAN (spectroscopic data with one optical/infrared transition per line in the ASCII file (.hit))
- JCAMP (Joint Committee on Atomic and Molecular Physical Data, .dx, .jdx)
- MOL (MDL Molfile)
- MOP (MOPAC format)
- MRC (voxels in cryo-electron microscopy)
- MST (ACD/ChemSketch v1 file format)
- Protein Data Bank (PDB)
- RPT (OpenLynx) (Waters OpenLynx reports)
- RXN (Reaction file format)
- SK2 (ACD/ChemSketch v2 file format)
- SKC (ISIS/Draw file format)
- SMILES (Simplified molecular input line entry specification, .smi)
- SPC (Spectroscopic Data)
- Structure Data File (SDF)
- TGF (ISIS/Draw reaction file format)
- XYZ Chem [https://en.wikipedia.org/wiki/XYZ_file_format]
Chemical data may be distinguished in various ways, including Chemical MIME types.
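As a rough illustration of distinguishing chemical files by MIME type, the sketch below maps a few of the file extensions mentioned in this section to Chemical MIME type strings. The type names are recalled from the Chemical MIME project's list and should be treated as assumptions to check against that page; the table is a small subset, and the names `CHEMICAL_MIME_TYPES` and `guess_chemical_mime` are hypothetical, not part of any library.

```python
from pathlib import PurePath

# Illustrative subset only. The MIME strings are assumed from the Chemical
# MIME project's published list and should be verified before being relied on.
CHEMICAL_MIME_TYPES = {
    ".cif": "chemical/x-cif",
    ".cml": "chemical/x-cml",
    ".mol": "chemical/x-mdl-molfile",
    ".sd": "chemical/x-mdl-sdfile",
    ".sdf": "chemical/x-mdl-sdfile",
    ".dx": "chemical/x-jcamp-dx",
    ".jdx": "chemical/x-jcamp-dx",
    ".smi": "chemical/x-daylight-smiles",
    ".pdb": "chemical/x-pdb",
    ".xyz": "chemical/x-xyz",
}


def guess_chemical_mime(filename):
    """Return the assumed chemical MIME type for a filename, or None if unknown."""
    # Compare extensions case-insensitively, since file names vary in case.
    suffix = PurePath(filename).suffix.lower()
    return CHEMICAL_MIME_TYPES.get(suffix)


if __name__ == "__main__":
    for name in ("caffeine.MOL", "spectrum.jdx", "notes.txt"):
        print(name, "->", guess_chemical_mime(name))
```

An extension alone is a weak signal: this page lists several unrelated formats abbreviated "SDF", so a lookup like this can only suggest a type, and the file contents would still need to be checked.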
Earth Sciences
- Adaptable Seismic Data Format
- Network-Day Tape
- QuakeML
- SEED
- SEG-D (formats, mostly tape based, for seismic data)
- SEG Y (Reflection seismology data format)
- SEIS-PROV
- StationXML
Ecological
- Darwin Core (Standard for sharing information about biological diversity)
- Electronic Data Deliverable (EDD; EPA Superfund)
- EML (Ecological Metadata Language), not to be confused with EML (Environmental Markup Language)
Environmental
- HYT (AquiferTest)
Geographic and Geospatial
See also Geospatial
- DEM (Digital Elevation Model)
- DOQ (Digital Orthophotos)
- e00 (ESRI ArcInfo Interchange File)
- FGDC (Content Standard for Digital Geospatial Metadata)
- GeoTIFF (Geospatial extensions to TIFF)
- GML (Geography Markup Language)
- HDF-EOS (Hierarchical Data Format-Earth Observing System) [https://hdfeos.org/] (HD2, HD4, HD5)
- KML (formerly Keyhole Markup Language; Version 2.2)
- NDF (National Landsat Archive Production System (NLAPS) Data Format)
- SAIF (Spatial Archive and Interchange Format, Canadian)
- SDTS (Spatial Data Transfer Standard)
- Shapefile (ESRI, shp/shx)
- MrSID (Multi-resolution Seamless Image Database)
- TAB (MapInfo dataset format, must have component)
- Bathymetric Attributed Grid (.bag)
Mathematical
- AsciiMath
- DOT (graph description language)
- GEXF (Graph Exchange XML Format)
- graph6, sparse6 (ASCII encoding of Adjacency matrices (.g6, .s6))
- graphML (Graph Markup Language)
- GraphPad Prism
  - PZM
  - PZF
  - PZFX
  - PRISM
- JMP (.jmp)
- KaleidaGraph (.qda, .qdc)
- Life 1.05
- Life 1.06
- MacWavelets
- Mathematica
  - Computable Document Format (.cdf)
  - Mathematica notebook (.nb, .nbp)
  - Mathematica package file (.m)
  - Wolfram Language
- Macrocell
- MCell
- MathML
- MATLAB
  - MAT (MATLAB data format)
  - Matlab figure
  - MATLAB script file (.m)
  - Matlab Model (.mdl, .slx)
- Minitab (.mtw, .mpj)
- NPY and NPZ (NumPy)
- OPJ (Origin data format)
- PDL (Perl Data Language)
- Plaintext (cellular automata)
- RLE (cellular automata)
- Rule (Golly)
- Small Object Format
- Statistica
  - CSS Software (Complete Statistical System)
  - CSS STATISTICA
- WP2 (WinPlot)
Microscopy
- Amber ARR Bitmap Image
- Aperio SVS
- Bio
- BioRad confocal image
- CZI (Zeiss) [https://www.zeiss.com/microscopy/us/products/software/zeiss-zen/czi-image-file-format.html]
- DeltaVision
- DM2 (Gatan Digital Micrograph 2)
- DM3 (Gatan Digital Micrograph 3)
- DM4 (Gatan Digital Micrograph 4)
- GATAN
- HMSA (.msa)
- Image Cytometry Experiment (ICE)
- Image Cytometry Standard (ICS)
- KONTRON
- LIFF (Openlab Layered Image File Format)
- LSM (Zeiss Laser Scanning Microscope)
- MetaMorph Stack (.stk)
- MRC (Medical Research Council)
- OME-TIFF (Open Microscopy Imaging format)
- OME-XML (Open Microscopy Imaging format)
- SMV
- VGS-8
- Zeiss BIVAS
Neutron and X-ray Scattering
- canSAS (tools for small-angle scattering)
- CIF (Crystallographic Information File, standardised by IUCr)
- NeXus (a common data format for neutron, X-ray, and muon science)
Oceanographic, Atmospheric and Meteorological
- GRIB (Gridded Binary)
- BUFR (Binary Universal Form for the Representation of meteorological data)
- IOAPI (netCDF augmented with metadata from the I/O API)
- Meteosat data
- PP (UK Met Office format for weather model data)
Physics
See subcategory Physics data
Scientific Signal data
- ACQ (AcqKnowledge File Format for Windows)
- BioSemi (BDF) data format
- BKR (EEG data format)
- CFWB (Chart Data File Format)
- EDF (European data format)
- FEF (File Exchange Format for Vital signs)
- General Data Format for Biosignals (GDF)
- GMS (Gesture And Motion Signal format)
- IROCK (intelliRock Sensor Data File Format)
- MFER (Medical waveform Format Encoding Rules)
- REC (ATI Vision recorder file)
- SCP-ECG (Standard Communication Protocol for Computer assisted electrocardiography)
- SIGIF (SIGnal Interchange Format)
Social Sciences
- Atlas.ti (Computer-assisted qualitative data analysis package)
- DDI (Data Documentation Initiative)
- DO ("DO file" command script for the Stata Statistical package)
- DTA (Binary data file for the Stata Statistical package)
- Linguistic Annotation Framework (LAF; used by computational linguists to annotate language samples)
- M2k (MAXQDA)
- NVivo (Computer-assisted qualitative data analysis package)
- R (Statistical package)
- SAS (Statistical package)
  - SAS Transport File (.xpt)
- SAV (Binary "SPSS data format" for the SPSS Statistical package)
- SPO (Output file for the SPSS Statistical package - version 14)
- SPS ("Syntax file" (plain text command script) for the SPSS Statistical package)
- SPV (Output file for the SPSS Statistical package - version 17 and later)
- Statistix (.sx)
- Transana (Computer-assisted qualitative data analysis package)
Spectra
- Bruker (XRF software, .pdz)
- Niton (XRF software, .ndt)
- EDAX Spectrum (.spc)
- Thermo Scientific SPC (.spc)
- EMSA/MAS
- HMSA Hyper-Dimensional Data
Miscellaneous
- AIML (Artificial Intelligence Markup Language)
- EMD-DF64 (used for high frequency energy monitoring)
- IES (IESNA LM-63 Photometric Data File)
- Jupyter Notebook (.ipynb)
Links
- Improving on “Access to Research” [http://cameronneylon.net/blog/improving-on-access-to-research/]
- WikiBooks: Software Tools For Molecular Microscopy


