Scientific Data formats

From Just Solve the File Format Problem
(Difference between revisions)
(Biological: Adding HUP and BioSharing formats)
(Microscopy)
 
(145 intermediate revisions by 15 users not shown)
Line 1: Line 1:
{|
|[[File Formats]]
| >
|[[Electronic File Formats]]
| >
|[[Scientific Data formats]]
|}
+
{{FormatInfo
+
|formattype=electronic
+
|thiscat=Scientific Data formats
+
|image=Mad-sci.jpg
+
|caption=Mad scientist from 1940 movie
+
}}
+
 
 +
See also [[Health and Medicine]] for medical/biomedical data formats, and also see [[Engineering]].
  
 
== General ==
 
== General ==
* [[cdf]] (Common Data Format)
+
* [[Common Data Format]] (CDF)
 
* [[EAS3]] (binary file format for structured data)
 
* [[EAS3]] (binary file format for structured data)
* [[hdf]] (Hierarchical Data Format, from NASA)
+
* [[HDF]] (Hierarchical Data Format, originally from NCSA, now maintained by The HDF Group)
 +
** [[HDF4]]
 +
** [[HDF5]]
 +
* [[IGOR]] (.ibw)
 +
* [[NRRD]] (Nearly Raw Raster Data -- a simple format for n-dimensional raster data)
 
* [[NetCDF]] (Network Common Data Form)
 
* [[NetCDF]] (Network Common Data Form)
 +
* [[ROOT]] (CERN data-analysis package and related formats, used in their Open Data initiative)
 
* [[SDXF]] (Structured Data Exchange Format)
 
* [[SDXF]] (Structured Data Exchange Format)
* [[Silo]] (a storage format for visualization developed at Lawrence Livermore National Laboratory)* [[XDF]] (eXtensible Data Format)
+
* [[Silo]] (a storage format for visualization developed at Lawrence Livermore National Laboratory)
 
* [[Simple Data format]] (SDF) By George H. Fisher, Space Sciences Lab, UC Berkeley (A platform-independent, precision-preserving binary data I/O format capable of handling large, multi-dimensional arrays)
 
* [[Simple Data format]] (SDF) By George H. Fisher, Space Sciences Lab, UC Berkeley (A platform-independent, precision-preserving binary data I/O format capable of handling large, multi-dimensional arrays)
* [[Simple Data format-DPT]] A new format from the Data Protocols Team for publishing and sharing data
+
* [[Standard Delay Format]] (SDF) A standard data structure for timing data
* [[Standard Delay Format]] A standard data structure for timing data
+
* [[XDF (Extensible Data Format)]] [https://en.wikipedia.org/wiki/Extensible_Data_Format]
* [[Structure Data File]] A file format for a chemical table file
+
 
* [[XSIL]] (Extensible Scientific Interchange Language)
 
* [[XSIL]] (Extensible Scientific Interchange Language)
  
 
== Astronomical and Space ==
 
== Astronomical and Space ==
* [[FITS]] (Flexible Image Transport System)
+
* [[Advanced Scientific Data Format]]
* [[PDS/ODL]] (Planetary Data System)
+
* [[ARN (Astronomical Research Network)]]
 +
* [[CPA (PRISM)]]
 +
* [[Flexible Image Transport System]] (FITS)
 +
** [[PSRFITS]] (Pulsar data storage standard)
 +
* [[ICER]]
 +
* [[NASA Raster Metafile]]
 +
* [[ODL (NASA Object Description Language)]]
 +
* [[PDS]] (Planetary Data System)
 +
* [[PDS4]]
 
* [[VOTable]] (IVOA standard table format)
 
* [[VOTable]] (IVOA standard table format)
 +
* [[SBIG CCDOPS image]]
 +
* [[Standard Archive Format]] (used for USAF missile data)
 
* [[Starlink_Data_Format|SDF]] (Starlink Data Format) and [[N-Dimensional_Data_Format|NDF]] (Starlink's Extensible N-Dimensional Data Format).
 
* [[Starlink_Data_Format|SDF]] (Starlink Data Format) and [[N-Dimensional_Data_Format|NDF]] (Starlink's Extensible N-Dimensional Data Format).
 +
* [[VICAR]]
 +
* [[WinMiPS]]
  
 
== Biological ==
 
== Biological ==
  
 +
* [[23andMe]]
 
* [[AB1]] (Chromatogram files used by DNA sequencing instruments from Applied Biosystems)
 
* [[AB1]] (Chromatogram files used by DNA sequencing instruments from Applied Biosystems)
 
* [[ABCD]] (Access to Biological Collection Data)
 
* [[ABCD]] (Access to Biological Collection Data)
* [[ACE]] (Sequence assembly format)
+
* [[ABCD#ABCDDNA|ABCDDNA]] (Access to Biological Collection Data DNA extension)
* [[ABCDDNA]] (Access to Biological Collection Data DNA extension)
+
* [[ABCD#ABCDEFG|ABCDEFG]] (Access to Biological Collection Data Extension For Geosciences)
 +
* [[ACE (Sequence assembly)|ACE]] (Sequence assembly format)
 
* [[Affymetrix Raw Intensity Format]]
 
* [[Affymetrix Raw Intensity Format]]
 +
* [[AnnData Object]] (.h5ad)
 +
* [[ARF (Axon Raw Format)]]
 
* [[ARLEQUIN Project Format]]
 
* [[ARLEQUIN Project Format]]
 
* [[Axt Alignment Format]]
 
* [[Axt Alignment Format]]
* [[BAM]] (Binary compressed SAM format)
+
* [[BAM (Binary Alignment Map)|BAM]] (Binary compressed SAM format)
 
* [[BED]] (Browser Extensible Data format describing genes and other features of DNA sequences)
 
* [[BED]] (Browser Extensible Data format describing genes and other features of DNA sequences)
 
* [[BEDgraph]]
 
* [[BEDgraph]]
Line 45: Line 66:
 
* [[Biological Pathway eXchange]]
 
* [[Biological Pathway eXchange]]
 
* [[BLAT alignment Format]]
 
* [[BLAT alignment Format]]
* [[BRIX generated O Format]]
+
* [[BRIX generated O Format]]  
* [[CAF]] (Common Assembly Format for sequence assembly)
+
* [[CAF (Common Assembly Format)|CAF]] (Common Assembly Format for sequence assembly)
 +
* [[CASTEP]]
 
* [[CellML]]
 
* [[CellML]]
 
* [[CHADO XML interchange Format]]
 
* [[CHADO XML interchange Format]]
Line 54: Line 76:
 
* [[CLUSTAL-W Dendrogram Guide File Format]]
 
* [[CLUSTAL-W Dendrogram Guide File Format]]
 
* [[Clustered Data Table Format]]
 
* [[Clustered Data Table Format]]
 +
* [[Complete Genomics]]
 +
* [[CRAM]]
 
* [[DELTA]] (DEscription Language for TAxonomy)
 
* [[DELTA]] (DEscription Language for TAxonomy)
 +
* [[DAS]] (Distributed Sequence Annotation System)
 +
* [[DBN]] (Dot Bracket Notation (DBN) - Vienna Format)
 
* [[EMBL]] (Flatfile format used by the EMBL for nucleotide and peptide sequences)
 
* [[EMBL]] (Flatfile format used by the EMBL for nucleotide and peptide sequences)
* [[FASTA and FASTQ]] (File format for sequence data, FASTQ with quality).  
+
* [[EML (Environmental Markup Language)]] not to be confused with [[EML (Ecological Metadata Language)]]
* [[GelML]]
+
* [[ENCODE]] (Peak information Format)
 +
* [[FASTA and FASTQ]] (File format for sequence data, FASTQ with quality)
 +
* [[FAST5]] (.fast5)
 +
* [[FuGEFlow]]
 +
* [[FuGE-ML]] (Functional Genomics Experiment Markup Language)
 +
* [[Gating-ML]]
 +
* [[GCDML]] (Genomic Contextual Data Markup Language)
 +
* [[GelML]] (Gel electrophoresis Markup Language)
 
* [[GenBank]] (Flatfile format used by NCBI for nucleotide and peptide sequences)
 
* [[GenBank]] (Flatfile format used by NCBI for nucleotide and peptide sequences)
 +
* [[Gene Feature File]] (Versions 1 and 3)
 +
* [[Gene Prediction File Format]]
 +
* [[GenePattern GeneSet Table Format]]
 +
* [[Genome Annotation File]] (version 1 and 2)
 +
* [[Genozip]]
 
* [[GFF]] (General feature format for describing genes and other features of DNA, RNA and protein sequences)
 
* [[GFF]] (General feature format for describing genes and other features of DNA, RNA and protein sequences)
 
* [[GTF]] (Gene transfer format holds information about gene structure)
 
* [[GTF]] (Gene transfer format holds information about gene structure)
 +
* [[HMMER]]
 +
* [[ICB]] (ICM binary file Format)
 +
* [[Image Cytometry Experiment]] (ICE)
 +
* [[Image Cytometry Standard]] (ICS)
 +
* [[imzML]] (imaging mz Markup Language)
 +
* [[ISA-Tab]] (Investigation Study Assay Tabular)
 +
* [[ISND sequence record XML]]
 +
* [[KGML]] (KEGG Mark-up Language)
 +
* [[MAGE-Tab]] (MicroArray Gene Expression Tabular)
 +
* [[MCL]] (Microbiological Common Language)
 +
* [[MIARE-TAB]] (Minimum Information About a RNAi Experiment Tabular)
 +
* [[microarray track data Browser Extensible Data Format]]
 +
* [[MINiML]] (MIAME Notation in Markup Language)
 +
* [[mini Protein Data Bank Format]]
 +
* [[MIQAS-TAB]] (Minimal Information for QTLs and Association Studies Tabular)
 
* [[MITAB]]
 
* [[MITAB]]
 +
* [[mmCIF]] (macromolecular Crystallographic Information File)
 +
* [[Multiple Alignment Format]]
 
* [[mzData]] (deprecated)
 
* [[mzData]] (deprecated)
 
* [[mzIdentML]]
 
* [[mzIdentML]]
 
* [[mzML]]
 
* [[mzML]]
 
* [[mzQuantML]]
 
* [[mzQuantML]]
 +
* [[mzXML]] (deprecated)
 +
* [[NCD]] (Natural Collections Descriptions)
 +
* [[NDTF]] (Neurophysiology Data Translation Format)
 +
* [[net alignment annotation Format]]
 +
* [[NeuroML]] (Neuroscience eXtensible Markup Language)
 +
* [[New Hampshire eXtended Format]]
 +
* [[Newick tree Format]]
 
* [[NEXUS]] (Encodes mixed information about genetic sequence data in a block structured format)
 
* [[NEXUS]] (Encodes mixed information about genetic sequence data in a block structured format)
* [[PDB]] (Structures of biomolecules deposited in Protein Data Bank)
+
* [[Nimblegen Design File Format]]
 +
* [[Nimblegen Gene Data Format]]
 +
* [[NMR-STAR]] (NMR Self-defining Text Archive and Retrieval format)
 +
* [[nucleotide inFormation binary Format]]
 +
* [[ODM]] (Operational Data Model)
 +
* [[Open Biomedical Ontology Flat File Format]]
 +
* [[Personal Genome SNP Format]]
 
* [[PHD]] (Output from the basecalling software Phred)
 
* [[PHD]] (Output from the basecalling software Phred)
 +
* [[phyloXML]] (XML for evolutionary biology and comparative genomics)
 +
* [[Pre-Clustering File Format]]
 +
* [[Protein Data Bank]] (PDB; Structures of biomolecules deposited in Protein Data Bank)
 +
* [[Protein InFormation Resource Format]]
 +
* [[PRM]] (Protocol Representation Model (Medical Research))
 
* [[PSI-MI XML]]
 
* [[PSI-MI XML]]
 
* [[PSI-PAR]]
 
* [[PSI-PAR]]
 +
* [[RDML]] (Real-time PCR Data Markup Language)
 
* [[SAM]] (Sequence Alignment/Map format)
 
* [[SAM]] (Sequence Alignment/Map format)
 
* [[SCF]] (Staden chromatogram files used to store data from DNA sequencing)
 
* [[SCF]] (Staden chromatogram files used to store data from DNA sequencing)
 
* [[SBML]] (Systems Biology Markup Language used to store biochemical network computational models)
 
* [[SBML]] (Systems Biology Markup Language used to store biochemical network computational models)
* [[spML]]
+
* [[SDD]] (Structured Descriptive Data)
* [[Stockholm]] (Representing multiple sequence alignments)
+
* [[SED-ML]] (Simulation Experiment Description Markup Language)
 +
* [[SOFT]] (Simple Omnibus Format in Text)
 +
* [[spML]] (Separation Markup Language)
 +
* [[SRA-XML]] (Short Read Archive eXtensible Markup Language)
 +
* [[Standard Flowgram Format]]
 +
* [[Stockholm Multiple Alignment Format]] (Representing multiple sequence alignments)
 +
* [[SBML]] (Systems Biology Markup Language)
 +
* [[SBGN]] (Systems Biology Graphical Notation)
 +
* [[SBRML]] (Systems Biology Results Markup Language)
 
* [[Swiss-Prot]] (Flatfile format used for protein sequences from the Swiss-Prot database)
 
* [[Swiss-Prot]] (Flatfile format used for protein sequences from the Swiss-Prot database)
* [[TraML]]
+
* [[TAIR annotation data Format]]
 +
* [[TAPIR]] (TDWG Access Protocol for Information Retrieval)
 +
* [[TCS]] (Taxonomic Concept transfer Schema)
 +
* [[TraML]] (Transition Markup Language)
 +
* [[UniProtKB XML Format]]
 
* [[VCF]] (Variant Call Format)
 
* [[VCF]] (Variant Call Format)
 
+
* [[Wiggle Format]]
== Biomedical signals (time series) ==
+
 
+
* [[ACQ]] (AcqKnowledge)
+
* [[BCI2000]] (The BCI2000 project)
+
* [[BioSemi]] (BDF) data format
+
* [[BKR]] (EEG data format)
+
* [[CFWB]] (Chart Data File Format)
+
* [[DICOM-Waveform]] (An extension of Dicom for storing waveform data)
+
* [[ecgML]] (A markup language for electrocardiogram data acquisition and analysis)
+
* [[EDF/EDF+]] (European Data Format)
+
* [[FEF]] (File Exchange Format for Vital signs, CEN TS 14271)
+
* [[GDF v1.x]] (General Data Format for biomedical signals - Version 1.x)
+
* [[GDF v2.x]] (The General Data Format for biomedical signals - Version 2.x)
+
* [[HL7aECG]] (Health Level 7 v3 annotated ECG)
+
* [[OpenXDF]] (Open Exchange Data Format)
+
* [[SCP-ECG]] (Standard Communication Protocol for Computer assisted electrocardiography)
+
* [[SIGIF]] (A digital SIGnal Interchange Format)
+
* [[WFDB]] (Format of Physiobank)
+
  
 
== Chemical ==
 
== Chemical ==
 
* [[CCP4]] (X-ray crystallography voxels (electron density))
 
* [[CCP4]] (X-ray crystallography voxels (electron density))
* [[CHM]] (ChemDraw file format)
+
* [[CDX (ChemDraw Exchange)|CDX]] (ChemDraw file format)
 +
* [[CDXML]] (ChemDraw file format)
 +
* [[CHM (ChemDraw)|CHM]] (ChemDraw file format)
 
* [[CIF]] (Crystallographic Information File, standardised by IUCr)
 
* [[CIF]] (Crystallographic Information File, standardised by IUCr)
 
* [[CML]] (Chemical markup language)
 
* [[CML]] (Chemical markup language)
Line 110: Line 180:
 
* [[MOP]] (MOPAC format)
 
* [[MOP]] (MOPAC format)
 
* [[MRC]] (voxels in cryo-electron microscopy)
 
* [[MRC]] (voxels in cryo-electron microscopy)
* [[PDB]] (Protein Data Bank)
+
* [[MST]] ACD/ChemSketch v1 file format
 +
* [[Protein Data Bank]] (PDB)
 +
* [[RPT (OpenLynx)]] Waters OpenLynx reports
 +
* [[RXN]] (Reaction file format)
 +
* [[SK2]] (ACD/ChemSketch v2 file format)
 +
* [[SKC]] (ISIS/Draw file format)
 
* [[SMILES]] (Simplified molecular input line entry specification, .smi)
 
* [[SMILES]] (Simplified molecular input line entry specification, .smi)
* [[SPC]] (spectroscopic data)
+
* [[SPC (Spectroscopic Data)]]
 
* [[Structure Data File]] (SDF)
 
* [[Structure Data File]] (SDF)
 +
* [[TGF]] (ISIS/Draw reaction file format)
 +
* [[XYZ Chem]] [https://en.wikipedia.org/wiki/XYZ_file_format Wiki]
  
 
Chemical data may be distinguished in various ways, including [http://www.ch.ic.ac.uk/chemime/ Chemical MIME] types.
 
Chemical data may be distinguished in various ways, including [http://www.ch.ic.ac.uk/chemime/ Chemical MIME] types.
 +
 +
== Earth Sciences ==
 +
* [[Adaptable Seismic Data Format]]
 +
* [[Network-Day Tape]]
 +
* [[QuakeML]]
 +
* [[SEED]]
 +
* [[SEG-D]] (formats, mostly tape based, for seismic data)
 +
* [[SEG Y]] (Reflection seismology data format)
 +
* [[SEIS-PROV]]
 +
* [[StationXML]]
  
 
== Ecological ==
 
== Ecological ==
 
* [[Darwin Core]] (Standard for sharing information about biological diversity)
 
* [[Darwin Core]] (Standard for sharing information about biological diversity)
* [[EML]] (Ecological Metadata Language)
+
* [[Electronic Data Deliverable]] (EDD; EPA Superfund)
 +
* [[EML (Ecological Metadata Language)]], not to be confused with [[EML (Environmental Markup Language)]]
 +
 
 +
== Environmental ==
 +
* [[HYT]] (AquiferTest)
  
 
== Geographic and Geospatial ==
 
== Geographic and Geospatial ==
Line 131: Line 222:
 
* [[GeoTIFF]] (Geospatial extensions to TIFF)
 
* [[GeoTIFF]] (Geospatial extensions to TIFF)
 
* [[GML]] (Geography Markup Language)
 
* [[GML]] (Geography Markup Language)
* [[HDFEOS, HD2, HD4]] (Hierarchical Data Format-Earth Observing System)
+
* [[HDF-EOS]] (Hierarchical Data Format-Earth Observing System)[https://hdfeos.org/ 1] (HD2, HD4, HD5)
 
* [[KML]] (KML (formerly Keyhole Markup Language), Version 2.2)
 
* [[KML]] (KML (formerly Keyhole Markup Language), Version 2.2)
 
* [[NDF]] (National Landsat Archive Production System (NLAPS) Data Format)
 
* [[NDF]] (National Landsat Archive Production System (NLAPS) Data Format)
 
* [[SAIF]] (Spatial Archive and Interchange Format, Canadian)
 
* [[SAIF]] (Spatial Archive and Interchange Format, Canadian)
 
* [[SDTS]] (Spatial Data Transfer Standard)
 
* [[SDTS]] (Spatial Data Transfer Standard)
* [[shp and shx]] (ESRI Shapefile must-have components; other optional components exist as well, see entry)
+
* [[Shapefile]] (ESRI, shp/shx)
* [[SID]] (MrSID- Multi-resolution Seamless Image Database)
+
* [[MrSID]] (MrSID- Multi-resolution Seamless Image Database)
 
* [[TAB]] (MapInfo dataset format, required component)
 
* [[TAB]] (MapInfo dataset format, required component)
 +
* [[Bathymetric Attributed Grid]] (.bag)
  
 
== Mathematical ==
 
== Mathematical ==
 +
* [[AsciiMath]]
 +
* [[DOT (graph description language)]]
 +
* [[GEXF]] (Graph Exchange XML Format)
 
* [[graph6, sparse6]] (ASCII encoding of Adjacency matrices (.g6, .s6))
 
* [[graph6, sparse6]] (ASCII encoding of Adjacency matrices (.g6, .s6))
* [[M]] (Mathematica package file)
+
* [[graphML]] (Graph Markup Language)
* [[MAT]] (MATLAB matrix data format)
+
* GraphPad Prism
 +
** [[PZM]]
 +
** [[PZF]]
 +
** [[PZFX]]
 +
** [[PRISM]]
 +
* [[JMP]] (.jmp)
 +
* [[KaleidaGraph]] (.qda, .qdc)
 +
* [[Life 1.05]]
 +
* [[Life 1.06]]
 +
* [[MacWavelets]]
 +
* Mathematica
 +
** [[Computable Document Format]] (.cdf)
 +
** [[Mathematica notebook]] (.nb, .nbp)
 +
** [[Mathematica package file]] (M)
 +
** [[Wolfram Language]]
 +
* [[Macrocell]]
 +
* [[MCell]]
 
* [[MathML]]
 
* [[MathML]]
 +
* MATLAB
 +
** [[MAT]] (MATLAB data format)
 +
** [[Matlab figure]]
 +
** [[MATLAB script file]] (m)
 +
** [[Matlab Model]] (.mdl, .slx)
 +
* [[Minitab]] (.mtw, .mpj)
 +
* [[NPY and NPZ (NumPy)]]
 +
* [[OPJ]] (Origin data format)
 +
* [[PDL]] (Perl Data Language)
 +
* [[Plaintext (cellular automata)]]
 +
* [[RLE (cellular automata)]]
 +
* [[Rule (Golly)]]
 +
* [[Small Object Format]]
 +
* [[Statistica]]
 +
** [[CSS Software]] (Complete Statistical System)
 +
** [[CSS STATISTICA]]
 
* [[WP2]] WinPlot
 
* [[WP2]] WinPlot
  
== Medical Imaging ==
+
== Microscopy ==  
* [[AFNI]] (data, meta-data (.BRIK,.HEAD))
+
 
*       [[MGH]] (uncompressed)
+
* [[Amber ARR Bitmap Image]]
*       [[MGZ]] (zip-compressed)
+
* [[Aperio SVS]]
* [[Analyze data, meta-data]] (.img,.hdr)
+
* [[Bio]]
* [[DICOM]] (Digital Imaging and Communications in Medicine (.dcm))
+
* [[BioRad confocal image]]
* [[MINC]] (Medical Imaging NetCDF format; since version 2.0, based on HDF5 (.mnc))
+
* [[CZI]] (Zeiss) [https://www.zeiss.com/microscopy/us/products/software/zeiss-zen/czi-image-file-format.html]
 +
* [[DeltaVision]]
 +
* [[DM2]] (Gatan Digital Micrograph 2)
 +
* [[DM3]] (Gatan Digital Micrograph 3)
 +
* [[DM4]] (Gatan Digital Micrograph 4)
 +
* [[GATAN]]
 +
* [[HMSA]] (.msa)
 +
* [[Image Cytometry Experiment]] (ICE)
 +
* [[Image Cytometry Standard]] (ICS)
 +
* [[KONTRON]]
 +
* [[LIFF]] (Openlab Layered Image File Format)  
 +
* [[LSM]] (Zeiss Laser Scanning Microscope)
 +
* [[MetaMorph Stack]] (.stk)
 +
* [[MRC]] (Medical Research Council)
 
* [[OME-TIFF]] (Open Microscopy Imaging format)
 
* [[OME-TIFF]] (Open Microscopy Imaging format)
 
* [[OME-XML]] (Open Microscopy Imaging format)
 
* [[OME-XML]] (Open Microscopy Imaging format)
* [[OST (Open Spatio-Temporal)]] (extensible, open alternative for microscope images)
+
* [[SMV]]
* [[nii]] (Neuroimaging Informatics Technology Initiative (NIfTI) single-file (combined data and meta-data))
+
* [[VGS-8]]
* [[gii]] (NIfTI offspring for brain surface data, single-file (combined data and meta-data) style)
+
* [[Zeiss BIVAS]]
* [[.img,.hdr]] (NIfTI offspring for brain surface data, dual-file (separate data and meta-data, respectively) style)
+
 
* [[SDM]] (Signed Differential Mapping- brain maps(.sdm))
+
== Neutron and X-ray Scattering ==
 +
 
 +
* [[canSAS]] (tools for small-angle scattering)
 +
* [[CIF]] (Crystallographic Information File, standardised by IUCr)
 +
* [[NeXus]] (NeXus is a common data format for neutron, x-ray, and muon science)
  
 
== Oceanographic, Atmospheric and Meteorological ==
 
== Oceanographic, Atmospheric and Meteorological ==
  
* [[GRIB]] (Grid in Binary)
+
* [[GRIB]] (Gridded Binary)
 
* [[BUFR]] (Binary Universal Form for the Representation of meteorological data)
 
* [[BUFR]] (Binary Universal Form for the Representation of meteorological data)
 
* [[IOAPI]] (netCDF augmented with metadata from the I/O API)
 
* [[IOAPI]] (netCDF augmented with metadata from the I/O API)
 +
* [[Meteosat data]]
 
* [[PP]] (UK Met Office format for weather model data)
 
* [[PP]] (UK Met Office format for weather model data)
  
 
== Physics ==
 
== Physics ==
  
* [[CGNS]] (Computational Fluid Dynamics General Notation System)
+
See subcategory [[Physics data]]
* [[NeXuS]] (Common data format for neutron, x-ray and muon science)
+
* [[QCDml]] (Lattice QCD gauge configuration markup language)
+
  
 
== Scientific Signal data ==
 
== Scientific Signal data ==
Line 183: Line 326:
 
* [[EDF]] (European data format)
 
* [[EDF]] (European data format)
 
* [[FEF]] (File Exchange Format for Vital signs)
 
* [[FEF]] (File Exchange Format for Vital signs)
* [[GDF]] (General data formats for biomedical signals)
+
* [[General Data Format for Biosignals]] (GDF)
 
* [[GMS]] (Gesture And Motion Signal format)
 
* [[GMS]] (Gesture And Motion Signal format)
 
* [[IROCK]] (intelliRock Sensor Data File Format)
 
* [[IROCK]] (intelliRock Sensor Data File Format)
Line 189: Line 332:
 
* [[REC]] (ATI Vision recorder file)
 
* [[REC]] (ATI Vision recorder file)
 
* [[SCP-ECG]] (Standard Communication Protocol for Computer assisted electrocardiography)
 
* [[SCP-ECG]] (Standard Communication Protocol for Computer assisted electrocardiography)
* [[SEG Y]] (Reflection seismology data format)
 
 
* [[SIGIF]] (SIGnal Interchange Format)
 
* [[SIGIF]] (SIGnal Interchange Format)
  
 
== Social Sciences ==
 
== Social Sciences ==
  
* [[DDI]] (Data Documentation Initiative)
+
* [[Atlas.ti]] ([[Computer-assisted qualitative data analysis]] package)
 +
* [[DDI (Data Documentation Initiative)|DDI]] (Data Documentation Initiative)
 +
* [[DO]] ("DO file" command script for the [[Stata]] Statistical package)
 +
* [[DTA]] (Binary data file for the [[Stata]] Statistical package)
 +
* [[Linguistic Annotation Framework]] (LAF; used by computational linguists to annotate language samples)
 +
* [[M2k]] (MAXQDA)
 +
* [[NVivo]] ([[Computer-assisted qualitative data analysis]] package)
 +
* [[R]] (Statistical package)
 
* [[SAS]] (Statistical package)
 
* [[SAS]] (Statistical package)
* [[SPSS]] (Statistical package)
+
** [[SAS Transport File]] (.xpt)
* [[Stata]] (Statistical package)
+
* [[SAV]] (Binary "[[SPSS]] data format" for the [[SPSS]] Statistical package)
 +
* [[SPO]] (Output file for the [[SPSS]] Statistical package - version 14)
 +
* [[SPS]] ("Syntax file" (plain text command script) for the [[SPSS]] Statistical package)
 +
* [[SPV]] (Output file for the [[SPSS]] Statistical package - version 17 and later)
 +
* [[Statistix]] (.sx)
 +
* [[Transana]] ([[Computer-assisted qualitative data analysis]] package)
 +
 
 +
== Spectra ==
 +
* [[Bruker]] (XRF software, .pdz)
 +
* [[Niton]] (XRF software, .ndt)
 +
* [[EDAX Spectrum]] (.spc)
 +
* [[Thermo Scientific SPC]] (.spc)
 +
* [[EMSA/MAS]]
 +
* [[HMSA Hyper-Dimensional Data]]
 +
 
 +
== Miscellaneous ==
 +
 
 +
* [[AIML]] (Artificial Intelligence Markup Language)
 +
* [[EMD-DF64]] (used for high frequency energy monitoring)
 +
* [[IES]] (IESNA LM-63 Photometric Data File)
 +
* [[Jupyter Notebook]] (.ipynb)
 +
 
 +
== Links ==
 +
* [http://cameronneylon.net/blog/improving-on-access-to-research/ Improving on “Access to Research”]
 +
* [[WikiBooks:Software Tools For Molecular Microscopy]]

Latest revision as of 12:14, 22 October 2025

File Format
Name Scientific Data formats
Ontology

Mad scientist from 1940 movie

See also Health and Medicine for medical/biomedical data formats, and also see Engineering.

General

  • Common Data Format (CDF)
  • EAS3 (binary file format for structured data)
  • HDF (Hierarchical Data Format, originally from NCSA, now maintained by The HDF Group)
  • IGOR (.ibw)
  • NRRD (Nearly Raw Raster Data -- a simple format for n-dimensional raster data)
  • NetCDF (Network Common Data Form)
  • ROOT (CERN data-analysis package and related formats, used in their Open Data initiative)
  • SDXF (Structured Data Exchange Format)
  • Silo (a storage format for visualization developed at Lawrence Livermore National Laboratory)
  • Simple Data format (SDF) By George H. Fisher, Space Sciences Lab, UC Berkeley (A platform-independent, precision-preserving binary data I/O format capable of handling large, multi-dimensional arrays)
  • Standard Delay Format (SDF) A standard data structure for timing data
  • XDF (Extensible Data Format) [1]
  • XSIL (Extensible Scientific Interchange Language)

Astronomical and Space

Biological

Chemical

  • CCP4 (X-ray crystallography voxels (electron density))
  • CDX (ChemDraw file format)
  • CDXML (ChemDraw file format)
  • CHM (ChemDraw file format)
  • CIF (Crystallographic Information File, standardised by IUCr)
  • CML (Chemical markup language)
  • CTab (Chemical table file .mol, .sd, .sdf)
  • HITRAN (spectroscopic data with one optical/infrared transition per line in the ASCII file (.hit))
  • JCAMP (Joint Committee on Atomic and Molecular Physical Data, .dx, .jdx)
  • MOL (MDL Molfile)
  • MOP (MOPAC format)
  • MRC (voxels in cryo-electron microscopy)
  • MST ACD/ChemSketch v1 file format
  • Protein Data Bank (PDB)
  • RPT (OpenLynx) Waters OpenLynx reports
  • RXN (Reaction file format)
  • SK2 (ACD/ChemSketch v2 file format)
  • SKC (ISIS/Draw file format)
  • SMILES (Simplified molecular input line entry specification, .smi)
  • SPC (Spectroscopic Data)
  • Structure Data File (SDF)
  • TGF (ISIS/Draw reaction file format)
  • XYZ Chem Wiki

Chemical data may be distinguished in various ways, including Chemical MIME types.
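
As a rough illustration of that point, the Python sketch below maps a few file extensions from the list above to media types of the kind proposed by the Chemical MIME project. The table is a small hand-picked subset, and the names `CHEMICAL_MIME_TYPES` and `guess_chemical_mime` are hypothetical helpers introduced here, not part of any library; the Chemical MIME page remains the reference for the actual assignments.

```python
import os

# Illustrative subset only (an assumption of this sketch): extension -> chemical MIME type.
CHEMICAL_MIME_TYPES = {
    ".cdx": "chemical/x-cdx",              # ChemDraw Exchange
    ".cif": "chemical/x-cif",              # Crystallographic Information File
    ".cml": "chemical/x-cml",              # Chemical Markup Language
    ".jdx": "chemical/x-jcamp-dx",         # JCAMP spectroscopic data
    ".mol": "chemical/x-mdl-molfile",      # MDL Molfile
    ".pdb": "chemical/x-pdb",              # Protein Data Bank
    ".rxn": "chemical/x-mdl-rxnfile",      # MDL reaction file
    ".sdf": "chemical/x-mdl-sdfile",       # Structure Data File
    ".smi": "chemical/x-daylight-smiles",  # SMILES
    ".xyz": "chemical/x-xyz",              # XYZ coordinates
}


def guess_chemical_mime(filename):
    """Return a chemical MIME type for the file's extension, or None if unknown."""
    ext = os.path.splitext(filename)[1].lower()  # case-insensitive extension match
    return CHEMICAL_MIME_TYPES.get(ext)


if __name__ == "__main__":
    for name in ("caffeine.mol", "1abc.pdb", "notes.txt"):
        print(name, "->", guess_chemical_mime(name))
```

Extension lookup is only a first guess: several extensions in this catalogue are shared by unrelated formats (.sdf and .spc, for example), so a real identifier would also inspect file contents.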

Earth Sciences

Ecological

Environmental

  • HYT (AquiferTest)

Geographic and Geospatial

See also Geospatial

  • DEM (Digital Elevation Model)
  • DOQ (Digital Orthophotos)
  • e00 (ESRI ArcInfo Interchange File)
  • FGDC (Content Standard for Digital Geospatial Metadata??)
  • GeoTIFF (Geospatial extensions to TIFF)
  • GML (Geography Markup Language)
  • HDF-EOS (Hierarchical Data Format-Earth Observing System) [1] (HD2, HD4, HD5)
  • KML (KML (formerly Keyhole Markup Language), Version 2.2)
  • NDF (National Landsat Archive Production System (NLAPS) Data Format)
  • SAIF (Spatial Archive and Interchange Format, Canadian)
  • SDTS (Spatial Data Transfer Standard)
  • Shapefile (ESRI, shp/shx)
  • MrSID (MrSID- Multi-resolution Seamless Image Database)
  • TAB (MapInfo dataset format, required component)
  • Bathymetric Attributed Grid (.bag)

Mathematical

Microscopy

Neutron and X-ray Scattering

  • canSAS (tools for small-angle scattering)
  • CIF (Crystallographic Information File, standardised by IUCr)
  • NeXus (NeXus is a common data format for neutron, x-ray, and muon science)

Oceanographic, Atmospheric and Meteorological

  • GRIB (Gridded Binary)
  • BUFR (Binary Universal Form for the Representation of meteorological data)
  • IOAPI (netCDF augmented with metadata from the I/O API)
  • Meteosat data
  • PP (UK Met Office format for weather model data)

Physics

See subcategory Physics data

Scientific Signal data

  • ACQ (AcqKnowledge File Format for Windows)
  • BioSemi (BDF) data format
  • BKR (EEG data format)
  • CFWB (Chart Data File Format)
  • EDF (European data format)
  • FEF (File Exchange Format for Vital signs)
  • General Data Format for Biosignals (GDF)
  • GMS (Gesture And Motion Signal format)
  • IROCK (intelliRock Sensor Data File Format)
  • MFER (Medical waveform Format Encoding Rules)
  • REC (ATI Vision recorder file)
  • SCP-ECG (Standard Communication Protocol for Computer assisted electrocardiography)
  • SIGIF (SIGnal Interchange Format)

Social Sciences

Spectra

Miscellaneous

  • AIML (Artificial Intelligence Markup Language)
  • EMD-DF64 (used for high frequency energy monitoring)
  • IES (IESNA LM-63 Photometric Data File)
  • Jupyter Notebook (.ipynb)

Links
