Scientific Data formats
From Just Solve the File Format Problem
Revision as of 15:23, 18 June 2019
See also Health and Medicine for medical/biomedical data formats.
General
- Common Data Format (CDF)
- EAS3 (binary file format for structured data)
- HDF (Hierarchical Data Format, originally from NCSA, now maintained by The HDF Group)
- NRRD (Nearly Raw Raster Data -- a simple format for n-dimensional raster data)
- NetCDF (Network Common Data Form)
- ROOT (CERN data-analysis package and related formats, used in their Open Data initiative)
- SDXF (Structured Data Exchange Format)
- Silo (a storage format for visualization developed at Lawrence Livermore National Laboratory)
- Simple Data Format (SDF), by George H. Fisher, Space Sciences Lab, UC Berkeley (a platform-independent, precision-preserving binary data I/O format capable of handling large, multi-dimensional arrays)
- Standard Delay Format (SDF; a standard data structure for timing data)
- XDF (eXtensible Data Format)
- XSIL (Extensible Scientific Interchange Language)
Astronomical and Space
- Advanced Scientific Data Format
- Flexible Image Transport System (FITS)
- PSRFITS (Pulsar data storage standard)
- ICER
- NASA Raster Metafile
- ODL (NASA Object Description Language)
- PDS (Planetary Data System)
- PDS4
- VOTable (IVOA standard table format)
- SBIG CCDOPS image
- Standard Archive Format (used for USAF missile data)
- SDF (Starlink Data Format) and NDF (Starlink's Extensible N-Dimensional Data Format)
- VICAR
- WinMiPS
Biological
- 23andMe
- AB1 (Chromatogram files used by DNA sequencing instruments from Applied Biosystems)
- ABCD (Access to Biological Collection Data)
- ABCDDNA (Access to Biological Collection Data DNA extension)
- ABCDEFG (Access to Biological Collection Data Extension For Geosciences)
- ACE (Sequence assembly format)
- Affymetrix Raw Intensity Format
- ARF (Axon Raw Format)
- ARLEQUIN Project Format
- Axt Alignment Format
- BAM (Binary compressed SAM format)
- BED (Browser Extensible Data format describing genes and other features of DNA sequences)
- BEDgraph
- Big Browser Extensible Data Format
- Big Wiggle Format
- Binary Alignment Map Format
- Binary Probe Map Format
- Binary sequence information Format
- Biological Pathway eXchange
- BLAT alignment Format
- BRIX generated O Format
- CAF (Common Assembly Format for sequence assembly)
- CellML
- CHADO XML interchange Format
- Chain Format for pairwise alignment
- CHARMM Card File Format
- CLUSTAL-W Alignment Format
- CLUSTAL-W Dendrogram Guide File Format
- Clustered Data Table Format
- Complete Genomics
- DELTA (DEscription Language for TAxonomy)
- DAS (Distributed Sequence Annotation System)
- DBN (Dot Bracket Notation, Vienna format)
- EMBL (Flatfile format used by the EMBL for nucleotide and peptide sequences)
- EML (Environmental Markup Language), not to be confused with EML (Ecological Metadata Language)
- ENCODE (Peak information Format)
- FASTA and FASTQ (File format for sequence data, FASTQ with quality)
- FuGEFlow
- FuGE-ML (Functional Genomics Experiment Markup Language)
- Gating-ML
- GCDML (Genomic Contextual Data Markup Language)
- GelML (Gel electrophoresis Markup Language)
- GenBank (Flatfile format used by NCBI for nucleotide and peptide sequences)
- Gene Feature File (Versions 1 and 3)
- GFF (General feature format for describing genes and other features of DNA, RNA and protein sequences)
- Gene Prediction File Format
- GenePattern GeneSet Table Format
- Genome Annotation File (version 1 and 2)
- GTF (Gene Transfer Format; holds information about gene structure)
- HMMER
- ICB (ICM binary file Format)
- imzML (imaging mz Markup Language)
- ISA-Tab (Investigation Study Assay Tabular)
- ISND sequence record XML
- KGML (KEGG Mark-up Language)
- MAGE-Tab (MicroArray Gene Expression Tabular)
- MCL (Microbiological Common Language)
- MIARE-TAB (Minimum Information About an RNAi Experiment Tabular)
- microarray track data Browser Extensible Data Format
- MINiML (MIAME Notation in Markup Language)
- mini Protein Data Bank Format
- MIQAS-TAB (Minimal Information for QTLs and Association Studies Tabular)
- MITAB
- mmCIF (macromolecular Crystallographic Information File)
- Multiple Alignment Format
- mzData (deprecated)
- mzIdentML
- mzML
- mzQuantML
- mzXML (deprecated)
- NCD (Natural Collections Descriptions)
- NDTF (Neurophysiology Data Translation Format)
- net alignment annotation Format
- NeuroML (Neuroscience eXtensible Markup Language)
- New Hampshire eXtended Format
- Newick tree Format
- NEXUS (Encodes mixed information about genetic sequence data in a block structured format)
- Nimblegen Design File Format
- Nimblegen Gene Data Format
- NMR-STAR (NMR Self-defining Text Archive and Retrieval format)
- nucleotide information binary Format
- ODM (Operational Data Model)
- Open Biomedical Ontology Flat File Format
- Personal Genome SNP Format
- PHD (Output from the basecalling software Phred)
- phyloXML (XML for evolutionary biology and comparative genomics)
- Pre-Clustering File Format
- Protein Data Bank (PDB; Structures of biomolecules deposited in Protein Data Bank)
- Protein Information Resource Format
- PRM (Protocol Representation Model (Medical Research))
- PSI-MI XML
- PSI-PAR
- RDML (Real-time PCR Data Markup Language)
- SAM (Sequence Alignment/Map format)
- SCF (Staden chromatogram files used to store data from DNA sequencing)
- SBML (Systems Biology Markup Language used to store biochemical network computational models)
- SDD (Structured Descriptive Data)
- SED-ML (Simulation Experiment Description Markup Language)
- Sequence Alignment Map Format
- SOFT (Simple Omnibus Format in Text)
- spML (Separation Markup Language)
- SRA-XML (Short Read Archive eXtensible Markup Language)
- Standard Flowgram Format
- Stockholm Multiple Alignment Format (Representing multiple sequence alignments)
- SBML (Systems Biology Markup Language)
- SBGN (Systems Biology Graphical Notation)
- SBRML (Systems Biology Results Markup Language)
- Swiss-Prot (Flatfile format used for protein sequences from the Swiss-Prot database)
- TAIR annotation data Format
- TAPIR (TDWG Access Protocol for Information Retrieval)
- TCS (Taxonomic Concept transfer Schema)
- TraML (Transition Markup Language)
- UniProtKB XML Format
- VCF (Variant Call Format)
- Wiggle Format
Chemical
- CCP4 (X-ray crystallography voxels (electron density))
- CDX (ChemDraw file format)
- CDXML (ChemDraw file format)
- CHM (ChemDraw file format)
- CIF (Crystallographic Information File, standardised by IUCr)
- CML (Chemical markup language)
- CTab (Chemical table file .mol, .sd, .sdf)
- HITRAN (spectroscopic data with one optical/infrared transition per line in the ASCII file (.hit))
- JCAMP (Joint Committee on Atomic and Molecular Physical Data, .dx, .jdx)
- MOL (MDL Molfile)
- MOP (MOPAC format)
- MRC (voxels in cryo-electron microscopy)
- MST (ACD/ChemSketch v1 file format)
- Protein Data Bank (PDB)
- RPT (ACD/ChemSketch v1 file format)
- RXN (Reaction file format)
- SK2 (ACD/ChemSketch v2 file format)
- SKC (ISIS/Draw file format)
- SMILES (Simplified molecular input line entry specification, .smi)
- SPC (Spectroscopic Data)
- Structure Data File (SDF)
- TGF (ISIS/Draw reaction file format)
Chemical data may be distinguished in various ways, including Chemical MIME types.
Earth Sciences
- Adaptable Seismic Data Format
- Network-Day Tape
- QuakeML
- SEED
- SEG Y (Reflection seismology data format)
Ecological
- Darwin Core (Standard for sharing information about biological diversity)
- Electronic Data Deliverable (EDD; EPA Superfund)
- EML (Ecological Metadata Language), not to be confused with EML (Environmental Markup Language)
Geographic and Geospatial
See also Geospatial
- DEM (Digital Elevation Model)
- DOQ (Digital Orthophotos)
- e00 (ESRI ArcInfo Interchange File)
- FGDC (Content Standard for Digital Geospatial Metadata)
- GeoTIFF (Geospatial extensions to TIFF)
- GML (Geography Markup Language)
- HDFEOS, HD2, HD4 (Hierarchical Data Format-Earth Observing System)
- KML (formerly Keyhole Markup Language; version 2.2)
- NDF (National Landsat Archive Production System (NLAPS) Data Format)
- SAIF (Spatial Archive and Interchange Format, Canadian)
- SDTS (Spatial Data Transfer Standard)
- Shapefile (ESRI, shp/shx)
- MrSID (Multi-resolution Seamless Image Database)
- TAB (MapInfo dataset format, must have component)
Mathematical
- AsciiMath
- DOT (graph description language)
- GEXF (Graph Exchange XML Format)
- graph6, sparse6 (ASCII encoding of Adjacency matrices (.g6, .s6))
- graphML (Graph Markup Language)
- MacWavelets
- Mathematica
  - Computable Document Format (.cdf)
  - Mathematica notebook (.nb, .nbp)
  - Mathematica package file (M)
  - Wolfram Language
- MathML
- MATLAB
  - MAT (MATLAB data format)
  - MATLAB figure
  - MATLAB script file (m)
- OPJ (Origin data format)
- Statistica
- WP2 WinPlot
Microscopy
- Amber ARR Bitmap Image
- Aperio SVS
- Bio
- BioRad confocal image
- DeltaVision
- dm2 (Gatan Digital Micrograph 2)
- dm3 (Gatan Digital Micrograph 3) (fmt/1131)
- GATAN
- Image Cytometry Standard (ICS)
- KONTRON
- LIFF (Openlab Layered Image File Format)
- LSM (Zeiss Laser Scanning Microscope)
- MetaMorph Stack (.stk)
- MRC (Medical Research Council)
- OME-TIFF (Open Microscopy Imaging format)
- OME-XML (Open Microscopy Imaging format)
- SMV
- VGS-8
- Zeiss BIVAS
Oceanographic, Atmospheric and Meteorological
- GRIB (Gridded Binary)
- BUFR (Binary Universal Form for the Representation of meteorological data)
- IOAPI (netCDF augmented with metadata from the I/O API)
- Meteosat data
- PP (UK Met Office format for weather model data)
Physics
See subcategory Physics data
Scientific Signal data
- ACQ (AcqKnowledge File Format for Windows)
- BioSemi (BDF) data format
- BKR (EEG data format)
- CFWB (Chart Data File Format)
- EDF (European data format)
- FEF (File Exchange Format for Vital signs)
- General Data Format for Biosignals (GDF)
- GMS (Gesture And Motion Signal format)
- IROCK (intelliRock Sensor Data File Format)
- MFER (Medical waveform Format Encoding Rules)
- REC (ATI Vision recorder file)
- SCP-ECG (Standard Communications Protocol for Computer-assisted Electrocardiography)
- SIGIF (SIGnal Interchange Format)
Social Sciences
- Atlas.ti (Computer-assisted qualitative data analysis package)
- DDI (Data Documentation Initiative)
- DO ("DO file" command script for the Stata Statistical package)
- DTA (Binary data file for the Stata Statistical package)
- M2k (MAXQDA)
- NVivo (Computer-assisted qualitative data analysis package)
- R (Statistical package)
- SAS (Statistical package)
- SAV (Binary "SPSS data format" for the SPSS Statistical package)
- SPO (Output file for the SPSS Statistical package - version 14)
- SPS ("Syntax file" (plain text command script) for the SPSS Statistical package)
- SPV (Output file for the SPSS Statistical package - version 17 and later)
- Transana (Computer-assisted qualitative data analysis package)
Miscellaneous
- AIML (Artificial Intelligence Markup Language)
- IES (IESNA LM-63 Photometric Data File)
- Jupyter Notebook (.ipynb)