Difference between revisions of "Formats"

From Archiveteam
(Add Markdown)
(16 intermediate revisions by 11 users not shown)
A very good rule of thumb with data formats is to pick those that are ''no more complex than the data being represented'', that are ''recoverable with simple tools'' and ''widely implemented''. In general, if you have written a text document and it's not viewable and editable in a low-level text editor like Notepad (or Emacs, Vim, TextMate, BBEdit, gedit, kate, pico/nano etc.), you should probably take the time to convert it into a plain-text format - keep the rich format also. If you are backing up data in a format that's not widely understood, be sure to also keep backups of the software you use to open it and any registration keys - as you may find that a file made with version 2.x of a piece of software won't open the all new, singing and dancing version 5.x!
{{notice|1=See also the [http://fileformats.archiveteam.org/ Let's Solve the File Format Problem] wiki, which provides an extensive catalogue of file formats.}}
 
A very good rule of thumb with data formats is to pick those that are ''no more complex than the data being represented'', that are ''recoverable with simple tools'' and ''widely implemented''.
 
In general, if you have written a text document and it cannot be viewed or edited in a low-level text editor (Notepad, Emacs, and so on), you should probably take the time to convert it into a plain-text format; keep the rich-format original as well.
 
If you are backing up data in a format that's not widely understood, be sure to also keep backups of the software you use to open it, along with any registration keys. A file made with version 2.x of a piece of software may not open in the all-singing, all-dancing version 5.x!
 
'''Tip:''' the Archive Team subdomain http://fileformats.archiveteam.org/ hosts a wiki dedicated to storing information about file formats.


== Text ==
Plain text, HTML and non-bloated XML formats are all good bets (DocBook, TEI etc.).
=== Markdown ===
[https://commonmark.org/ Markdown] is a popular, human-readable markup language. Because the source is plain text, you should still be able to get the gist of a document even if no tools are left to render it.


=== PDF ===
The Portable Document Format standard created by Adobe has reached a point where it should be readable for posterity. It is now open enough that it should have the ability to be read long into the future. You can get Adobe Acrobat Reader [http://get.adobe.com/reader/ here].
The Portable Document Format standard created by Adobe has reached a point where it should be readable for posterity. The format is now open enough that it should be usable for backup for the foreseeable future; [[Wikipedia:PDF/A|PDF/A]] is specifically designed for digital preservation. You can get a PDF reader [http://pdfreaders.org/ here].


=== TeX ===
The [http://en.wikipedia.org/wiki/TeX TeX] standard has been around since 1978. TeX documents are text based. It is widely used to prepare multi-thousand page documents for publication, as well as mathematical formula. [http://www.latex-project.org/ LaTeX] is an open implementation of this standard.
The [[wikipedia:TeX|TeX]] standard has been around since 1978. TeX documents are text based. It is widely used to prepare multi-thousand page documents for publication, as well as mathematical formulas. [http://www.latex-project.org/ LaTeX] is a free, open document preparation system based on this standard.


== Images ==
* PNG
* Lossless TIFF
* SVG


== Audio ==
Lossless:
* FLAC
Lossy:
* Ogg Vorbis


== Video ==
* Matroska
* OGV
* AVI


== Compression ==
* [[7z]]
* [[TAR]]
* ZIP
== Website crawls ==
[[WARC]] is required for Wayback Machine integration and is highly recommended. It retains important metadata (such as request/response headers) that would otherwise be lost.


== External links ==
* http://en.wikipedia.org/wiki/Category:Open_formats
* http://fileformats.archiveteam.org/ Let's Solve the File Format Problem!
* http://justsolve.archiveteam.org/


{{Navigation pager
| previous = Software
| next = Storage Media
}}
{{Navigation box}}

Revision as of 14:42, 14 May 2021

See also the Let's Solve the File Format Problem wiki (http://fileformats.archiveteam.org/), which provides an extensive catalogue of file formats.

A very good rule of thumb with data formats is to pick those that are no more complex than the data being represented, that are recoverable with simple tools and widely implemented.

In general, if you have written a text document and it cannot be viewed or edited in a low-level text editor (Notepad, Emacs, and so on), you should probably take the time to convert it into a plain-text format; keep the rich-format original as well.
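One rough way to test whether a document would survive a low-level text editor is to check that its bytes decode cleanly as text. The sketch below is only a heuristic under stated assumptions: it treats "plain text" as "valid UTF-8 with no NUL bytes", and the function names and the 64 KiB sample size are made up for illustration.

```python
import codecs


def looks_like_plain_text(data: bytes) -> bool:
    """Heuristic: no NUL bytes, and the bytes decode as UTF-8.

    An incremental decoder is used so that a sample cut off in the
    middle of a multi-byte character is not rejected by mistake.
    """
    if b"\x00" in data:
        return False
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        decoder.decode(data, final=False)
    except UnicodeDecodeError:
        return False
    return True


def file_looks_like_plain_text(path: str, sample_size: int = 65536) -> bool:
    """Apply the heuristic to the first `sample_size` bytes of a file."""
    with open(path, "rb") as handle:
        return looks_like_plain_text(handle.read(sample_size))
```

A file that fails this check is not necessarily a bad format; it only suggests that a plain-text copy would be worth keeping alongside it.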

If you are backing up data in a format that's not widely understood, be sure to also keep backups of the software you use to open it, along with any registration keys. A file made with version 2.x of a piece of software may not open in the all-singing, all-dancing version 5.x!

Tip: the Archive Team subdomain http://fileformats.archiveteam.org/ hosts a wiki dedicated to storing information about file formats.

Text

Plain text, HTML and non-bloated XML formats are all good bets (DocBook, TEI etc.).

Markdown

Markdown (https://commonmark.org/) is a popular, human-readable markup language. Because the source is plain text, you should still be able to get the gist of a document even if no tools are left to render it.

PDF

The Portable Document Format standard created by Adobe has reached a point where it should be readable for posterity. The format is now open enough that it should be usable for backup for the foreseeable future; PDF/A is specifically designed for digital preservation. You can get a PDF reader at http://pdfreaders.org/.

TeX

The TeX standard has been around since 1978. TeX documents are text based. It is widely used to prepare multi-thousand page documents for publication, as well as mathematical formulas. LaTeX (http://www.latex-project.org/) is a free, open document preparation system based on this standard.

Images

  • PNG
  • Lossless TIFF
  • SVG

Audio

Lossless:

  • FLAC

Lossy:

  • Ogg Vorbis

Video

  • Matroska
  • OGV
  • AVI

Compression

  • 7z
  • TAR
  • ZIP

Website crawls

WARC is required for Wayback Machine integration and is highly recommended. It retains important metadata (such as request/response headers) that would otherwise be lost.
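To make the point about metadata concrete, here is a minimal, hand-rolled sketch of what a single WARC `response` record looks like on disk. It is an illustration only, not a substitute for an established crawler or WARC library: the function name `build_warc_response_record`, the example URL, and the sample HTTP response are all invented for this example, and only a basic set of WARC/1.0 header fields is written.

```python
import uuid
from datetime import datetime, timezone


def build_warc_response_record(target_uri: str, http_response: bytes) -> bytes:
    """Wrap a raw HTTP response (status line, headers, body) in a WARC record."""
    warc_headers = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "application/http; msgtype=response"),
        ("Content-Length", str(len(http_response))),
    ]
    head = "WARC/1.0\r\n"
    head += "".join(f"{name}: {value}\r\n" for name, value in warc_headers)
    head += "\r\n"  # blank line separates the WARC headers from the block
    # The record block is the untouched HTTP response, followed by two CRLFs.
    return head.encode("utf-8") + http_response + b"\r\n\r\n"


# Hypothetical server response, including the headers a plain
# "save page as HTML" would throw away.
example_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Server: ExampleServer\r\n"
    b"\r\n"
    b"<html><body>Hello</body></html>"
)
record = build_warc_response_record("http://example.com/", example_response)
print(record.decode("utf-8"))
```

Because the HTTP status line and headers are stored verbatim inside the record, details such as the server software and content type remain available to later tools.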

External links

  • http://en.wikipedia.org/wiki/Category:Open_formats
