URL

From Just Solve the File Format Problem
(Difference between revisions)
 
(24 intermediate revisions by 4 users not shown)
Line 1: Line 1:
 
{{FormatInfo
 
{{FormatInfo
 
|subcat=Web
 
|subcat=Web
 +
|released=1990
 +
|wikidata={{wikidata|Q42253}}, {{wikidata|Q61694}}, {{wikidata|Q424583}}
 
}}
 
}}
  
A '''URL''' (Uniform Resource Location) is an address of a resource as used on the World Wide Web. Technically speaking, a URL is just one category of such addresses, a subset of URI (Uniform Resource Identifier) and parallel to URN (Uniform Resource Name), but such distinctions aren't always consistently maintained even by technical people, and URL has entered the popular language in a way those other terms have not.
+
A '''URL''' (Uniform Resource Locator) is an address of a resource as used on the World Wide Web, and is one of Tim Berners-Lee's original three pillars of the Web along with [[HTTP]] and [[HTML]]. Technically speaking, a URL is just one category of such addresses, a subset of URI (Uniform Resource Identifier) and parallel to URN (Uniform Resource Name), but such distinctions aren't always consistently maintained even by technical people, and URL has entered the popular language in a way those other terms have not.
  
 
Over time, the precise definitions of the various terms for Web-related addresses have changed and been argued extensively about by technical people, and some more have been added: an IRI (Internationalized Resource Identifier) is like a URI, but extended to allow non-[[ASCII]] characters so that languages other than English can be supported. However, the newest HTML 5 standards drafts choose to take a more pragmatic approach of just using "URL" to refer to anything that a browser is expected to resolve as an address, as one of many "willful violations" of earlier tech specs they did there. (The "techie" equivalent of social conservatives may consider this to be "defining deviancy down" and hence an "abomination".)
 
Over time, the precise definitions of the various terms for Web-related addresses have changed and been argued extensively about by technical people, and some more have been added: an IRI (Internationalized Resource Identifier) is like a URI, but extended to allow non-[[ASCII]] characters so that languages other than English can be supported. However, the newest HTML 5 standards drafts choose to take a more pragmatic approach of just using "URL" to refer to anything that a browser is expected to resolve as an address, as one of many "willful violations" of earlier tech specs they did there. (The "techie" equivalent of social conservatives may consider this to be "defining deviancy down" and hence an "abomination".)
  
Use of URLs (and URIs, etc.) is not limited to the Web, as there are a number of other technical usages such as in defining namespaces for file formats (e.g., [[XML]]), and in identifying even non-Web-accessible objects for the purpose of expressing taxonomic relations. In less-technical usage, URLs turn up in all sorts of places like TV commercials, billboards, and on the side of vans, but usually with the protocol portion left off because everybody assumes [[HTTP]]. These days most browsers don't even show the "http://" part in the address bar, though it's still officially part of the URL.
+
Use of URLs (and URIs, etc.) is not limited to the Web, as there are a number of other technical usages such as in defining namespaces for file formats (e.g., [[XML]]), and in identifying even non-Web-accessible objects for the purpose of expressing taxonomic relations. In less-technical usage, URLs turn up in all sorts of places like TV commercials, billboards, and on the side of vans, but often with the protocol portion left off when [[HTTP]] is used. These days most browsers don't even show the "http://" part in the address bar, though it's still officially part of the URL. (On the other hand, there's a growing movement to shift sites to the encrypted "https" protocol for privacy.)
  
 
== Types of identifiers ==
 
== Types of identifiers ==
  
* '''URI''': The official "parent term" for URLs, URNs, and other such identifiers, but limited to [[ASCII]] characters, with anything else needing to be specially encoded. Even within the ASCII range, some characters such as the space are prohibited, reserved, or designated to be used only for specific syntactic purposes, with encoding necessary for all other uses.
+
* '''URI''': The official "parent term" for URLs, URNs, and other such identifiers, but limited to [[ASCII]] characters, with anything else needing to be specially encoded. Even within the ASCII range, some characters such as the space are prohibited, reserved, or designated to be used only for specific syntactic purposes, with percent encoding necessary for all other uses.
  
* '''IRI''': The internationalized version of URIs, with more liberal rules about what characters in the entire [[Unicode]] range may be included. This allows text in non-English languages to be included without messy encoding, though various transfer protocols may still require the entire string to be encoded on transmission to produce an ASCII-based URI.
+
* '''IRI''': The internationalized version of URIs, with more liberal rules about what characters in the entire [[Unicode]] range may be included. This allows text in non-English languages to be included without messy encoding, though various transfer protocols may still require the entire string to be encoded on transmission to produce an ASCII-based URI. This is really just a different representation of a URI; any IRI may be represented as a URI, with [[percent-encoding]] UTF-8 sequences if it contains non-ASCII characters.
  
 
* '''URL''': Technically only the subset of URIs that are "locators", able to be used to retrieve resources because they designate a specific address for them, but in practice the distinction is very fuzzy and usually ignored. Some newer standards such as HTML 5.0 simply follow common non-techie usage and use URL to refer to the whole universe of Web-style addresses (encompassing URIs and IRIs, and anything else a browser can accept as an address even if it fails to comply with any of the standards).
 
* '''URL''': Technically only the subset of URIs that are "locators", able to be used to retrieve resources because they designate a specific address for them, but in practice the distinction is very fuzzy and usually ignored. Some newer standards such as HTML 5.0 simply follow common non-techie usage and use URL to refer to the whole universe of Web-style addresses (encompassing URIs and IRIs, and anything else a browser can accept as an address even if it fails to comply with any of the standards).
  
* '''URN''': Uniform Resource Name. Another type of URI which is supposed to provide a stable permanent identifier for a resource which does not include a specific (and changeable) address for it. To resolve a URN, one needs a resolver such as a server or website that stores a table of current locations of items with URNs. Currently the standards call for all URNs to begin with the 'urn:' scheme identifier, and the next item after this is a URN namespace, followed by another colon and the namespace-specific information. Some common naming schemes have been adopted as URNs, such as ISBNs (International Standard Book Number), which have the format "urn:isbn:1-234567-890". Unfortunately, browsers haven't been quick to implement URN resolvers as standard features, though add-ons can be installed to do it.
+
* '''URN''': Uniform Resource Name. Another type of URI which is supposed to provide a stable permanent identifier for a resource which does not include a specific (and changeable) address for it. To resolve a URN, one needs a resolver such as a server or website that stores a table of current locations of items with URNs. Currently the standards call for all URNs to begin with the 'urn:' scheme identifier, and the next item after this is a URN namespace, followed by another colon and the namespace-specific information. Some common naming schemes have been adopted as URNs, such as [[ISBN]]s (International Standard Book Number), which have the format "urn:isbn:1-234567-890". Unfortunately, browsers haven't been quick to implement URN resolvers as standard features, though add-ons can be installed to do it. URNs can also be used to refer to a [[UUID]]. URNs are also used to refer to hashes in [[magnet URI]]s.
  
 
== Standard syntax ==
 
== Standard syntax ==
  
URLs/URIs/etc. always start with a scheme (protocol). (At least, ''absolute'' URLs do; there are also ''relative'' URLs that leave off parts at the beginning because they are construed as being relative to the current URL they are accessed from.) The most common is [[HTTP]]. The scheme part ends with a colon (:).
+
URLs/URIs/etc. always start with a scheme (protocol). (At least, ''absolute'' URLs do; there are also ''relative'' URLs that leave off parts at the beginning because they are construed as being relative to the current URL they are accessed from.) The most common is traditionally [[HTTP]], though these days the encrypted variant HTTPS is increasingly common; there are many other schemes too, although they are less common. The scheme part ends with a colon (:).
  
After this, the rest of the URL is protocol-dependent; there are a number of different syntaxes used in different types of URLs. A common syntax, expected by the standards to be used in all schemes with hierarchical path structures, follows the scheme part with a double slash (//) which introduces a host or authority portion (usually a [[domain name]]), which is then followed by another slash and then the full path being addressed, which uses forward slashes to separate hierarchical levels (which may, but needn't, correspond to subdirectories in a [[filesystem]]).
+
After this, the rest of the URL is scheme-dependent; there are a number of different syntaxes used in different types of URLs. A common syntax, expected by the standards to be used in all schemes with hierarchical path structures, follows the scheme part with a double slash (//) which introduces a host or authority portion (usually a [[domain name]]), which is then followed by another slash and then the full path being addressed, which uses forward slashes to separate hierarchical levels (which may, but needn't, correspond to subdirectories in a [[filesystem]]).
  
There's a common misconception that URLs always have a double slash after the colon, sometimes causing developers of new schemes to put this in their syntax where the standards don't call for it; it is only supposed to be used if the following element is some sort of "authority" by which a following path is to be interpreted. There are a number of schemes with no such authority, and hence no double slash; for instance "mailto:".
+
There's a common misconception that URLs always have a double slash after the colon, sometimes causing developers of new schemes to put this in their syntax where the standards don't call for it; it is only supposed to be used if the following element is some sort of "authority" (most commonly the address to connect to) by which a following path is to be interpreted. There are a number of schemes with no such authority, and hence no double slash; for instance "mailto:".
 +
 
 +
== data: URLs ==
 +
 
 +
One scheme, '''data:''', is actually a file format in its own right, since it encodes the entire contents of a file within the URL instead of referencing an external resource as other schemes do.
 +
 
 +
* [[Wikipedia:Data URI scheme|Wikipedia article on data: URIs]]
 +
* RFC 2397
 +
* [http://dataurl.net/#about Data URL maker]
 +
 
 +
== Example ==
 +
 
 +
http://archive.org/upload/?description=I%20finally%20got%20it.%20However%2C%20an%20update%20made%20the%20Home%20Screen%20look%20like%20Jadoo%205%20and%20not%20like%20the%20new%20Home%20Screen%20they%20put%20in.&subject=Jadoo%2CJadooTV%2CJadoo7%2CJadoo%207%2CAndroid%2CAPK%2CAndroid%20Pie%2CAndroid%209%2CAndroid%209.0%2CAndroid%20TV%2CFarsi%2CHindi&creator=JadooTV&date=2021&collection=open_source_software
 +
== See also ==
 +
 
 +
* [[Magnet URI]]
 +
* [[URL encoding]]
  
 
== Official documents ==
 
== Official documents ==
Line 37: Line 55:
 
* [http://www.w3.org/TR/2001/NOTE-uri-clarification-20010921/ W3C clarification of URIs, URLs, and URNs (2001)]
 
* [http://www.w3.org/TR/2001/NOTE-uri-clarification-20010921/ W3C clarification of URIs, URLs, and URNs (2001)]
 
* [http://www.w3.org/TR/2012/WD-html51-20121217/infrastructure.html#urls HTML 5.1 draft section on URLs] (which intentionally disregards the distinctions in the other documents above)
 
* [http://www.w3.org/TR/2012/WD-html51-20121217/infrastructure.html#urls HTML 5.1 draft section on URLs] (which intentionally disregards the distinctions in the other documents above)
 +
 +
== Proposed documents ==
 +
* [http://tools.ietf.org/html/draft-kerwin-file-scheme-09 Internet draft for File scheme]
  
 
== Official sites ==
 
== Official sites ==
Line 47: Line 68:
 
== Other links ==
 
== Other links ==
  
* [http://webtips.dan.info/url.html Dan's Web Tips: URLs]
+
* [https://webtips.dan.info/url.html Dan's Web Tips: URLs]
 
* [http://blog.welldesignedurls.org/ Well Designed URLs Blog]
 
* [http://blog.welldesignedurls.org/ Well Designed URLs Blog]
 +
* [https://www.w3.org/Provider/Style/URI.html Cool URIs don't change (Tim Berners-Lee)]
 +
* [http://blogs.msdn.com/b/ie/archive/2006/12/06/file-uris-in-windows.aspx File URIs in Windows]
 +
* [https://offset.skew.org/wiki/URI/File_scheme File URI scheme update project wiki]
 +
* [https://github.com/AGLDWG/TR/wiki/URI-Guidelines-for-publishing-linked-datasets-on-data.gov.au-v0.1 URI Guidelines for publishing Linked Datasets on data.gov.au v0.1]
 +
* [https://www.w3.org/community/blog/2014/09/11/proposed-group-uri-specification-community-group/ Proposed Group: URI Specification Community Group]
 +
* [http://ben.balter.com/2014/10/07/expose-process-through-urls/ If you liked it then you should have put a URL on it]
 +
* [http://blog.webrecorder.io/2015/02/beyond-robust-links-case-for-robust.html Beyond Robust Links: The case for robust urls and an Archival Url standard]
 +
* [https://corner.squareup.com/2015/05/okhttp-2-4.html OkHttp’s New URL Class]
 +
* [https://eager.io/blog/the-history-of-the-url-domain-and-protocol/ History of the URL: Domain, Protocol, and Port]
  
 
[[Category:Naming and numbering systems]]
 
[[Category:Naming and numbering systems]]

Latest revision as of 19:49, 23 November 2021

File Format
Name URL
Ontology
Wikidata ID Q42253, Q61694, Q424583
Released 1990

A URL (Uniform Resource Locator) is an address of a resource as used on the World Wide Web, and is one of Tim Berners-Lee's original three pillars of the Web along with HTTP and HTML. Technically speaking, a URL is just one category of such addresses, a subset of URI (Uniform Resource Identifier) and parallel to URN (Uniform Resource Name), but such distinctions aren't always consistently maintained even by technical people, and URL has entered the popular language in a way those other terms have not.

Over time, the precise definitions of the various terms for Web-related addresses have changed and been argued about extensively by technical people, and more terms have been added: an IRI (Internationalized Resource Identifier) is like a URI, but extended to allow non-ASCII characters so that languages other than English can be supported. However, the newest HTML 5 standards drafts take the more pragmatic approach of using "URL" for anything that a browser is expected to resolve as an address, one of many "willful violations" of earlier technical specifications in those drafts. (The "techie" equivalent of social conservatives may consider this to be "defining deviancy down" and hence an "abomination".)

Use of URLs (and URIs, etc.) is not limited to the Web, as there are a number of other technical usages such as in defining namespaces for file formats (e.g., XML), and in identifying even non-Web-accessible objects for the purpose of expressing taxonomic relations. In less-technical usage, URLs turn up in all sorts of places like TV commercials, billboards, and on the side of vans, but often with the protocol portion left off when HTTP is used. These days most browsers don't even show the "http://" part in the address bar, though it's still officially part of the URL. (On the other hand, there's a growing movement to shift sites to the encrypted "https" protocol for privacy.)

Types of identifiers

  • URI: The official "parent term" for URLs, URNs, and other such identifiers, but limited to ASCII characters, with anything else needing to be specially encoded. Even within the ASCII range, some characters such as the space are prohibited, reserved, or designated to be used only for specific syntactic purposes, with percent encoding necessary for all other uses.
  • IRI: The internationalized version of URIs, with more liberal rules about what characters in the entire Unicode range may be included. This allows text in non-English languages to be included without messy encoding, though various transfer protocols may still require the entire string to be encoded on transmission to produce an ASCII-based URI. This is really just a different representation of a URI; any IRI may be represented as a URI by percent-encoding the UTF-8 byte sequences of any non-ASCII characters it contains.
  • URL: Technically only the subset of URIs that are "locators", able to be used to retrieve resources because they designate a specific address for them, but in practice the distinction is very fuzzy and usually ignored. Some newer standards such as HTML 5.0 simply follow common non-techie usage and use URL to refer to the whole universe of Web-style addresses (encompassing URIs and IRIs, and anything else a browser can accept as an address even if it fails to comply with any of the standards).
  • URN: Uniform Resource Name. Another type of URI which is supposed to provide a stable permanent identifier for a resource which does not include a specific (and changeable) address for it. To resolve a URN, one needs a resolver such as a server or website that stores a table of current locations of items with URNs. Currently the standards call for all URNs to begin with the 'urn:' scheme identifier, and the next item after this is a URN namespace, followed by another colon and the namespace-specific information. Some common naming schemes have been adopted as URNs, such as ISBNs (International Standard Book Number), which have the format "urn:isbn:1-234567-890". Unfortunately, browsers haven't been quick to implement URN resolvers as standard features, though add-ons can be installed to do it. URNs can also be used to refer to a UUID. URNs are also used to refer to hashes in magnet URIs.
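
The relationships in the list above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration rather than a complete implementation: the function names iri_to_uri and split_urn are made up for this example, the example.org address is a placeholder, and the IRI conversion assumes the host name is already ASCII (internationalized domain names are normally handled separately).

```python
from urllib.parse import quote

# Characters left untouched by quote(): the reserved URI characters, plus "%"
# so that escapes already present in the input are not encoded a second time.
SAFE = ":/?#[]@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Percent-encode the UTF-8 bytes of every non-ASCII (or unsafe) character.

    Assumption for this sketch: the host part is plain ASCII.
    """
    return quote(iri, safe=SAFE, encoding="utf-8")


def split_urn(urn: str) -> tuple[str, str]:
    """Split 'urn:<namespace>:<namespace-specific part>' into its two payloads."""
    scheme, namespace, specific = urn.split(":", 2)
    if scheme.lower() != "urn":
        raise ValueError("not a URN: " + urn)
    return namespace.lower(), specific


print(iri_to_uri("http://example.org/café?q=naïve"))
# http://example.org/caf%C3%A9?q=na%C3%AFve
print(split_urn("urn:isbn:1-234567-890"))
# ('isbn', '1-234567-890')
```

Note that split_urn only takes the string apart; actually resolving a URN to a location still needs an external resolver, as described above.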

Standard syntax

URLs/URIs/etc. always start with a scheme (protocol). (At least, absolute URLs do; there are also relative URLs that leave off parts at the beginning because they are construed as being relative to the current URL they are accessed from.) The most common is traditionally HTTP, though these days the encrypted variant HTTPS is increasingly common; there are many other schemes too, although they are less common. The scheme part ends with a colon (:).
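
How a relative URL is interpreted against the URL it was found in can be sketched with Python's standard urllib.parse.urljoin. The base address and file names below are placeholders invented for the example.

```python
from urllib.parse import urljoin

# Hypothetical page on which the relative references are found.
BASE = "http://example.org/docs/guide/intro.html"


def resolve(reference: str) -> str:
    """Resolve a relative (or absolute) reference against BASE."""
    return urljoin(BASE, reference)


print(resolve("syntax.html"))               # sibling file in the same directory
# http://example.org/docs/guide/syntax.html
print(resolve("../images/logo.png"))        # one level up
# http://example.org/docs/images/logo.png
print(resolve("/about"))                    # path relative to the host root
# http://example.org/about
print(resolve("//cdn.example.org/lib.js"))  # keeps only the scheme of the base
# http://cdn.example.org/lib.js
```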

After this, the rest of the URL is scheme-dependent; there are a number of different syntaxes used in different types of URLs. A common syntax, expected by the standards to be used in all schemes with hierarchical path structures, follows the scheme part with a double slash (//) which introduces a host or authority portion (usually a domain name), which is then followed by another slash and then the full path being addressed, which uses forward slashes to separate hierarchical levels (which may, but needn't, correspond to subdirectories in a filesystem).

There's a common misconception that URLs always have a double slash after the colon, sometimes causing developers of new schemes to put this in their syntax where the standards don't call for it; it is only supposed to be used if the following element is some sort of "authority" (most commonly the address to connect to) by which a following path is to be interpreted. There are a number of schemes with no such authority, and hence no double slash; for instance "mailto:".
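
The difference between schemes with and without an authority shows up directly when URLs are split into components. The following is a small sketch using Python's standard urllib.parse.urlsplit; the describe helper and the example.org addresses are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def describe(url: str) -> dict:
    """Return the scheme, authority and path of a URL as a plain dict."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,  # empty unless the URL has a "//" section
        "path": parts.path,
    }


# Hierarchical URL: "//" introduces the authority, then the path follows.
print(describe("https://example.org/a/b/page.html"))
# {'scheme': 'https', 'authority': 'example.org', 'path': '/a/b/page.html'}

# mailto: has no authority, so there is no double slash after the colon.
print(describe("mailto:someone@example.org"))
# {'scheme': 'mailto', 'authority': '', 'path': 'someone@example.org'}
```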

data: URLs

One scheme, data:, is actually a file format in its own right, since it encodes the entire contents of a file within the URL instead of referencing an external resource as other schemes do.
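
As a rough illustration of how file contents end up inside the URL itself, here is a minimal Python sketch that builds and reads back base64-style data: URLs along the lines of RFC 2397. The helper names make_data_url and read_data_url are invented for this example, and the parsing is deliberately simplistic (it does not validate the media type or its parameters).

```python
import base64
from urllib.parse import unquote_to_bytes


def make_data_url(payload: bytes, media_type: str = "text/plain") -> str:
    """Embed the payload in a data: URL using base64."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


def read_data_url(url: str) -> tuple[str, bytes]:
    """Return (media type, payload bytes) from a data: URL."""
    header, _, body = url.partition(",")
    if not header.startswith("data:"):
        raise ValueError("not a data: URL")
    meta = header[len("data:"):]
    if meta.endswith(";base64"):
        return meta[: -len(";base64")], base64.b64decode(body)
    # Without ";base64" the payload is percent-encoded text.
    # An omitted media type defaults to text/plain;charset=US-ASCII.
    return meta or "text/plain;charset=US-ASCII", unquote_to_bytes(body)


url = make_data_url(b"Hello")
print(url)                 # data:text/plain;base64,SGVsbG8=
print(read_data_url(url))  # ('text/plain', b'Hello')
```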

Example

http://archive.org/upload/?description=I%20finally%20got%20it.%20However%2C%20an%20update%20made%20the%20Home%20Screen%20look%20like%20Jadoo%205%20and%20not%20like%20the%20new%20Home%20Screen%20they%20put%20in.&subject=Jadoo%2CJadooTV%2CJadoo7%2CJadoo%207%2CAndroid%2CAPK%2CAndroid%20Pie%2CAndroid%209%2CAndroid%209.0%2CAndroid%20TV%2CFarsi%2CHindi&creator=JadooTV&date=2021&collection=open_source_software
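
A URL like this one can be taken apart with Python's standard urllib.parse module. The sketch below uses a shortened copy of the example (most of the description and subject values are trimmed purely for readability), and the names SHORT_EXAMPLE and decode_query are made up for the illustration.

```python
from urllib.parse import parse_qs, urlsplit

# Shortened version of the example URL above; the trimming is only for readability.
SHORT_EXAMPLE = (
    "http://archive.org/upload/"
    "?description=I%20finally%20got%20it."
    "&subject=Jadoo%2CJadooTV%2CAndroid%20Pie"
    "&creator=JadooTV&date=2021&collection=open_source_software"
)


def decode_query(url: str) -> dict:
    """Decode the query string into {name: first value}, undoing percent-encoding."""
    query = urlsplit(url).query
    return {name: values[0] for name, values in parse_qs(query).items()}


fields = decode_query(SHORT_EXAMPLE)
print(fields["description"])         # I finally got it.
print(fields["subject"].split(","))  # ['Jadoo', 'JadooTV', 'Android Pie']
print(fields["collection"])          # open_source_software
```

Here %20 decodes to a space and %2C to a comma; the commas are encoded because the subject field uses them as its own separator inside a single query value.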

See also

Official documents

Proposed documents

Official sites

Other links
