URL
A URL (Uniform Resource Locator) is the address of a resource as used on the World Wide Web. Technically speaking, a URL is just one category of such addresses: a subset of URI (Uniform Resource Identifier), parallel to URN (Uniform Resource Name). However, such distinctions aren't always consistently maintained even by technical people, and "URL" has entered the popular language in a way those other terms have not.
Over time, the precise definitions of the various terms for Web-related addresses have changed and been argued about extensively by technical people, and more terms have been added: an IRI (Internationalized Resource Identifier) is like a URI, but extended to allow non-ASCII characters so that languages other than English can be supported. The newest HTML 5 standards drafts, however, take a more pragmatic approach, simply using "URL" to refer to anything a browser is expected to resolve as an address; this is one of the many "willful violations" of earlier tech specs made in those drafts. (The "techie" equivalent of social conservatives may consider this to be "defining deviancy down" and hence an "abomination".)
Use of URLs (and URIs, etc.) is not limited to the Web; there are a number of other technical uses, such as defining namespaces for file formats (e.g., XML) and identifying even non-Web-accessible objects for the purpose of expressing taxonomic relations. In less technical usage, URLs turn up in all sorts of places, such as TV commercials, billboards, and the sides of vans, but usually with the protocol portion left off because everybody assumes HTTP. These days, most browsers don't even show the "http://" part in the address bar, though it's still officially part of the URL.
Standard syntax
URLs/URIs/etc. always start with a scheme (protocol). (At least, absolute URLs do; there are also relative URLs, which leave off parts at the beginning because they are interpreted relative to the current URL they are accessed from.) The most common scheme is HTTP. The scheme part ends with a colon (:), as in "http:".
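As a rough illustration of absolute versus relative URLs, the Python sketch below uses the standard library's urllib.parse module to read off the scheme and to resolve relative references against a base URL. The base URL http://www.example.com/docs/guide/index.html and the helper names scheme_of and resolve are hypothetical, chosen only for illustration; urljoin follows the RFC 3986 resolution rules closely, but it is not guaranteed to match every browser's behavior.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical base URL: the page the relative references are "accessed from".
BASE = "http://www.example.com/docs/guide/index.html"


def scheme_of(url: str) -> str:
    """Return the scheme (text before the colon), or '' for a relative URL."""
    return urlsplit(url).scheme


def resolve(reference: str, base: str = BASE) -> str:
    """Resolve a possibly relative reference against the base URL."""
    return urljoin(base, reference)


print(scheme_of(BASE))                  # http
print(repr(scheme_of("images/a.png")))  # '' (relative URL: no scheme)
print(resolve("images/a.png"))  # http://www.example.com/docs/guide/images/a.png
print(resolve("../faq.html"))   # http://www.example.com/docs/faq.html
print(resolve("/top.html"))     # http://www.example.com/top.html
```

A relative reference starting with "/" keeps only the scheme and host of the base, while one without a leading slash is resolved against the base's directory.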
After this, the rest of the URL is protocol-dependent; different types of URLs use a number of different syntaxes. A common syntax, which the standards expect to be used in all schemes with hierarchical path structures, follows the scheme part with a double slash (//). This introduces a host or authority portion (usually a domain name), which is followed by another slash and then the full path being addressed. The path uses forward slashes to separate hierarchical levels, which may, but needn't, correspond to subdirectories in a filesystem.
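The hierarchical syntax described above can be sketched in Python with urllib.parse.urlsplit, which exposes the authority as netloc. The describe helper and the example URL are hypothetical; splitting the path on "/" shows the hierarchical levels, which, as noted, need not be real directories.

```python
from urllib.parse import urlsplit


def describe(url: str) -> dict:
    """Break a hierarchical URL into scheme, authority, and path levels."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,     # text before the first ':'
        "authority": parts.netloc,  # text after '//' up to the next '/'
        "path": parts.path,         # full path, including its leading '/'
        # Hierarchical levels; empty pieces from leading/trailing
        # slashes are dropped.
        "segments": [s for s in parts.path.split("/") if s],
    }


info = describe("http://www.example.com/docs/guide/index.html")
print(info["authority"])  # www.example.com
print(info["segments"])   # ['docs', 'guide', 'index.html']
```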
There's a common misconception that URLs always have a double slash after the colon, which sometimes leads developers of new schemes to put one in their syntax where the standards don't call for it. The double slash is only supposed to be used if the following element is some sort of "authority" by which the subsequent path is to be interpreted. A number of schemes have no such authority, and hence no double slash; for instance, "mailto:".
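To make the distinction concrete, this Python sketch checks whether a double slash (and hence an authority) follows the scheme, and shows that urlsplit reports an empty authority for a mailto: URL, placing the address in the path instead. The has_authority helper and the address someone@example.com are hypothetical examples, not part of any standard API.

```python
from urllib.parse import urlsplit


def has_authority(url: str) -> bool:
    """True if the text after 'scheme:' starts with '//' (an authority follows)."""
    _scheme, sep, rest = url.partition(":")
    return bool(sep) and rest.startswith("//")


for url in ("http://www.example.com/index.html",
            "mailto:someone@example.com"):
    parts = urlsplit(url)
    # For mailto:, netloc is empty and the address ends up in the path.
    print(url, has_authority(url), repr(parts.netloc), parts.path)
```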
Official documents
- RFC 1738 (early absolute URL standard)
- RFC 1808 (early relative URL standard)
- RFC 2396 (early URI syntax standard)
- RFC 3986 (later URI syntax standard)
- RFC 4395 (info on registering new URI schemes)
- W3C clarification of URIs, URLs, and URNs (2001)
- HTML 5.1 draft section on URLs (which intentionally disregards the distinctions in the other documents above)
Official sites
- W3C: Naming and Addressing (old)
- W3C: Identifiers (new)
- IANA list of registered URI schemes
- IANA list of URN namespaces