What is URL?
URL (Uniform Resource Locator), colloquially known as the web address, is a unique sequence of characters that provides a means of locating digital information resources (digitally preserved intangible creations of the human intellect) over a network, including, but not limited to, the Internet. Such resources can then be retrieved using various transfer protocols, such as HTTP (Hypertext Transfer Protocol) for hypertext documents and FTP (File Transfer Protocol) for files, while the mailto scheme designates email addresses.
URL as URI (Uniform Resource Identifier)
URL is a kind of URI (Uniform Resource Identifier).
URI (Uniform Resource Identifier) is a unique sequence of characters that provides means for identifying physical resources, such as physical objects, places, and people, and logical resources (intangible creations of the human intellect), such as ideas, concepts, written documents, books, songs, movies, games, and digital information resources.
The term URL was defined by the inventor of the World Wide Web, Tim Berners-Lee, in 1994.
The conceptualization of URL is interlinked with the development of the Hypertext Transfer Protocol (HTTP) initiated by Tim Berners-Lee at CERN in 1989. See What is HTTP? for further reading.
As noted above, URL is a kind of URI.
The generic URI syntax consists of five components:
scheme (required),
authority (optional),
path (required),
query (optional), and
fragment (optional).
The authority component consists of three subcomponents:
userinfo (optional),
host (required), and
port (optional).
The userinfo subcomponent can be further divided into the following subcomponents:
username (required), and
password (optional).
It is worth remembering that the only required URI components are scheme and path.
URL observes the generic URI syntax; however, the scheme component is referred to as the protocol. The protocol component can include such values as http, https, ftp, and mailto.
Therefore, a URL can consist of five components:
protocol (required),
authority (optional) consisting of optional userinfo, required host and optional port,
path (required),
query (optional), and
fragment (optional).
Of those five components, only protocol and path are required.
A URL referring to a local file is an example of a URL with only the protocol and path specified.
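The component breakdown described above can be checked with a URL parser. The following is a minimal sketch using `urlsplit` from Python's standard `urllib.parse` module; both URLs are made-up examples rather than values from this article, and the standard library calls the protocol `scheme` and the authority `netloc`.

```python
from urllib.parse import urlsplit

# Hypothetical URL, chosen only to exercise all five components.
url = "https://user@example.net:8080/movies/comedy?year=1994&sort=title#cast"

parts = urlsplit(url)

# Map the parser's attribute names onto the component names used in this text.
components = {
    "protocol": parts.scheme,
    "authority": parts.netloc,
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}

for name, value in components.items():
    print(f"{name:<10} {value}")

# A hypothetical local-file URL: only protocol and path are present,
# the authority is empty.
local = urlsplit("file:///home/user/notes.txt")
print(local.scheme, local.path, repr(local.netloc))
```

Note that the parser returns the query without its leading question mark and the fragment without its leading hash symbol.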
URL protocol - the counterpart of URI scheme - is the first of the two required URL components (the second being the path) which denotes a set of rules and procedures governing the manner in which the URL resource should be accessed.
A common protocol value is http (together with its encrypted counterpart, https).
URL authority is an optional URL component which provides directions to a server or servers from which the underlying URL resource is to be served, and optionally user authentication while accessing those servers and port through which those servers should be accessed.
The URL authority component consists of three subcomponents:
userinfo (optional),
host (required), and
port (optional).
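The authority subcomponents can be read off a parsed URL. The sketch below assumes Python's standard `urllib.parse` module; the URLs, including the username and password, are invented for illustration only.

```python
from urllib.parse import urlsplit

# Hypothetical URL with every authority subcomponent present.
# The embedded password appears only to show the syntax.
full = urlsplit("https://alice:secret@www.example.net:8443/movies")

authority = {
    "username": full.username,
    "password": full.password,
    "host": full.hostname,
    "port": full.port,  # parsed into an int
}
print(authority)

# With only the required host present, the optional parts come back as None.
bare = urlsplit("https://example.net/movies")
print(bare.username, bare.password, bare.hostname, bare.port)
```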
URL Authority Userinfo
URL authority userinfo is an optional subcomponent of the optional URL authority component which provides a required username and an optional password for the purpose of authentication while accessing the URL host.
The data in the password subcomponent is provided as plain text, and therefore its usage is advised against; RFC 3986 deprecates the user:password format.
URL Authority Host
URL authority host is a required subcomponent in the optional URL authority component which provides directions to a server or servers from which the underlying URL resource is to be served.
The host component can be:
a registered domain name (e.g. example.net), which can be prefixed with an optional subdomain, or
an IP address.
In a registered domain name, the last subcomponent (e.g. co.uk) is referred to as the domain suffix.
A registered domain name is mapped to one or many IP addresses by the Domain Name System (DNS).
On the other hand, many different registered domain names can map to one IP address when so-called virtual hosting is involved.
URL Authority Port
URL authority port is an optional subcomponent in the optional URL authority component which identifies the server process on the URL host to which the client connects.
Some port numbers are deemed reserved and are supposed only to be used by processes using specific protocols.
A commonly used port for web servers is 8080; however, when no port is specified, HTTP implicitly uses port 80 and HTTPS implicitly uses port 443. Ports 80 and 443 are reserved for those purposes.
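The implicit-port rule can be expressed as a small lookup. This is a sketch under stated assumptions: `effective_port` and `DEFAULT_PORTS` are hypothetical names introduced here, the table covers only the two protocols discussed in this section, and the URLs are examples.

```python
from urllib.parse import urlsplit

# Implicit ports for the two protocols discussed in the text.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Return the explicit port if given, else the protocol's implicit port.

    Returns None when the protocol has no entry in DEFAULT_PORTS.
    """
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


print(effective_port("http://example.net/"))       # 80
print(effective_port("https://example.net/"))      # 443
print(effective_port("http://example.net:8080/"))  # 8080
```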
URL path is the second of the two required URL components (the first one being the protocol) which denotes the logical location of the contemplated URL resource (whereas the protocol denotes the rules and procedures regarding the manner of resource access).
A path can consist of one segment (e.g. /movies) or many segments.
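Splitting a path into its segments is a one-liner once the URL is parsed. The sketch below uses a hypothetical helper, `path_segments`, and example URLs; as a simplification it discards empty segments, which a stricter implementation might want to keep.

```python
from urllib.parse import urlsplit


def path_segments(url):
    """Return the non-empty segments of the URL's path, in order."""
    path = urlsplit(url).path
    return [segment for segment in path.split("/") if segment]


print(path_segments("https://example.net/movies"))              # ['movies']
print(path_segments("https://example.net/movies/comedy/1994"))  # ['movies', 'comedy', '1994']
```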
URL query is an optional URL component which denotes parameters that are to be used during the URL resource access.
In a URL the query component is preceded by the question mark (?).
The query syntax is not clearly defined, but it is usually a sequence of key-value pairs separated by the ampersand (&).
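The common key-value convention can be parsed and rebuilt with Python's standard `urllib.parse` module. The URL and parameter names below are assumptions made for the example; since the query syntax is only a convention, other formats exist that this sketch does not handle.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical URL; note that the key "year" appears twice.
url = "https://example.net/movies?genre=comedy&year=1994&year=1995"

# parse_qs collects the values of each key into a list.
params = parse_qs(urlsplit(url).query)
print(params)   # {'genre': ['comedy'], 'year': ['1994', '1995']}

# doseq=True expands each list back into repeated key=value pairs.
rebuilt = urlencode(params, doseq=True)
print(rebuilt)  # genre=comedy&year=1994&year=1995
```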
URL fragment is an optional URL component which denotes the logical location of a secondary resource within the URL resource, such as an element in an HTML document specified by its ID attribute.
In a URL the fragment component is preceded by the hash symbol (#).
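Separating the fragment from the rest of a URL can be sketched with `urldefrag` from Python's standard `urllib.parse` module. The URL and the fragment name `cast` are hypothetical.

```python
from urllib.parse import urldefrag

# Hypothetical URL pointing at an element with the ID "cast".
primary, fragment = urldefrag("https://example.net/movies/comedy#cast")

print(primary)   # https://example.net/movies/comedy
print(fragment)  # cast
```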
URLs in HTTP Requests
URLs are used in HTTP requests to indicate resources being accessed.
Resources that can be accessed using HTTP and URLs are primarily HTML documents.
URLs in Hyperlinks
URLs can be used in hyperlinks.
A hyperlink (also known as a link) is a user-followable reference to a digital resource.
For example, a hyperlink in HTML is built using the a (anchor) element, and the value of its href (hypertext reference) attribute can be specified as a URL.
<a href="https://soundof.it/http/tutorial/what-is-http" > What is HTTP? </a>
The href attribute's value can also be specified as a relative URL.
A relative URL is a URL in which the protocol and authority components are not indicated explicitly but implicitly, through substitution with the relevant values from the currently accessed resource's URL.
<a href="/http/tutorial/what-is-http" > What is HTTP? </a>
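The substitution that turns a relative URL into an absolute one can be sketched with `urljoin` from Python's standard `urllib.parse` module. The base URL below is a hypothetical "currently accessed" page, invented for the example; only the relative href comes from the markup above.

```python
from urllib.parse import urljoin

# Hypothetical URL of the page that contains the link.
current_page = "https://soundof.it/some/current/page"

# The relative href from the anchor element above.
relative = "/http/tutorial/what-is-http"

# Protocol and authority are taken from the current page's URL.
absolute = urljoin(current_page, relative)
print(absolute)  # https://soundof.it/http/tutorial/what-is-http
```

Because the relative URL starts with a slash, it replaces the whole path of the base URL rather than being appended to it.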
URL Converting & Encoding
A given URL can only be used for data transmission over the Internet when it consists of URL-safe ASCII characters.
A non-ASCII character in the host component needs to be converted to so-called Punycode, which consists only of safe ASCII characters; non-ASCII characters in the other components are instead percent-encoded from their UTF-8 bytes.
An example of a non-ASCII character is ♥, which has to be converted in this way before it can be used within a URL.
An unsafe ASCII character to be used within a URL needs to be encoded as a sequence of safe ASCII characters consisting of the % prefix followed by a two-digit hexadecimal number.
An example of an unsafe ASCII character is the space character, which needs to be encoded into %20.
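Percent-encoding can be tried out with `quote` and `unquote` from Python's standard `urllib.parse` module. The path below is a made-up example; by default `quote` leaves the slash unencoded and encodes non-ASCII characters from their UTF-8 bytes.

```python
from urllib.parse import quote, unquote

# Hypothetical path containing unsafe space characters.
raw_path = "/movies/the big sleep"

encoded_path = quote(raw_path)
print(encoded_path)   # /movies/the%20big%20sleep

# A non-ASCII character outside the host becomes one %XX triplet per UTF-8 byte.
encoded_heart = quote("♥")
print(encoded_heart)  # %E2%99%A5

# Decoding restores the original text.
print(unquote(encoded_path) == raw_path)  # True
```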
IRI (Internationalized Resource Identifier)
IRI (Internationalized Resource Identifier) is a generalized form of URL which, for internationalization purposes, can consist of Unicode characters (as opposed to the URL-safe ASCII characters of a standard URL).
Most modern browsers support IRIs.
For the purpose of Internet transmission, non-ASCII characters in an IRI's host are converted into so-called Punycode, which consists only of ASCII characters, while those in the other components are percent-encoded.
An example of a Unicode non-ASCII character is 人, which needs to be converted in this way before transmission.
A domain name in IRI with internationalized characters is known as an Internationalized Domain Name (IDN).
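The conversion of an internationalized domain name to its ASCII form can be sketched with Python's built-in `idna` codec. The host name below is hypothetical (it uses the reserved `.example` suffix), and the built-in codec follows the older IDNA 2003 rules, so treat this as an illustration rather than a complete IDN implementation.

```python
# Hypothetical internationalized host name.
host = "人.example"

# Each non-ASCII label is converted to Punycode and given the "xn--" prefix.
ascii_host = host.encode("idna").decode("ascii")
print(ascii_host)

# The conversion is reversible.
restored = ascii_host.encode("ascii").decode("idna")
print(restored == host)
```

Only the label containing non-ASCII characters is rewritten; the plain ASCII label `example` passes through unchanged.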