Codepath

PHP Encoding for URLs

Creating links

A link in PHP is just a simple HTML link.

// index.php
<a href="contact.php">Contact Us</a>

PHP can enhance HTML links by making them dynamic.

// index.php
<?php $this_page = 'home'; ?>
<a href="contact.php?from=<?php echo $this_page; ?>">Contact Us</a>

Be sure to use echo when using PHP to output to HTML. Forgetting it is a simple but common mistake.

This code uses PHP to supply the value of the "from" query parameter.

URL query strings are formatted as a series of parameter/value pairs with "=" between them, "&" to join them, and a "?" at the start.

http://example.com/some/path?param=value&param=value&param=value
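As a rough sketch of this format, PHP's built-in http_build_query() can assemble the parameter/value pairs from an associative array. The parameter names and values below are assumptions chosen for illustration.

```php
<?php
// Illustrative parameters (names and values are assumptions for this sketch).
$params = [
    'from' => 'home',
    'page' => 2,
];

// http_build_query() joins each name and value with "=". The separator
// between pairs is passed explicitly as the third argument ("&").
$query = http_build_query($params, '', '&');

// The "?" marks the start of the query string.
$url = 'http://example.com/some/path?' . $query;

echo $url, "\n"; // http://example.com/some/path?from=home&page=2
```

http_build_query() also URL-encodes the values it is given, which is the topic of the encoding sections later in this document.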

Reading query parameters

PHP can read the query parameter values being sent to the linked page. PHP automatically assigns all of these query parameters to the $_GET superglobal. It is called "$_GET" because following a link sends a GET request. (Forms can be submitted with either GET or POST; values sent by POST arrive in $_POST instead.)

$_GET is an associative array and its values can be accessed like any associative array.

<?php
  $from = $_GET['from'];
?>

Always confirm that a $_GET value is set before working with it. In some configurations, PHP will show an unpleasant warning message if you try to access an associative array key which is not set.

<?php
if(isset($_GET['from'])) {
  $from = $_GET['from'];
}
?>
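One way to package this check, sketched here under the assumption of PHP 7 or later, is a small helper built on the null coalescing operator (??). The function name get_param() is hypothetical and not a PHP built-in.

```php
<?php
// Hypothetical helper: returns the named value from an array such as $_GET,
// or a default when the key is missing or the value is not a string.
function get_param(array $source, string $key, string $default = ''): string
{
    // ?? yields the right-hand side when the key is not set (or is null),
    // without triggering a warning.
    $value = $source[$key] ?? $default;

    // A query such as ?from[]=x produces an array, so fall back to the default.
    return is_string($value) ? $value : $default;
}

// $from is '' when the page is requested without a "from" parameter.
$from = get_param($_GET, 'from');
```

Passing the source array as an argument, rather than reading $_GET inside the function, keeps the helper easy to try out with any array.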

It is also a good coding habit to check, extract, and clean up values from $_GET at the top of the page, not mixed in with HTML.

In the following code, contact.php only displays a link if $from == 'home'.

// contact.php
<?php
  $from = '';
  if(isset($_GET['from'])) {
    $from = $_GET['from'];
  }
?>

<h1>Contact Us</h1>

<?php if($from == 'home') { ?>
  <a href="index.php">Homepage</a>
<?php } ?>

Encoding $_GET values

URLs can contain most common characters (letters, numbers, underscores, dashes). They cannot contain spaces. There are also some reserved characters which have special meanings in a URL and cannot be used as ordinary data without being encoded.

Reserved URL characters

!  #  $  %  &  '  (  )  *  +  ,  /  :  ;  =  ?  @  [  ]

If any of these characters are used in a URL, they could prevent it from working correctly. Therefore they must be encoded (i.e. "transformed") so they don't interfere with the function of the URL. This is crucial when we are using PHP to output dynamic values to be used in URLs, such as in the query string.

There are different types of encoding depending on the context. Encoding for a URL means converting characters to a "%" followed by two hexadecimal digits.

Hexadecimal: Decimal numbers use the digits 0-9. Hexadecimal numbers use 0-9 plus the letters A-F as additional "digits", so each digit has 16 possibilities instead of 10.

For example, & would be encoded as %26, and ? would be encoded as %3F.
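To make the arithmetic concrete, here is a minimal sketch that derives those two codes by hand. The helper name percent_encode_char() is hypothetical, and it assumes a single-byte character as input.

```php
<?php
// Hypothetical helper: percent-encodes a single-byte character by hand.
function percent_encode_char(string $char): string
{
    // ord() gives the byte value (38 for "&"), dechex() converts it to
    // hexadecimal ("26"), and str_pad() guarantees two digits.
    $hex = strtoupper(dechex(ord($char)));
    return '%' . str_pad($hex, 2, '0', STR_PAD_LEFT);
}

echo percent_encode_char('&'), "\n"; // %26
echo percent_encode_char('?'), "\n"; // %3F

// The built-in rawurlencode() produces the same codes.
echo rawurlencode('&?'), "\n";       // %26%3F
```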


PHP encoding functions

PHP has two functions for encoding strings for use in a URL.

urlencode()

  • Encodes most non-alphanumeric characters as % + 2-digit hexadecimal
  • Spaces are encoded as +
  • For strings to be used in a URL query string (after ?)

rawurlencode()

  • Encodes most non-alphanumeric characters as % + 2-digit hexadecimal
  • Spaces are encoded as %20
  • For strings to be used in URL path string (before ?)

The two functions differ mainly in how they handle spaces. In many cases it doesn't matter, and a server will correctly handle either one. However, there are a few cases where it does matter. The best recommendation is to use rawurlencode() for each dynamic value in the URL path, which is the part after the host name (e.g. after http://example.com) and before any ?, and to use urlencode() for values in the URL query string, which is the part after the ? where PHP will often output parameter values.
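A short side-by-side sketch shows the difference on one string. The variable names are illustrative.

```php
<?php
$value = 'URL encode & decode';

$for_query = urlencode($value);    // spaces become "+"
$for_path  = rawurlencode($value); // spaces become "%20"

echo $for_query, "\n"; // URL+encode+%26+decode
echo $for_path, "\n";  // URL%20encode%20%26%20decode
```

In both results the & is encoded as %26. Only the treatment of the spaces differs.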

Example:

<?php
  $course = 'web security';
  $query = 'URL encode & decode';
  // Encode only the dynamic segment; encoding the whole path would turn "/" into %2F.
  $url = '/courses/' . rawurlencode($course) . '/content';
  $url .= '?search=' . urlencode($query);
?>

<a href="<?php echo $url; ?>">Link label</a>

It is essential that all dynamic values get encoded before being used in URLs for links and forms. Otherwise, a value containing a space or a reserved character such as & or # can break the URL or change its meaning.


PHP decoding functions

PHP also has two functions for decoding these strings to return them back to their original characters.

urldecode(): Decodes strings encoded with urlencode()

rawurldecode(): Decodes strings encoded with rawurlencode()

However, these functions are rarely needed because PHP automatically decodes query parameters before assigning them to $_GET.
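The sketch below illustrates both points: how the two decoders treat "+", and why decoding a value that PHP has already decoded can change the data. The parameter name "discount" and its value are made up for the example.

```php
<?php
// urldecode() treats "+" as an encoded space; rawurldecode() leaves it alone.
echo urldecode('web+security'), "\n";      // web security
echo rawurldecode('web+security'), "\n";   // web+security
echo rawurldecode('web%20security'), "\n"; // web security

// Hypothetical value as it would already appear in $_GET after PHP's
// automatic decoding of ?discount=50%252B
$already_decoded = '50%2B';

// Decoding a second time changes the data: "%2B" turns into "+".
$decoded_twice = urldecode($already_decoded);
echo $decoded_twice, "\n"; // 50+
```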


Encoding for URLs inside HTML

When outputting URLs to a page, it is critical to also encode all output for HTML.
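As a minimal sketch of the two layers working together, the example below URL-encodes a dynamic value first and then encodes the finished URL for HTML with htmlspecialchars(). The path /search.php, the parameter names, and the label are assumptions for illustration.

```php
<?php
$query = 'URL encode & decode';
$label = 'Search <courses>';

// Step 1: encode the dynamic value for the URL.
$url = '/search.php?q=' . urlencode($query) . '&page=2';

// Step 2: encode the finished URL and the label for HTML output.
$html = '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">'
      . htmlspecialchars($label, ENT_QUOTES, 'UTF-8') . '</a>';

echo $html, "\n";
// <a href="/search.php?q=URL+encode+%26+decode&amp;page=2">Search &lt;courses&gt;</a>
```

The order matters: the & inside the value is percent-encoded for the URL, while the & that separates parameters becomes &amp; only at the HTML stage.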
