Blog of Daniel Ruf

#php

don't blindly trust FILTER_VALIDATE_URL

25.05.2020

A few weeks ago I checked some logfiles of a WordPress website and found some odd base64-encoded strings that were being passed to an image proxy script (public/image.php) belonging to a plugin.

I checked the file and immediately found several issues in the way it validated the supplied base64-encoded data with FILTER_VALIDATE_URL.

Here is the original code from the file:

<?php
$url = ( isset( $_GET['url'] ) ) ? $_GET['url'] : '';

if ( empty( $url ) )
	die( 'URL is missing.' );

$url = base64_decode( $url );

if ( function_exists('filter_var') && ! filter_var( $url, FILTER_VALIDATE_URL )  )
	die( 'URL format invalid.' );

if ( strpos( $url, 'media-amazon.com') === false && strpos( $url, 'images-amazon.com') === false )
	die( 'URL target invalid.' );

if ( strpos( $url, ".jpg" ) || strpos( $url, ".jpeg" ) ) {
	header( "Content-Type: image/jpeg" );

} elseif ( strpos( $url, ".png" ) ) {
	header( "Content-Type: image/png" );

} else {
	die( 'File type is not supported.' );
}

readfile( $url );

In this code snippet you can see that the url parameter is never sanitized. It is a base64-encoded string which the script decodes, runs through a few superficial checks and then passes straight to readfile.

The main problems are the following:

<?php
// ...
if ( function_exists('filter_var') && ! filter_var( $url, FILTER_VALIDATE_URL )  )

FILTER_VALIDATE_URL accepts many more URLs than you might think, for example URLs that begin with any of the following:
file://, ftp://, http://, https://, ldap://, mailto:, php://, telnet://

This is also mentioned in the description of the filter in the PHP documentation:

Beware a valid URL may not specify the HTTP protocol http:// so further validation may be required to determine the URL uses an expected protocol, e.g. ssh:// or mailto:. Note that the function will only find ASCII URLs to be valid; internationalized domain names (containing non-ASCII characters) will fail.
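
To illustrate, the following sketch runs a few sample URLs through FILTER_VALIDATE_URL and then through an additional scheme allowlist. The helper hasAllowedScheme and the sample URLs are made up for this post and are not part of the plugin, so treat this as a starting point rather than a complete validator.

```php
<?php
// Hypothetical helper (not part of the plugin): accept a URL only if it
// passes FILTER_VALIDATE_URL *and* uses an explicitly allowed scheme.
function hasAllowedScheme(string $url, array $allowedSchemes = ['https']): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // parse_url() returns null if there is no scheme and false if the
    // URL is seriously malformed, so anything but a string is rejected.
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (!is_string($scheme)) {
        return false;
    }

    // Scheme names are case-insensitive, so normalise before comparing.
    return in_array(strtolower($scheme), $allowedSchemes, true);
}

// Sample URLs, chosen for illustration only.
$candidates = [
    'https://m.media-amazon.com/images/I/example.jpg',
    'ftp://user:password@ftp.example.com/file.txt',
    'file:///etc/passwd',
    'php://filter/resource=index.php',
];

foreach ($candidates as $candidate) {
    printf(
        "%-50s format valid: %-3s scheme allowed: %s\n",
        $candidate,
        filter_var($candidate, FILTER_VALIDATE_URL) !== false ? 'yes' : 'no',
        hasAllowedScheme($candidate) ? 'yes' : 'no'
    );
}
```

Comparing the two columns of the output shows which URLs the format check alone would let through.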

The next problem was how the provided URL was verified:

<?php
// ...
if ( strpos( $url, 'media-amazon.com') === false && strpos( $url, 'images-amazon.com') === false )

This checks only if the supplied value contains media-amazon.com or images-amazon.com, but it does not matter where it appears in the value of $url. It does not check if this is the supplied hostname.
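
A minimal sketch of a host-based check, assuming the proxy should only ever talk to a fixed set of hosts. The function name hasAllowedHost is hypothetical, and the allowlist is simply derived from the hosts that appear in the patched plugin code further down in this post.

```php
<?php
// Hypothetical helper: compare the *parsed* host against an exact allowlist
// instead of searching the whole URL for a substring.
function hasAllowedHost(string $url, array $allowedHosts = [
    'm.media-amazon.com',
    'images-cn.ssl-images-amazon.com',
    'images-eu.ssl-images-amazon.com',
    'images-fe.ssl-images-amazon.com',
    'images-na.ssl-images-amazon.com',
]): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no host at all, or the URL could not be parsed
    }

    // Host names are case-insensitive.
    return in_array(strtolower($host), $allowedHosts, true);
}

$payload = 'ftp://user:password@ftp.example.com/file.txt?name=images-amazon.com.jpeg';

// The substring test from the original script looks at the whole string,
// so the query string is enough to satisfy it.
var_dump(strpos($payload, 'images-amazon.com') !== false);

// The parsed host is the server that would actually be contacted.
var_dump(parse_url($payload, PHP_URL_HOST));
var_dump(hasAllowedHost($payload));
```

An exact allowlist is used here on purpose: suffix matching is easy to get wrong, and the set of expected hosts is small.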

And then there was the image validation:

<?php
// ...
if ( strpos( $url, ".jpg" ) || strpos( $url, ".jpeg" ) ) {
// ...
} elseif ( strpos( $url, ".png" ) ) {
// ...

This checks if the value of $url contains .jpg, .jpeg or .png. Again, it does not check if it ends with one of these strings.
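
One way to look only at the real file extension is to parse the URL first and inspect the end of its path. The sketch below assumes that the extension of the path is what should decide the content type. contentTypeForUrl is a name made up for this example.

```php
<?php
// Hypothetical helper: map the extension of the URL *path* to a content
// type. Returns null for anything that is not an allowed image extension.
function contentTypeForUrl(string $url): ?string
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null; // no path component, or the URL could not be parsed
    }

    $types = [
        'jpg'  => 'image/jpeg',
        'jpeg' => 'image/jpeg',
        'png'  => 'image/png',
    ];

    // pathinfo() only considers the part after the last dot of the last
    // path segment, so "testfile.jpeg.php" yields "php", not "jpeg".
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    return $types[$extension] ?? null;
}

var_dump(contentTypeForUrl('https://m.media-amazon.com/images/I/example.jpg'));
var_dump(contentTypeForUrl('https://danielruf.github.io/images-amazon.com/testfile.jpeg.php'));
```

Because the query string is not part of the path, a value such as ?name=images-amazon.com.jpeg no longer influences the result.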

And then there is the last step of loading the image using the URL:

<?php
// ...
readfile( $url );

The readfile function accepts not only local filenames but also URLs. The PHP documentation says:

A URL can be used as a filename with this function if the fopen wrappers have been enabled. See fopen() for more details on how to specify the filename. See the Supported Protocols and Wrappers for links to information about what abilities the various wrappers have, notes on their usage, and information on any predefined variables they may provide.

The accepted protocols and wrappers according to the docs are:
data://, expect://, file://, ftp://, glob://, http://, https://, ogg://, phar://, php://, rar://, ssh2://, zlib://

All of these checks are easy to satisfy with a crafted, base64-encoded URL, so the script can be made to load arbitrary files. Two examples, each followed by its base64 encoding:

ftp://user:password@ftp.example.com/file.txt?name=images-amazon.com.jpeg
ZnRwOi8vdXNlcjpwYXNzd29yZEBmdHAuZXhhbXBsZS5jb20vZmlsZS50eHQ/bmFtZT1pbWFnZXMtYW1hem9uLmNvbS5qcGVn

https://danielruf.github.io/images-amazon.com/testfile.jpeg.php
aHR0cHM6Ly9kYW5pZWxydWYuZ2l0aHViLmlvL2ltYWdlcy1hbWF6b24uY29tL3Rlc3RmaWxlLmpwZWcucGhw

The final URL would look like this:
public/image.php?url=aHR0cHM6Ly...
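
The following sketch replays the three checks of the original script against the two example URLs, without calling readfile. The function name passesOriginalChecks is invented for this illustration. The checks themselves are copied from the original code above.

```php
<?php
// Replays the checks of the original image.php on a base64-encoded value.
// Nothing is fetched; the function only reports whether readfile() would
// have been reached.
function passesOriginalChecks(string $encoded): bool
{
    $url = base64_decode($encoded);

    if (!filter_var($url, FILTER_VALIDATE_URL)) {
        return false; // 'URL format invalid.'
    }

    if (strpos($url, 'media-amazon.com') === false && strpos($url, 'images-amazon.com') === false) {
        return false; // 'URL target invalid.'
    }

    // Same loose extension test as the original script.
    return strpos($url, '.jpg') || strpos($url, '.jpeg') || strpos($url, '.png');
}

$payloads = [
    'ftp://user:password@ftp.example.com/file.txt?name=images-amazon.com.jpeg',
    'https://danielruf.github.io/images-amazon.com/testfile.jpeg.php',
];

foreach ($payloads as $payload) {
    $encoded = base64_encode($payload);
    // base64 output may contain '/', '+' and '=', so it is URL-encoded
    // before being placed in the query string.
    printf(
        "public/image.php?url=%s\n  passes original checks: %s\n",
        rawurlencode($encoded),
        passesOriginalChecks($encoded) ? 'yes' : 'no'
    );
}
```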

These risks are also mentioned in this blog article.

So if you accept and parse any user-supplied input for loading files, always ensure that only allowed URLs can be supplied, or build the URLs dynamically on the server.
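
Building the URL on the server could look roughly like the sketch below. It assumes that the client only needs to send a file name and that all images live under a single base URL; the base URL is taken from the patched plugin code further down. buildImageUrl and the exact character class are assumptions made for this example.

```php
<?php
// Hypothetical helper: the client supplies only a file name, and the
// scheme, host and directory are fixed on the server side.
function buildImageUrl(string $fileName): ?string
{
    // Letters, digits and a few punctuation characters, followed by exactly
    // one allowed extension at the very end. '/', ':' and '%' are not
    // allowed, so the name cannot change the scheme, host or directory.
    if (preg_match('/\A[A-Za-z0-9+_.-]+\.(?:jpg|jpeg|png)\z/', $fileName) !== 1) {
        return null;
    }

    return 'https://m.media-amazon.com/images/I/' . $fileName;
}

var_dump(buildImageUrl('example._SL500_.jpg'));
var_dump(buildImageUrl('../../etc/passwd'));
var_dump(buildImageUrl('testfile.jpeg.php'));
```

With this approach there is no full URL left to validate, which removes most of the problems described above.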

The supplied URL can be checked by parsing it with parse_url and parse_str, and by combining regular expressions with strpos checks that look at the correct position.

There are many more such cases on GitHub.

Malicious actors could use this to spread malware using trusted domains which use such a proxy script by leveraging the flaws in the URL validation.

I think this can be used for several attacks such as Reflected File Download (RFD) attacks (BH2014 talk).

The problems were reported to the developer of the plugin and he quickly released updates which mitigate these problems by checking the supplied URL using regular expressions:

<?php
$url = ( isset( $_GET['url'] ) ) ? $_GET['url'] : '';

if ( empty( $url ) )
	die( 'URL is missing.' );

$url = base64_decode( $url );

// Validate URL.
if ( ! filter_var( $url, FILTER_VALIDATE_URL ) || (
    ! preg_match('/^https:\/\/images-(cn|eu|fe|na)\.ssl-images-amazon.com\/images\/I\/(?:[A-Za-z0-9\-\+\_\%]+)\.(?:[A-Za-z0-9\_]+)\.(jpg|jpeg|png)/', $url ) &&
    ! preg_match('/^https:\/\/m\.media-amazon.com\/images\/I\/(?:[A-Za-z0-9\+\-\_\.\%]+)\.(jpg|jpeg|png)/', $url ) ) ) {
    die( 'Invalid image.' );
}

// Validate file.
if ( substr_compare( $url, '.jpg', -strlen( '.jpg' ) ) === 0 || substr_compare( $url, '.jpeg', -strlen( '.jpeg' ) ) === 0 ) {
    header( "Content-Type: image/jpeg" );
} elseif ( substr_compare( $url, '.png', -strlen( '.png' ) ) === 0 ) {
    header( "Content-Type: image/png" );
} else {
    die( 'Invalid image.' );
}

readfile( $url );

An additional improvement would be to set the X-Content-Type-Options: nosniff header to prevent MIME Sniffing attacks.
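
A small sketch of how this header could be sent together with the content type. imageResponseHeaders is a hypothetical helper and not part of the plugin.

```php
<?php
// Hypothetical helper: collects the header lines for an image response so
// that the nosniff header is always sent along with the content type.
function imageResponseHeaders(string $contentType): array
{
    return [
        'Content-Type: ' . $contentType,
        // Tells browsers to trust the declared type instead of guessing it
        // from the response body.
        'X-Content-Type-Options: nosniff',
    ];
}

// Intended use in the proxy script, right before readfile():
//
//     foreach ( imageResponseHeaders( 'image/jpeg' ) as $line ) {
//         header( $line );
//     }
print_r(imageResponseHeaders('image/jpeg'));
```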

Always apply strict checks and proper validation when you parse user-supplied input such as URLs. Otherwise you are likely to end up with security problems like the ones described here.

Timeline

2020-05-09 15:15 checked some logfiles and saw the base64 strings for the image proxy
2020-05-09 15:49 asked in a WP community for the latest version for further checks
2020-05-09 20:03 informed developer of plugin about the problem
2020-05-11 19:26 saw new version in changelog
2020-05-11 20:13 checked the version and notified WP community
2020-05-11 21:01 let the developer know that the problems are mitigated