URL Rewrite and Relative Paths

I originally posted this as an answer to a question on Stack Exchange.

In the special case that you want to rewrite everything to a single destination, you might run into the following problem:

If the page references content through a relative URL (e.g. JavaScript or CSS stylesheets), the browser will no longer find that content once the path in the URL changes. For example, if the JavaScript can be found under the URL:

https://www.example.com/js/jquery.min.js

and linked in the page like this:

<script src="js/jquery.min.js"></script>

Suppose someone is visiting the page

https://www.example.com/pages/welcome

then their browser will request

https://www.example.com/pages/js/jquery.min.js

which will obviously not be found by the server.
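The resolution step can be reproduced outside the browser. Below is a minimal sketch in Python, assuming that the standard library's urljoin resolves this reference the same way a browser does; the helper name resolve is only for illustration.

```python
from urllib.parse import urljoin

def resolve(page_url, reference):
    """Resolve a reference found in a page against the page's own URL."""
    return urljoin(page_url, reference)

page = "https://www.example.com/pages/welcome"

# The last path segment ("welcome") is dropped and the relative
# reference is appended to what remains ("/pages/").
print(resolve(page, "js/jquery.min.js"))
# https://www.example.com/pages/js/jquery.min.js
```

The same call with a deeper page URL shows how the wrong prefix grows with every additional path segment.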

Using the <base> tag is a nice solution and most browsers seem to handle it well. There are some issues with IE, as was to be expected, and apparently you can also run into some other funny problems; see discussion here.

So for people for whom this is not an option, I have looked into the alternative (the "hard way").

Usually you store css/js/static images/other stuff like this:

index.php
js/
css/
imgs/

and you want the JavaScript, stylesheets etc. to be available no matter how many slashes there are in the URL. If your URL is /site/action/user/new, then your browser will request

/site/action/user/css/style.css
/site/action/user/css/framework/fonts/icons.ttf
/site/action/user/js/page.js
/site/action/user/js/jquery/jquery.min.js
/site/action/user/js/some/library/with/deep/dir/structure/file.map

So here are some rewrite rules for Apache to solve this. First, if the target actually exists on disk, do not rewrite:

RewriteCond %{REQUEST_FILENAME} -d [OR]
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^.*$ - [L,QSA]

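In words: IF the request filename is a directory OR the request filename is a file, then do not rewrite (-), stop processing rules (L) and pass on any GET parameters (QSA, query string append; since "-" performs no substitution, the query string is passed through unchanged in any case). You can also use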

RewriteCond %{REQUEST_FILENAME} -d [OR]
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -l
RewriteRule ^.*$ - [L,QSA]

if you also need symlinks. Next we want the javascript and stylesheets to be found even if the requests assume a wrong base directory as shown above.

RewriteRule ^.*/js/(.*)$ js/$1 [L]
RewriteRule ^.*/css/(.*)$ css/$1 [L]

The pattern is pretty obvious: just replace 'css' with the directory name. There is still a problem with this, especially for large websites with lots of JavaScript, stylesheets, libraries etc.: the regex is greedy. For example, if you have a JavaScript directory like this:

js/some/library/js/script.js

and your request goes to /site/action/user/new, the browser will request /site/action/user/js/some/library/js/script.js, which the rewrite engine will then rewrite to

js/script.js

because the first .* is greedy and matches everything up to the last /js/, i.e. /site/action/user/js/some/library. Switching to a non-greedy regex does not really make sense, since "the rewrite engine repeats all the rules until the URI is the same before and after an iteration through the rules."
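The greedy and non-greedy behaviour can be tried out with Python's re module, which treats these two patterns the same way. This is only a simplified model: the names rewrite and rewrite_until_stable are made up for illustration, and the loop deliberately ignores the file-existence check from above. Whether a second round really happens on a server depends on the configuration context.

```python
import re

GREEDY = re.compile(r"^.*/js/(.*)$")   # pattern from the rule above
LAZY = re.compile(r"^.*?/js/(.*)$")    # hypothetical non-greedy variant

def rewrite(pattern, path):
    """Apply one js/ rewrite; return the path unchanged if the pattern does not match."""
    match = pattern.match(path)
    return "js/" + match.group(1) if match else path

def rewrite_until_stable(pattern, path, max_rounds=10):
    """Model of 'repeat the rules until the URI no longer changes'."""
    for _ in range(max_rounds):
        rewritten = rewrite(pattern, path)
        if rewritten == path:
            break
        path = rewritten
    return path

requested = "site/action/user/js/some/library/js/script.js"

print(rewrite(GREEDY, requested))             # js/script.js
print(rewrite(LAZY, requested))               # js/some/library/js/script.js
print(rewrite_until_stable(LAZY, requested))  # js/script.js
```

In this model the non-greedy pattern gives the right result after one pass, but a second pass over the rewritten path strips the prefix again and ends up at the same wrong target as the greedy pattern.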

There is another problem, and that is that for every directory that needs to be exempted from rewriting, a relatively "expensive" regex is needed. Both problems can be fixed by just putting every static component into a subdirectory with an "unusual" name (and really this is the best solution imo - anyone with a better idea please mail me).

The directory structure would then look like this:

index.php
mystrangedir/js/
mystrangedir/css/
mystrangedir/imgs/

Of course, this needs to be inserted everywhere in the code - for projects with a large existing codebase this can be tricky. However, you only need a single regex for directory exemption then:

RewriteRule ^.*/mystrangedir/(.*)$ mystrangedir/$1 [L]

Automated build systems (like gulp, grunt, ...) can be used to check that "mystrangedir" does not exist as a directory anywhere below itself (which would again throw off the rewrite engine).
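Such a check does not need a full build system. A minimal sketch in Python, assuming the layout shown above; the function name find_nested and the root argument are hypothetical and not part of any build tool.

```python
import os

def find_nested(root, name):
    """Return all directories called `name` located somewhere below root/name."""
    top = os.path.join(root, name)
    hits = []
    # os.walk yields nothing if `top` does not exist, so a missing
    # directory simply produces an empty result.
    for dirpath, dirnames, _filenames in os.walk(top):
        for dirname in dirnames:
            if dirname == name:
                hits.append(os.path.join(dirpath, dirname))
    return hits

# Assumed usage: run from the document root before deploying.
for path in find_nested(".", "mystrangedir"):
    print("nested copy found:", path)
```

A build step could fail whenever the returned list is not empty.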

Feel free to rename mystrangedir to something more sensible like static_content, but the more sensible it gets, the more probable it is that the directory name is already used in some library. If you want a directory name that has practically certainly never been used before, use a cryptographic hash, e.g. 010f8cea4cd34f820a9a01cb3446cb93637a54d840a4f3c106d1d41b030c7bcb. This is pretty long to match; you can make a tradeoff between uniqueness and regex performance by shortening it.
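One way to obtain such a name is sketched below, assuming SHA-256 over random bytes is acceptable as the hash. The function name strange_dir_name and the default length of 16 characters are arbitrary choices for illustration.

```python
import hashlib
import secrets

def strange_dir_name(length=16):
    """Return the first `length` hex characters of a SHA-256 digest of random bytes."""
    digest = hashlib.sha256(secrets.token_bytes(32)).hexdigest()  # 64 hex characters
    return digest[:length]

print(strange_dir_name())    # shortened name, different on every run
print(strange_dir_name(64))  # full-length digest
```

The name is generated once and then hard-coded in the rewrite rule and in the page templates; it must not change between requests.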

Of course the best solution is to replace relative URLs for stylesheets, images, JavaScript etc. with absolute URLs where possible. Requests for these are caught by the RewriteCond rules that check for existing files and directories, and never go through an expensive regex.
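The effect can be checked with Python's urljoin, under the assumption that a root-relative reference (one with a leading slash) counts as "absolute" here. The page URLs are examples only.

```python
from urllib.parse import urljoin

pages = [
    "https://www.example.com/pages/welcome",
    "https://www.example.com/site/action/user/new",
]

# A reference with a leading slash replaces the whole path of the page,
# so every page resolves it to the same URL.
resolved = [urljoin(page, "/js/jquery.min.js") for page in pages]

for url in resolved:
    print(url)
# https://www.example.com/js/jquery.min.js
# https://www.example.com/js/jquery.min.js
```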

© 2010-2021 Stefan Birgmeier
sbirgmeier@21er.org