Why is it important to escape & in href attributes in tags?

Tuesday, Nov 11, 2014
13 comments Web development

Here’s an example of unescaped & characters in an A HREF tag attribute.
http://jsfiddle.net/32zbogfw/ It’s working fine.

I know it might break XML and possibly XHTML but who uses that still?

And I know an unescaped & in a href shows as red in the View Source color highlighting.

What can go wrong? Why is it important? Perhaps it was important in 2009, but is that no longer the case?

This all started because I was reviewing some code that uses Python's urllib.urlencode(...) and inserts the result into a Django template with href="{{ result_of_that_urlencode }}". That would mean you get unescaped & characters. I then tried to find out how and why that is bad, but couldn't find any examples of it.
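For reference, here is a minimal sketch of the escaped variant, using only the Python 3 standard library (urllib.urlencode now lives at urllib.parse.urlencode). The parameter names are hypothetical, and this is not the code that was under review. Django templates with autoescaping enabled perform an equivalent escaping step on {{ ... }} values.

```python
from html import escape
from urllib.parse import urlencode  # Python 3 location of urllib.urlencode

# Hypothetical query parameters, for illustration only.
params = {"page": 2, "sort": "date"}

# urlencode joins the key=value pairs with a raw "&".
query = urlencode(params)

# HTML-escape the finished URL before placing it in the attribute.
href = escape("?" + query, quote=True)

link = '<a href="{}">next</a>'.format(href)
print(query)
print(link)
```

The browser decodes &amp; back to & when it parses the attribute, so the server still receives the same query string.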

Comments


Dan November 11, 2014

If I make a blog post that has the following url: http://www.example.com/checkoutmyguitar&they'reawesome/

I NEED to escape the & or else the & will get processed as an &.

Invalid entities will get ignored, which is what you're seeing. It's the edge cases that are the concern. I think.

Peter Bengtsson November 13, 2014

But in that case, the & is in the pathname part of the URL. E.g. http://jsfiddle.net/c5b5L4w1/
So not a problem.

Wladimir Palant November 11, 2014

The issue is that browsers will close whatever they consider incomplete entity references automatically. I don't know the specific algorithm but href="?foo=1&quot" still causes Firefox to add a quotation mark to the end of the URL - and that's what you get instead of a parameter named "quot". Now this doesn't happen for parameters that actually have a value but I wouldn't be so sure about browsers other than Firefox.

Boris Zbarsky November 11, 2014

It really depends on what you have in that attribute.

If you have href="?something&=whatever" you run into a problem if you don't escape the '&'.

If you have href="?something&amp whatever" you also run into a problem.

Or if you have href="?something&amp,something" for that matter.

So if you know for a fact that the thing after your maybe-entity-name is an equals char, you're probably OK. Otherwise, likely not.

Peter Bengtsson November 13, 2014

So as long as I always bundle the key and value with a = in between I'm safe.

Boris Zbarsky November 17, 2014

Not if the unquoted thing is in the value. "?something=&amp&" behaves identically to "?something=&&".

Not to mention the fact that, of course, the unquoted '&' will terminate the key-value pair.

Simon November 11, 2014

I don't think I've ever heard of such a thing... escaping ampersands in tag attributes. I mean, I see what you mean about view-source highlighting them as invalid, but I've never written them that way (unless using an XML-based generation tool), nor seen any framework (JSF, etc.) that ever renders them that way...

Peter Bengtsson November 13, 2014

For me it's the opposite. I've always been über careful turning & into &amp; in attributes' values. This is because we used to be so strict when XHTML was all the rage.

Now I've stopped to think: is it still important at all?

Neil Rashbrook November 12, 2014

There was a time when Gecko used to allow HTML entities without the trailing semicolon. (I don't know what the current parsing rules are here.) That meant that if you had a form parameter named e.g. "macroname" then tried to use it in a hardcoded link e.g. "update.php?action=delete&macro=test" the &macr would get interpreted as a ¯ character.
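The effect described above can be reproduced with Python's html.unescape, which applies the HTML5 rules for ordinary text and therefore still accepts legacy entity names without a trailing semicolon. This is only an illustration of that decoding, not a model of how any particular Gecko version parsed attributes.

```python
from html import unescape

# The hardcoded link from the comment above, with a raw "&".
link = "update.php?action=delete&macro=test"

# "&macr" is a legacy entity name, so text-mode decoding consumes it.
decoded = unescape(link)
print(decoded)

# With the ampersand escaped, decoding restores the intended URL.
escaped = "update.php?action=delete&amp;macro=test"
print(unescape(escaped))
```

In the first case the "macro" parameter is gone: its first four letters were replaced by the macron character (U+00AF).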

Peter Bengtsson November 13, 2014

"There was a time".

I'm guessing that goes way back. Even before people switched from HTML4 to XHTML doctypes.

Havvy November 12, 2014

https://html.spec.whatwg.org/multipage/syntax.html#consume-a-character-reference

If the character reference is being consumed as part of an attribute, and the last character matched is not a U+003B SEMICOLON character (;), and the next character is either a U+003D EQUALS SIGN character (=) or an alphanumeric ASCII character, then, for historical reasons, all the characters that were matched after the U+0026 AMPERSAND character (&) must be unconsumed, and nothing is returned. However, if this next character is in fact a U+003D EQUALS SIGN character (=), then this is a parse error, because some legacy user agents will misinterpret the markup in those cases.

--

Basically, your link throws a parse error only because of the equals sign that follows. I did some more testing ( http://jsfiddle.net/8b1h3bqw/ ) and noted that Firefox seems to ignore the rules about ampersands in attributes, showing a warning even in the valid case. But then, that's also only in view source, which, for some reason, I cannot access via the developer tools. Chrome doesn't report any parse errors anywhere as far as I can tell.

Anyways, it's important because of those legacy user agents only, and then, only if your parameter has the same name as an HTML entity character reference. In all other cases, there's no problem, probably.
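The quoted rule can be sketched in a few lines of Python. This is a simplified illustration, not a spec-complete parser: it handles named references only (no numeric ones), relies on the entity table in html.entities.html5, and the function name is made up for this example.

```python
import re
from html.entities import html5  # keys such as "amp;" and the legacy "amp"

_CANDIDATE = re.compile(r"&([A-Za-z0-9]+;?)")


def decode_named_refs_in_attribute(value):
    """Decode named character references inside an attribute value."""

    def replace(match):
        candidate = match.group(1)
        # Look for the longest entity name at the start of the candidate.
        for end in range(len(candidate), 0, -1):
            name = candidate[:end]
            if name not in html5:
                continue
            rest = candidate[end:]
            if not name.endswith(";"):
                # Character right after the matched name, if there is one.
                following = rest[:1] or value[match.end():match.end() + 1]
                if following == "=" or (following.isascii() and following.isalnum()):
                    return match.group(0)  # historical rule: leave it alone
            return html5[name] + rest
        return match.group(0)  # not a known entity name

    return _CANDIDATE.sub(replace, value)


for sample in ("?foo=1&quot", "?a=1&amp=2", "?something=&amp&",
               "update.php?action=delete&macro=test"):
    print(repr(sample), "->", repr(decode_named_refs_in_attribute(sample)))
```

Under this sketch, ?foo=1&quot gains a trailing quotation mark and ?something=&amp& becomes ?something=&&, while ?a=1&amp=2 and the &macro=test link are left unchanged because the matched name is followed by = or an alphanumeric character.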

Peter Bengtsson November 13, 2014

Awesome. Just like others have mentioned in comments here, this means that as long as you follow with a = you're fine.

As your example points out; the really big risk is the example of `href="&amp"` where you might hope that the server is going to pick that up as a {'amp': ''} or something. It won't. Instead you'd get nothing from the query string.
It would if it was `href="?ampsomething"` then you could get {'ampsomething': ''}
(NB: different servers accept or simply reject CGI params without a =)

Giorgio Maone November 12, 2014

You should not URL-encode URLs before inserting them into a href attribute: actually, if you URL-encode them they'll likely break.

But you must HTML-escape them, which is what & turned into &amp; is about. Django templates may be configured to do it automatically anyway, see https://docs.djangoproject.com/en/dev/ref/templates/builtins/

If you don't HTML-escape URLs and other variables before merging them into your HTML (especially if they ultimately come from user input), you risk making your website vulnerable to cross-site scripting (XSS).
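A minimal sketch of that risk, using Python's html.escape. The attacker-controlled value is hypothetical, and escaping alone does not validate the URL scheme (it would not stop a javascript: URL, for instance).

```python
from html import escape

# Hypothetical user-supplied value, for illustration only.
user_url = 'http://example.com/?q=1&r=2" onmouseover="alert(1)'

# Unescaped: the embedded quote closes href and injects a new attribute.
unsafe = '<a href="' + user_url + '">link</a>'

# Escaped: quotes and ampersands become character references,
# so the whole value stays inside the href attribute.
safe = '<a href="' + escape(user_url, quote=True) + '">link</a>'

print(unsafe)
print(safe)
```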

P.S.: why in the hell does this blog require JavaScript to be enabled, for extra 3rd party sources too, in order to protect your comment form against CSRF? :(
