I’ve encountered different techniques that (try to) solve this problem. Some escape only the single/double quotes, others sanitize the input by removing unexpected characters, etc. The solution should, however, be more general, and thus bulletproof.
We have no doubts about how to escape arbitrary data that we want displayed in an HTML page. We convert all special characters to HTML entities, and most programming languages have a function for that. In PHP that’s the htmlspecialchars() function. Nobody writes their own version that replaces the ampersand character with “&amp;amp;”, and so on.
Why re-invent the wheel, then, when dealing with arbitrary data for JavaScript in an HTML page? JavaScript expects data to be escaped as JSON: “Since JSON is a subset of JavaScript, it can be used in the language with no muss or fuss”.
The rules of thumb are:
- When supplying arbitrary data to JavaScript, encode it as JSON. Let json_encode() add the opening and closing quotes; do not wrap its output in quotes yourself.
- If the JavaScript code is embedded in HTML code, the whole thing needs to be additionally HTML-escaped (converted to HTML entities).
Enough theory, let’s see the source code:
<?php
$data = 'Any data, including <html tags>, \'"&;(){}'."\nNewline";
?>
<html>
<body>
<script>
  // JavaScript not in HTML code, because we are inside a <script> block
  js_var1 = <?=json_encode($data)?>;
</script>

The input data is: <?=htmlspecialchars($data)?>
<br><br>

<a href="#" onclick="alert(<?=htmlspecialchars(json_encode($data))?>)">
  JavaScript in HTML code; supply data directly.
</a>
<br><br>

<a href="#" onclick="alert(js_var1)">
  JavaScript in HTML code; supply data indirectly by using a JavaScript variable.
</a>
</body>
</html>
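The result looks a bit odd, even like broken HTML, when we supply the data directly inside the HTML code:
<a href="#" onclick="alert(&quot;Any data, including &lt;html tags&gt;, '\&quot;&amp;;(){}\nNewline&quot;)">
  JavaScript in HTML code; supply data directly.
</a>
It is valid HTML, though: the browser decodes the entities in the attribute value before the JavaScript runs. (On PHP 8.1 and later, htmlspecialchars() also escapes the single quote as &#039; by default.)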
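A quick way to convince yourself that the escaped attribute is well-formed is to replay the browser's two steps in PHP: decode the HTML entities, then parse what is left as JavaScript. The following is only a minimal sketch of that idea. html_entity_decode() and json_decode() stand in for the browser's HTML parser and JavaScript engine, which is a simplification, the variable names are made up for illustration, and the explicit ENT_QUOTES flag is my own choice so the result does not depend on the PHP version's defaults.

```php
<?php
$data = 'Any data, including <html tags>, \'"&;(){}' . "\nNewline";

// Rule 1: encode as JSON. Rule 2: HTML-escape, because it goes into an attribute.
$json = json_encode($data);
$attr = htmlspecialchars($json, ENT_QUOTES, 'UTF-8');

// Stand-in for the HTML parser: decode the entities in the attribute value.
$seenByJs = html_entity_decode($attr, ENT_QUOTES, 'UTF-8');

// Stand-in for the JavaScript engine: parse the literal back into a string.
$roundTrip = json_decode($seenByJs);

echo $attr, "\n";                // what ends up inside onclick="alert(...)"
var_dump($seenByJs === $json);   // is the JavaScript source intact after decoding?
var_dump($roundTrip === $data);  // does it evaluate to the original data?
```

If both comparisons hold for your own test strings, the data survived both layers of escaping unchanged.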
A side note: make sure that in PHP you stay in UTF-8, because json_encode() requires it, and htmlspecialchars() also interprets its input according to a character encoding.
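The UTF-8 requirement is easy to demonstrate. Here is a minimal sketch, assuming PHP 7.3 or later (for JSON_THROW_ON_ERROR; JSON_INVALID_UTF8_SUBSTITUTE needs 7.2); the Latin-1 byte string is a made-up example of input in the wrong encoding, and which of the two options fits is a judgment call for your application.

```php
<?php
// "café" in ISO-8859-1: the lone byte 0xE9 is not valid UTF-8.
$latin1 = "caf\xE9";

// By default json_encode() signals the problem only by returning false.
$plain = json_encode($latin1);
var_dump($plain);
var_dump(json_last_error() === JSON_ERROR_UTF8);

// Option 1: fail loudly instead of silently emitting nothing into the page.
try {
    json_encode($latin1, JSON_THROW_ON_ERROR);
    $threw = false;
} catch (JsonException $e) {
    $threw = true;
}

// Option 2: replace invalid byte sequences with U+FFFD and carry on.
$substituted = json_encode($latin1, JSON_INVALID_UTF8_SUBSTITUTE);

// Name the encoding explicitly instead of relying on the configured default.
echo htmlspecialchars($substituted, ENT_QUOTES, 'UTF-8'), "\n";
```

Without one of these options, `<?=json_encode($data)?>` would print an empty string for such input and leave a JavaScript syntax error in the page.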
I’ll be glad to hear your comments or see an example where this method of escaping fails.