Tag Archives: javascript

Match any character including new line in Javascript Regexp

The dot character in JavaScript’s regular expressions matches any character except line terminators, and for a long time no modifier could change that (the s, or dotAll, flag that lifts this restriction only arrived with ES2018). Sometimes you just want to match everything, and there are a couple of ways to do that.

You can pick an obscure character and use a negated character class with it, i.e. [^`]+. This is not a true match-any-character, though, because it fails as soon as that character appears in the input. Or you can try [.\r\n]+, which doesn’t work at all: inside a character class the dot is a literal dot, not a wildcard. (?:\r|\n|.)+ works fine, but as you’ll soon find out, it is notoriously slow, because the alternation creates a three-way branching point at every character.

The perfect way I’ve found is actually a nicer variation of the first idea:
[^]+
Which means ‘don’t match no characters’, a double negative that can be re-read as ‘match any character’. Hacky, but works perfectly.
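To make the comparison concrete, here is a small sketch that runs each pattern discussed above against a two-line string. The sample text and variable names are made up for illustration, and the last pattern assumes an engine that supports the ES2018 s flag.

```javascript
var text = "first line\nsecond line";

// The plain dot stops at the line break.
var dotMatch = text.match(/.+/)[0];

// Inside a character class the dot is a literal ".", so this class
// only matches dots and line breaks, not arbitrary characters.
var classMatch = text.match(/[.\r\n]+/)[0];

// Alternation matches everything, at the cost of a branch per character.
var altMatch = text.match(/(?:\r|\n|.)+/)[0];

// The empty negated class matches any character, line breaks included.
var anyMatch = text.match(/[^]+/)[0];

// ES2018 and later: the s flag makes the dot match line breaks too.
var dotAllMatch = text.match(/.+/s)[0];

console.log(JSON.stringify([dotMatch, classMatch, altMatch, anyMatch, dotAllMatch]));
```

Only the last three patterns return the whole string. Note that [^] is a JavaScript quirk and may not be portable to other regex flavours.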

Javascript snippet to convert raw UTF8 to unicode

From the I-don’t-see-a-sane-use-for-this department comes this piece of code, which takes a string of raw UTF-8 bytes (one byte per character), decodes it, and builds the result with String.fromCharCode so that it renders in a Unicode-capable browser. A possible use would be if the web page character set is not UTF-8 and you want to display UTF-8. To use it, just put it in a script tag and call utf8decode(myrawutf8string). But seriously, all web pages should be UTF-8 by default nowadays. Here it is, in case anyone wants it:

// Decodes the UTF-8 sequence that starts at index i of b (one raw byte per character).
// Returns { intc: code point, i: index of the last byte consumed },
// or null if the lead byte is invalid or the sequence is truncated.
function TryGetCharUTF8(b, i)
{
	/*
	 * 10000000 80
	 * 11000000 C0
	 * 11100000 E0
	 * 11110000 F0
	 * 11111000 F8
	 * 11111100 FC
	 *
	 * FEFF = 65279 = BOM
	 *
	 * U+1D11E (119070, musical G clef) is the surrogate pair 0xD834 0xDD1E in UTF-16
	 */
	var lead = b.charCodeAt(i);
	var intc;

	if ((lead & 0x80) == 0)
	{
		// 1 byte: plain ASCII
		intc = lead;
	}
	else if ((lead & 0xE0) == 0xC0)
	{
		// 2 bytes
		if (i + 1 >= b.length) return null;
		intc = ((lead & 0x1F) << 6) | (b.charCodeAt(i + 1) & 0x3F);
		i += 1;
	}
	else if ((lead & 0xF0) == 0xE0)
	{
		// 3 bytes: covers the rest of the BMP
		if (i + 2 >= b.length) return null;
		intc = ((lead & 0xF) << 12) | ((b.charCodeAt(i + 1) & 0x3F) << 6) | (b.charCodeAt(i + 2) & 0x3F);
		i += 2;
	}
	else if ((lead & 0xF8) == 0xF0)
	{
		// 4 bytes: code points above the BMP
		if (i + 3 >= b.length) return null;
		intc = ((lead & 0x7) << 18) | ((b.charCodeAt(i + 1) & 0x3F) << 12) | ((b.charCodeAt(i + 2) & 0x3F) << 6) | (b.charCodeAt(i + 3) & 0x3F);
		i += 3;
	}
	else
	{
		return null;
	}
	return { intc: intc, i: i };
}

function utf8decode(s) {
	var ss = "";
	for (var i = 0; i < s.length; i++) {
		var r = TryGetCharUTF8(s, i);
		if (r === null) continue; // skip invalid or truncated bytes
		i = r.i;
		if (r.intc > 0xFFFF) {
			// String.fromCharCode only takes 16-bit units, so emit a surrogate pair
			var v = r.intc - 0x10000;
			ss += String.fromCharCode(0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF));
		} else {
			ss += String.fromCharCode(r.intc);
		}
	}
	return ss;
}
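For comparison, current browsers and Node ship a built-in TextDecoder that does the same job. The sketch below assumes the input is a byte string like the one the snippet above expects (each character code is one raw byte). The helper name decodeByteString is hypothetical, not part of any library.

```javascript
// Hypothetical helper: s is a "byte string" where every character
// code is one raw byte in the range 0-255.
function decodeByteString(s) {
	var bytes = new Uint8Array(s.length);
	for (var i = 0; i < s.length; i++) {
		bytes[i] = s.charCodeAt(i) & 0xFF;
	}
	// By default TextDecoder replaces malformed sequences with U+FFFD
	// instead of throwing.
	return new TextDecoder('utf-8').decode(bytes);
}

// Three raw bytes E2 82 AC encode the euro sign U+20AC.
var euro = decodeByteString("\xE2\x82\xAC");
console.log(euro === "\u20AC");
```

The hand-written decoder remains useful only where TextDecoder is unavailable, which mostly means very old browsers.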

Detecting the back (or refresh) button click

While developing a web app, I came across an interesting problem: I had a page with a button that performs an action. When the button is clicked, the action request is sent to the server-side script, which redirects back to the same page with a message displayed at the top (e.g. Your post has been submitted).

If you then navigate to another page and click back, you see the same page with the same message popping up. I wanted to detect the back click so that the message can be hidden. There are plenty of solutions on Google, but a lot of them involve setting a cookie (what if cookies are disabled?), a server-side script checking the referer (what if the page is still cached?), or comparing the server page load time with the current time and looking for a large difference (what if the client clock is wrong?).

Without an ideal solution, I set about finding a new one. Surely it can’t be hard to detect that we’ve already been on that same page. If only there were a way to save a flag just for that page and for the duration of the page session. I tried modifying the DOM, but that gets reverted when you click back. The onload event also gets called again, so you can’t use that to differentiate.

I then remembered that, at least in recent browsers, forms retain their field values when you click back. This is very handy if you’re submitting a post and the connection dies: you can just click back and your long-winded post is still intact.

Solution – Use a hidden form field to detect that we’ve been on this page before

Building on this idea, it’s possible to temporarily store a flag on a hidden form field that says, yep I’ve been on this page before. Here is a code snippet:

<html>
<body>
Try
<a href="http://www.google.com/">jumping to another page</a>

<script>

// Hidden form field: the browser restores its value when the page is revisited via back or refresh
document.write("<form style='display: none'><input name='__detectback' id='__detectback' value=''></form>");

// Returns true if the field already holds load_id (we have been here before);
// otherwise stores load_id in the field and returns false
function checkPageBackOrRefresh(load_id) {
	var field = document.getElementById('__detectback');
	if (field.value == load_id) {
		return true;
	} else {
		field.value = load_id;
		return false;
	}
}

window.onload = function() {
	if (checkPageBackOrRefresh('tt'))
		alert('You clicked back or refreshed the page');
};

</script>
</body>
</html>

Unfortunately, this solution does not work in browsers where “fast back” (i.e. the back-forward cache, or bfcache, in Firefox) is enabled, because fast back preserves the scripting state, so onload does not fire again.

The script should work fine with IE7 and IE8. With Firefox, it only works on certain pages. These seem to be pages that load heavy scripts (e.g. jQuery?).

With bfcache-enabled browsers, a possible solution would be to hide the message in the onbeforeunload event, so it will not appear even when clicking back.
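A minimal sketch of that idea follows. The element id flash-message and the helper name hideFlashMessage are hypothetical. The pageshow listener is an addition that is not part of the approach described above: browsers that restore a page from the back-forward cache fire pageshow with event.persisted set to true, which gives a second chance to hide the message. Whether registering a beforeunload handler affects bfcache eligibility varies between browsers, so treat this as a starting point to test, not a guaranteed fix.

```javascript
// Hypothetical helper: hides the message element if it exists.
// Returns true when an element was found and hidden.
function hideFlashMessage(doc) {
	var el = doc.getElementById('flash-message'); // assumed id
	if (el) {
		el.style.display = 'none';
	}
	return !!el;
}

// Only register listeners when running in a browser.
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
	// Hide the message on the way out, so a cached copy of the page
	// comes back without it.
	window.addEventListener('beforeunload', function () {
		hideFlashMessage(document);
	});

	// Hide it again if the page was restored from the back-forward cache.
	window.addEventListener('pageshow', function (event) {
		if (event.persisted) {
			hideFlashMessage(document);
		}
	});
}
```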