IPv6 regex

I spent too much time today playing with IPv6 stuff that I didn’t have any time to work on my latest time sink, Pyrobox. I will have to write about that some other time.

For now, I wanted to get this out there. I was curious about how easy it was to confirm that a string is a valid IPv6 address. It turns out that it is not so simple, thanks to the “space saving” techniques of zero folding that is used. Here are some examples of IPv6 addresses that are valid:

::                           unspecified address
::1                          localhost
fe80::219:7eff:fe46:6c42     link local address
::00:192.168.10.184          embedded IPv4 address

Yes, those are just some of the variety that was introduced that makes the protocol easier to use from a high level, but harder to implement and use from a low level. I mean, I tell my router to advertise IPv6 addresses and within seconds all the machines on the LAN have configured themselves with globally routable IPv6 addresses. Impressive. Yet so very *not* human friendly. There is no way around it, when you are dealing with 128 bits, it is a lot of data to put in a human readable format. That’s what the 8 sets of quad-byte segments are all about, human readability, but it is still so long, nobody will be able to spout off their IP address like they can with IPv4. Thankfully nameservers should help out with that. But sometimes we will have to deal with IPv6 addresses and I want to know how easy it is to recognize a true address from a fake.

I wrote a little ditty in python to help me conceptualize it. I found several sites that had similar regexes to this Ruby parser. I was somewhat dismayed to find out that while they successfully classify a bunch of the namespace correctly, they also have an infinite set of bad addresses they classify as good. Oops. Time to look at your regex handbook again, folks. To remedy their situation is not so easy though. Here is the python program I wrote that generates a regex:

 import re

def valid_ipv6(addr, debug=False):
	# we will build an array of matches and then join them
	a = []
	# the simplest match in the ipv6 address
	sm = r'[0-9a-f]{1,4}'
	for i in range(1,7):
		a.append(r'A(%s:){1,%d}(:%s){1,%d}Z' % (sm, i, sm, 7-i))
	a.append(r'A((%s:){1,7}|:):Z' % sm)
	a.append(r'A:(:%s){1,7}Z' % sm)
	a.append(r'A((([0-9a-f]{1,4}:){6})(25[0-5]|2[0-4]d|[0-1]?d?d)(.(25[0-5]|2[0-4]d|[0-1]?d?d)){3})Z')
	# support for embedded ipv4 addresses in the lower 32 bits
	ipv4 = r'(25[0-5]|2[0-4]d|[0-1]?d?d)(.(25[0-5]|2[0-4]d|[0-1]?d?d)){3}'
	a.append(r'A((%s:){5}%s:%s)Z' % (sm, sm, ipv4))
	a.append(r'A(%s:){5}:%s:%sZ' % (sm, sm, ipv4))
	for i in range(1,5):
		a.append(r'A(%s:){1,%d}(:%s){1,%d}:%sZ' % (sm, i, sm, 5-i, ipv4))
	a.append(r'A((%s:){1,5}|:):%sZ' % (sm, ipv4))
	a.append(r'A:(:%s){1,5}:%sZ' % (sm, ipv4))
	bigre = "("+")|(".join(a)+")"
	if debug:
		for i in range(len(a)):
			r = re.compile(a[i], re.I)
			if r.search(addr):
				print a[i]
		print "n%s" % bigre
	bigre = re.compile(bigre, re.I)
	return bigre.search(addr) and True

if __name__ == '__main__':
	import sys
	if len(sys.argv) < 2:
		print "usage: %s "%sys.argv[0]
		sys.exit(1)
	
	if valid_ipv6(sys.argv[1], True):
		print "valid"
		sys.exit(0)
	else:
		print "invalid"
		sys.exit(1)

When it runs, it can print out the final regex (which embodies the entire IPv6 address language as far as I can tell). Here is that regex:

(A([0-9a-f]{1,4}:){1,1}(:[0-9a-f]{1,4}){1,6}Z)|
(A([0-9a-f]{1,4}:){1,2}(:[0-9a-f]{1,4}){1,5}Z)|
(A([0-9a-f]{1,4}:){1,3}(:[0-9a-f]{1,4}){1,4}Z)|
(A([0-9a-f]{1,4}:){1,4}(:[0-9a-f]{1,4}){1,3}Z)|
(A([0-9a-f]{1,4}:){1,5}(:[0-9a-f]{1,4}){1,2}Z)|
(A([0-9a-f]{1,4}:){1,6}(:[0-9a-f]{1,4}){1,1}Z)|
(A(([0-9a-f]{1,4}:){1,7}|:):Z)|
(A:(:[0-9a-f]{1,4}){1,7}Z)|
(A((([0-9a-f]{1,4}:){6})(25[0-5]|2[0-4]d|[0-1]?d?d)(.(25[0-5]|2[0-4]d|[0-1]?d?d)){3})Z)|
(A(([0-9a-f]{1,4}:){5}[0-9a-f]{1,4}:(25[0-5]|2[0-4]d|[0-1]?d?d)(.(25[0-5]|2[0-4]d|[0-1]?d?d)){3})Z)|
(A([0-9a-f]{1,4}:){5}:[0-9a-f]{1,4}:(25[0-5]|2[0-4]d|[0-1]?d?d)(.(25[0-5]|2[0-4]d|[0-1]?d?d)){3}Z)|
(A([0-9a-f]{1,4}:){1,1}(:[0-9a-f]{1,4}){1,4}:(25[0-5]|2[0-4]d|[0-1]?d?d)(.(25[0-5]|2[0-4]d|[0-1]?d?d)){3}Z)|
(A([0-9a-f]{1,4}:){1,2}(:[0-9a-f]{1,4}){1,3}:(25[0-5]|2[0-4]d|[0-1]?d?d)(.(25[0-5]|2[0-4]d|[0-1]?d?d)){3}Z)|
(A([0-9a-f]{1,4}:){1,3}(:[0-9a-f]{1,4}){1,2}:(25[0-5]|2[0-4]d|[0-1]?d?d)(.(25[0-5]|2[0-4]d|[0-1]?d?d)){3}Z)|
(A([0-9a-f]{1,4}:){1,4}(:[0-9a-f]{1,4}){1,1}:(25[0-5]|2[0-4]d|[0-1]?d?d)(.(25[0-5]|2[0-4]d|[0-1]?d?d)){3}Z)|
(A(([0-9a-f]{1,4}:){1,5}|:):(25[0-5]|2[0-4]d|[0-1]?d?d)(.(25[0-5]|2[0-4]d|[0-1]?d?d)){3}Z)|
(A:(:[0-9a-f]{1,4}){1,5}:(25[0-5]|2[0-4]d|[0-1]?d?d)(.(25[0-5]|2[0-4]d|[0-1]?d?d)){3}Z)

Yes, I realize that is one big regex (note that the lines should all really be concatenated, but I broke it apart at the | points to make it easier to read). But I think it is more complete and correct than other regexes that google helped me to find. If anyone knows a better way to do this using some regex fu, please let me know.

MythTV + MediaMVP = Time Shifted Television

I have long been slightly jealous of Darren’s MythTV setup. I kept telling myself that I have enough other projects (a.k.a. kids) to keep myself busy for the next 18 years. Plus, the VCR and TV have always been fine for our needs and up until about a month ago were working fine. The TV has never really been what I would call a great piece of electronic equipment. A great piece of something. But it was free and I can’t argue with that. It still works if not for its slightly discolored screen. The VCR is in the same boat. But it finally did give up the ghost. First it stopped rewinding tapes and then it stopped recording. So I tossed it. But that left us without a way to record Sesame Street. Dun dun dun…

Ever since the invention of the VCR, Americans have loved the ability to watch Time Shifted Television (TST). Two shows you want to watch are always on at the same time. Record one and watch the other; then watch the second show at your leisure. Simple solution. Enter the digital age. Hello TiVo. Thank you, Richard Stallman, for telling it how it is. Goodbye TiVo, hello MythTV. MythTV was created by a guy that didn’t want to have to pay the costly startup fees and crazy monthly fees associated with a commercial Digital Video Recorder (DVR) so he wrote his own software that runs on a personal computer that has a TV tuner card in it. To make things better, he agrees with Richard Stallman and released it under the GPL, which means that everyone and his dog can get it for free, use it, hack it, redistribute it, etc. So that is what I am using. Yeah!

We came very close to choosing TiVo as our DVR. I had plans to buy a basic 80 hour box and add a larger hard drive if we needed it. I also planned on buying the lifetime subscription if we liked the service. I hate monthly fees. They are the bane of my existence. “For less than a dollar a day you can have…” a thirty dollar bill at the end of each month. EVERY MONTH. But I digress. The lifetime subscription was the equivalent of 2 years of service. I figured if we liked TiVo and stuck with it for 2 years, it would be worth our while. But they (the big wigs at TiVo) came up with this grand plan to make more money and wring every last dime out of their users. The spin they put on it was something like “No up-front fees” and something about only slightly higher monthly fees in small print. The idea is that users no longer have to buy a TiVo box — they lease one as part of the 1, 2, or 3 year contract for service. When the contract is over, you can upgrade your box when you renew it. Sounds like a great plan! If you want to pay $20 a month for the rest of your life.

Enter MythTV. (I think I already said that). I figured with TiVo, the startup costs would have been about $400 including the lifetime subscription. So I set out to find parts to fix up one of my old computers to make it a worthy Myth backend (server). I looked at all the shiny boxes that you can put in your entertainment center; the ones that don’t even look like computers and are so silent they make your fridge seem noisy. They cost about 3 times what I was willing to pay. After talking some with Darren, I decided the best way to go was to have a backend with the tuner cards and a big disk drive and a itty-bitty, thin-client frontend that hooks up to the TV. So we bought a Hauppage PVR 500 dual tuner card and a Hauppage MediaMVP for the frontend. Speaking of opensource software, a group of people hacked the MediaMVP software so it can run a program that speaks the MythTV protocol (basically it is a MythTV frontend). The project is MVP Media Center (MVPMC). It is a Linux kernel running a small program. Very cool.

So now we can tell MythTV what we want to watch and decide for ourselves when we want to watch it. Now if only I could keep it running all the time (it seems to segfault now and then…) then all would be well in the Domestic Tranquility Department. 🙂

MauOS makes its public debut

After a request from a friend to see the source for my personal OS toy, I have released the first public version of MauOS. I started MauOS as a way to learn more about the i386 architecture. It turned into quite a fun little project. Even though it doesn’t do much more than switch between two tasks and respond to timer interrupts, it was a lot of fun to delve into the sick minds of the creators of the i386.

As it happens, when I was being interviewed by IBM, they learned of MauOS and the word got out that I am an overachiever. Just because I like to play with the bits of my emulated machine does not make me an overachiever.

MauOS runs well in bochs and sometimes on real hardware if you are that crazy. Have fun and fix it for me. Maybe someday it will take over the world. Downloads are available at the download page.

Linux Laptop Howto

I finally got my IBM T40 to a point that it is very usable and mostly stable. (When I say mostly stable, that means it about half as much as w1nD0w$ does, i.e. once a week or less) OS development/debugging is generally pretty abusive to computers, so I need stability. Anyway, I wrote up a page or two of how I configured my machine and thought I would make it public. Debian GNU/Linux on a IBM T40