Python’s binascii – hexlify() and unhexlify()
What the heck?
Today, a dear friend of mine came up to me and asked about the Python module binascii – particularly about the methods hexlify() and unhexlify(). Since he asked for it, I’m going to share my answer publicly with you.
First of all, let me define the notation I use:
- ASCII characters are written in single quotes
- decimal numbers are of type long (Python 2) and carry an L suffix
- hex values have a \x prefix
First, let me quote the documentation:
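I’ll begin with hexlify(). As the documentation states, this method returns the hexadecimal representation of binary data: every byte is converted into its corresponding two-digit hex value, so the result is twice as long as the input.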
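To make that description concrete, here is a minimal sketch in Python 3, where hexlify() takes and returns bytes instead of str. The helper manual_hexlify() is hypothetical and exists only for illustration; it is not part of binascii, but it should agree with binascii.hexlify() on any input.

```python
import binascii

HEX_DIGITS = "0123456789abcdef"


def manual_hexlify(data: bytes) -> bytes:
    """Hypothetical re-implementation of hexlify(): two lowercase hex digits per byte."""
    digits = []
    for byte in data:                  # iterating over bytes yields ints 0..255
        high, low = divmod(byte, 16)   # split the byte into its two 4-bit halves
        digits.append(HEX_DIGITS[high])
        digits.append(HEX_DIGITS[low])
    return "".join(digits).encode("ascii")


sample = b"A\x07\xff"
print(manual_hexlify(sample))    # b'4107ff'
print(binascii.hexlify(sample))  # b'4107ff'
```

Three input bytes become six hex digits, which is the "twice as long" rule in action.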
The ASCII character ‘A’ has 65L as its numerical representation. To verify this in Python:
>>> long(ord('A'))
65L
You might ask, “Why is this even relevant to understanding binascii?” Well, we don’t know how ord() does its job internally, but with binascii we can recalculate the value manually and verify it.
>>> import binascii
>>> binascii.hexlify('A')
'41'
Now we know that an ‘A’, interpreted as binary data and shown in hex, corresponds to ‘41’. But wait, ‘41’ is a string, not a hex value! That’s no big deal: hexlify() simply returns its result as a string.
To stay with the example, let’s convert the hex string ‘41’ into a decimal number and check whether it equals 65L.
>>> long('41', 16)
65L
Tada! It seems that ‘A’ = \x41 = 65L.
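The same three checks can be sketched in Python 3, where the long type and the L suffix no longer exist and hexlify() works on bytes. The variable names are only illustrative.

```python
import binascii

# Python 3 ints are unbounded, so there is no separate long type and no L suffix.
code_point = ord("A")              # 65
hex_text = binascii.hexlify(b"A")  # b'41': a bytes object, not a number
back_to_int = int(hex_text, 16)    # int() also parses bytes when a base is given

print(code_point, hex_text, back_to_int)  # 65 b'41' 65
assert code_point == back_to_int == 0x41
```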
You might have known that already, but please, stay with me a minute longer.
To make it look a little more complex:
>>> binascii.hexlify('A') == "%X" % long('41', 16)
True
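Be aware that
>>> "%X" % n
converts an integer n into its uppercase hex representation, as a string. hexlify(), on the other hand, returns lowercase hex digits, so the comparison above only holds because ‘41’ contains none of the letters a to f.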
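A quick Python 3 sketch of that case difference, using a byte value (0xAB, chosen arbitrarily) whose hex form does contain letters:

```python
import binascii

n = 0xAB  # 171: a byte value whose hex form contains letters

upper = "%X" % n                      # 'AB': the X conversion uses uppercase digits
lower = binascii.hexlify(bytes([n]))  # b'ab': hexlify() emits lowercase digits
print(upper, lower)                   # AB b'ab'

# The naive comparison fails; it only succeeds after normalising case and type.
assert upper != lower.decode("ascii")
assert upper.lower() == lower.decode("ascii")
```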
——
binascii.unhexlify() does the same thing as hexlify(), but in reverse: it takes a string of hex digits, two per byte, and returns the binary data they represent.
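Mirroring the earlier hexlify() sketch, here is a minimal Python 3 version of the reverse direction. manual_unhexlify() is again a hypothetical helper; the real binascii.unhexlify() likewise rejects odd-length input, raising binascii.Error, which is a subclass of ValueError.

```python
import binascii

HEX_DIGITS = set("0123456789abcdefABCDEF")


def manual_unhexlify(hexstr: str) -> bytes:
    """Hypothetical inverse of hexlify(): one byte per pair of hex digits."""
    if len(hexstr) % 2:
        raise ValueError("hex string must contain an even number of digits")
    if not set(hexstr) <= HEX_DIGITS:
        raise ValueError("hex string contains a non-hex character")
    # Parse the digits two at a time as base-16 numbers (each in 0..255).
    return bytes(int(hexstr[i:i + 2], 16) for i in range(0, len(hexstr), 2))


print(manual_unhexlify("41"))    # b'A'
print(binascii.unhexlify("41"))  # b'A'
```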
I’ll start off with an example:
>>> binascii.unhexlify('41')
'A'
>>> binascii.unhexlify("%X" % ord('A'))
'A'
In the second call, ord() yields 65, the numerical representation of the ASCII character ‘A’,
>>> ord('A')
65
"%X" formats it as the hex string ‘41’,
>>> "%X" % ord('A')
'41'
and unhexlify() turns that pair of hex digits back into the single byte ‘A’.
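The same chain can be sketched in Python 3. One caveat worth hedging against: "%X" drops leading zeros, so for a character below \x10 it would yield a single digit, which unhexlify() rejects as odd-length. The sketch below therefore uses "%02X" as a safer variant; that substitution is my choice, not part of the original example.

```python
import binascii

char = "A"
code = ord(char)                    # 65: the character's numerical representation
hex_text = "%02X" % code            # '41': always two digits, zero-padded
raw = binascii.unhexlify(hex_text)  # b'A': one byte per pair of hex digits
print(code, hex_text, raw)          # 65 41 b'A'

# Why the padding matters: the Bell character needs it.
print("%X" % 7, "%02X" % 7)         # 7 07

# unhexlify() undoes hexlify() for every possible byte value.
data = bytes(range(256))
assert binascii.unhexlify(binascii.hexlify(data)) == data
```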
And now the conclusion: why might all of this be useful?
Right now, I can think of at least four use cases:
- cryptography
- data transformation (e.g. Base64 for MIME/e-mail attachments)
- security (inspecting binary data read off a network, pattern matching, …)
- textual representation of escape sequences
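For the data-transformation case, binascii also ships Base64 codecs next to the hex ones. A small Python 3 sketch (the payload is arbitrary; the newline keyword of b2a_base64() needs Python 3.6 or later):

```python
import binascii

payload = b"A\x07"  # arbitrary binary data, including a non-printable byte

as_hex = binascii.hexlify(payload)                    # b'4107'
as_b64 = binascii.b2a_base64(payload, newline=False)  # b'QQc='
print(as_hex, as_b64)

# Both textual forms decode back to the original bytes.
assert binascii.unhexlify(as_hex) == payload
assert binascii.a2b_base64(as_b64) == payload
```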
To illustrate the last use case, I’ll show you how to visualize the Bell escape sequence (you know, that thing that keeps beeping in your terminal).
According to the ASCII table, the numerical representation of the Bell is 7. Programmers might know it better as \a.
>>> '\x07' == '\a'
True
Suppose you read such a character in some binary data, for example from a socket:
>>> foo = '\x07'
and you want to visualize this data
>>> print foo
you will not get any results – at least none visible. You might hear the Bell sound if you’re not on a silent terminal.
Now, finally – binascii to the rescue:
>>> binascii.hexlify('\x07')
'07'
Voilà, the invisible character is now readable: 07 is the Bell’s hex code.
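The same trick scales to a whole buffer. Here is a Python 3 sketch with a made-up payload standing in for socket data; hex_dump() is a hypothetical helper that separates the bytes with spaces so the Bell stands out. Printing the decoded text directly would just ring the bell (or show nothing), so the sketch only prints the hex forms.

```python
import binascii


def hex_dump(data: bytes) -> str:
    """Hypothetical helper: two lowercase hex digits per byte, space-separated."""
    return " ".join("%02x" % byte for byte in data)


received = b"ding\x07dong"         # pretend this arrived from a socket
print(hex_dump(received))          # 64 69 6e 67 07 64 6f 6e 67
print(binascii.hexlify(received))  # b'64696e6707646f6e67'

assert "\a" == "\x07" and ord("\a") == 7  # \a is the Bell, code 7
```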