Tag: python


Python’s binascii - hexlify() and unhexlify()
What the heck?

Today, a dear friend of mine came up to me and asked about the Python module binascii - particularly about the functions hexlify() and unhexlify(). Since he asked for it, I’m going to share my answer publicly with you.

First of all, I’m defining the used nomenclature:

  • ASCII characters are being written in single quotes
  • decimal numbers are of the type Long with a L suffix
  • hex values have a \x prefix

First, let me quote the documentation:

binascii.b2a_hex(data)
binascii.hexlify(data)
Return the hexadecimal representation of the binary data. Every byte of data is converted into the corresponding 2-digit hex representation. The resulting string is therefore twice as long as the length of data.
binascii.a2b_hex(hexstr)
binascii.unhexlify(hexstr)
Return the binary data represented by the hexadecimal string hexstr. This function is the inverse of b2a_hex(). hexstr must contain an even number of hexadecimal digits (which can be upper or lower case), otherwise a TypeError is raised.

I’ll begin with hexlify(). As the documentation states, this function converts every byte of its input into the corresponding two-digit hex representation.

The ASCII character ‘A’ has 65L as numerical representation. To verify this in Python:

>>> long(ord('A'))
65L

You might ask “Why is this even relevant to understand binascii?” Well, we don’t know anything about how ord() does its job. But with binascii we can re-calculate manually and verify.

>>> binascii.hexlify('A')
'41'

Now we know that an ‘A’ - interpreted as binary data and shown in hex - corresponds to ‘41’. But wait, ‘41’ is a string and not a hex value! That’s no big deal: hexlify() returns its result as a string.

To stay with the example, let’s convert \x41 into a decimal number and check if it equals 65L.

>>> long('41', 16)
65L

Tada! It seems that ‘A’ = \x41 = 65L.
You might have known that already, but please, stay with me a minute longer.

To make it look a little more complex:

>>> binascii.hexlify('A') == "%X" % long('41', 16)
True

Be aware that

>>> "%X" % n

converts a decimal number into its hex representation.
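One caveat with "%X": it does not pad to two digits, so values below 16 come out as a single hex digit and no longer match what hexlify() produces. The following is only a minimal sketch, assuming Python 3 (where binascii works on bytes instead of str); the helper name to_hex_pair is made up for illustration.

```python
import binascii

def to_hex_pair(n):
    """Format a byte value (0-255) as exactly two lowercase hex digits."""
    if not 0 <= n <= 255:
        raise ValueError("not a single byte: %r" % n)
    return "%02x" % n

# "%X" alone drops the leading zero for small values.
unpadded = "%X" % 7          # '7'
padded = to_hex_pair(7)      # '07'

# hexlify() always emits two digits per byte, so only the padded form matches.
matches = binascii.hexlify(b"\x07").decode("ascii") == padded
print(unpadded, padded, matches)
```

For ‘A’ the difference does not show, because 65 already needs two hex digits.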

——

binascii.unhexlify() naturally does the same thing as hexlify(), but in reverse. It takes a string of hex digits, reads them in pairs and returns the binary data they represent.

I’ll start off with an example:

	>>> binascii.unhexlify('41')
	'A'

	>>> binascii.unhexlify("%X" % ord('A'))
	'A'

Here, the numerical representation 65 of the ASCII character ‘A’ is taken

	>>> ord('A')
	65

converted into the hex string ‘41’

	>>> "%X" % ord('A')
	'41'

and handed to unhexlify(), which turns each pair of hex digits back into one byte - here the single byte ‘A’.
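To see both directions side by side, here is a small round trip. It is a sketch under the assumption that you run Python 3: there, both functions take and return bytes, and an odd number of hex digits raises binascii.Error rather than the TypeError named in the Python 2 documentation quoted above.

```python
import binascii

data = b"A\x07"

# hexlify(): two hex digits per input byte.
hex_form = binascii.hexlify(data)          # b'4107'

# unhexlify(): the inverse, two hex digits back into one byte each.
round_trip = binascii.unhexlify(hex_form)

# An odd number of hex digits cannot be split into pairs.
try:
    binascii.unhexlify(b"417")
    odd_length_rejected = False
except binascii.Error:
    odd_length_rejected = True

print(hex_form, round_trip == data, odd_length_rejected)
```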

And now the conclusion - why might all of this be useful?
Right now, I can think of at least four use cases:

  • cryptography
  • data-transformation (e.g. Base64 for MIME/e-mail attachments)
  • security (deciphering binary readings off a network, pattern matching, …)
  • textual representation of escape sequences

Taking up the last example, I’ll show you how to visualize the Bell escape sequence (you know, that thing that keeps beeping in your terminal).
Taken from the ASCII table, the numerical representation of the Bell is \x07 (written as the octal escape '\07' in the code below). Programmers might know it better as \a.

	>>> '\07' == '\a'
	True

Presuming you read such a character in some kind of binary data - for example from a socket

	>>> foo = '\07'

and you want to visualize this data

	>>> print foo

you will not get any results - at least none visible. You might hear the Bell sound if you’re not on a silent terminal.

Now, finally - binascii to the rescue:

	>>> binascii.hexlify('\07')
	'07'

Voilà, the dubious string is made visible.
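The same trick scales to longer buffers. Below is a rough sketch, again assuming Python 3; the function name hex_pairs and the sample bytes are invented for the example and merely stand in for whatever you read from your socket.

```python
import binascii

def hex_pairs(data):
    """Return the bytes of `data` as a space-separated string of hex pairs."""
    digits = binascii.hexlify(data).decode("ascii")
    return " ".join(digits[i:i + 2] for i in range(0, len(digits), 2))

received = b"\x07OK\r\n"     # stand-in for bytes read from a socket
dump = hex_pairs(received)
print(dump)
```

Grouping the digits in pairs keeps the byte boundaries readable, which helps when you are hunting for control characters.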



Disable Mail-Forwarding for Lotus Notes programmatically

Lotus Notes has a nifty feature to lull managers into false safety: for volatile/unsafe e-mails (or users), it lets you disable printing, forwarding and copying to the clipboard. This can be done using rules, on the SMTP server and on a per e-mail basis. When writing to somebody whom you really don’t trust with some information (but whom you do trust to be unable to spread the word otherwise - by copy/pasting for example), composing the mail would look like this:

[Screenshot: prevent_copying]

Now, if your victim wants to forward your mail, Lotus Notes would respond with a little pop-up:

[Screenshot: success]

This certainly looks like a magical and proprietary feature, doesn’t it? Let’s look at the source of such a “mail” (aka memo in Notes’ language) - you will have to forward it to another mail client though, because memos can’t be displayed in source:

...
Subject: Testnachricht
MIME-Version: 1.0
Sensitivity: Private
X-Mailer: Lotus Notes Release 6.5.5  CCH1 March 07, 2006
...

As you can see, the only special thing is a meta-flag in the header: Sensitivity: Private. It can be reproduced with any decent mail user agent or programmatically. What follows is a little Python code snippet that does just that:

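import smtplib
from email.message import Message

# Build a plain message and set the usual headers
msg = Message()
msg.set_payload("Testmessage Body")
msg["Subject"] = "Testmessage from Python"
msg["From"] = "preek@dispatched.ch"
msg["To"] = "somebody@somewhere.com"
# This header is what triggers the restrictions in Notes
msg["Sensitivity"] = "Private"

# Hand the message to the local SMTP server and close the connection
smtp = smtplib.SMTP("localhost")
smtp.sendmail("preek@dispatched.ch", "somebody@somewhere.com", msg.as_string())
smtp.quit()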
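If you only want to look at the header without sending anything, the message can be built and parsed back locally. This is a sketch assuming Python 3 and its email.message.EmailMessage class; the example.org addresses are placeholders, not real accounts.

```python
from email import message_from_string
from email.message import EmailMessage

# Build the same kind of message without sending it.
msg = EmailMessage()
msg["Subject"] = "Testmessage from Python"
msg["From"] = "sender@example.org"        # placeholder address
msg["To"] = "recipient@example.org"       # placeholder address
msg["Sensitivity"] = "Private"
msg.set_content("Testmessage Body")

raw = msg.as_string()

# Parse the raw text again, roughly as a receiving client would.
parsed = message_from_string(raw)
is_private = parsed.get("Sensitivity", "").strip().lower() == "private"
print(raw)
```

The flag is just one more header line in the raw text, so whether it is honoured is entirely up to the receiving client.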

But please, don’t use this information unless you absolutely have to. Lotus Notes.. *brr*.

Enjoy(;


VIM as Python IDE

Finding the perfect IDE for Python isn’t an easy feat. There are a great many to choose from, but even though some of them offer really nifty features, I can’t help but feel attracted to VIM anyway. I feel that no IDE accomplishes the task of giving the comfort of complete power over the code - something is always missing. This is why I always come back to using IDLE and VIM. Those two seem to be the best companions when doing some quick and agile hacking - but when it comes to managing bigger and longer term projects, this combo needs some tweaking. Once that’s done, VIM will be a powerful IDE for Python - including code completion (with pydoc display), graphical debugging, task-management and a project view.

This is where we are going:

[Screenshot: VIM as Python IDE]

So, these are my thoughts on a VIM setup for coding (Python).

Modern GUI VIM implementations like GVIM or MacVIM give the user the opportunity to organize their open files in tabs. This might look convenient, but to me it is rather bad practice, because a second tab will not be in the same buffer scope as the first one, which takes away from future interaction options between the two. Using MiniBufExplorer, however, gives the user tabs (not only in the GUI, but also on the command line) and leaves the classic buffer interaction intact.

[Screenshot: MiniBuf Explorer]

Being able to neatly work on multiple files, the user still misses the help his favourite IDE gives him in visualizing classes, functions and variables. Luckily there are quite a few plugins around to accomplish this task just as well. My favourite one would be TagList. TagList uses Exuberant Ctags for actually generating the tags (note: it really relies on this specific version of ctags - preinstalled implementations on UNIX systems won’t work).

[Screenshot: TagList]

A lot of coders have the habit of using TODO or FIXME statements in their code. Other IDEs often rely on having good third party project management software, but not VIM. There are great plugins like Tasklist reminding the programmer of those lines of code. Tasklist even implements custom lists - to me that’s an incredible productivity gain.

[Screenshot: TaskList]
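To give an idea of what such a task list boils down to, here is a tiny sketch in Python 3. It is not how the Tasklist plugin is implemented (that one is a VIM script); the pattern, the function name find_tasks and the sample source are all assumptions made for illustration.

```python
import re

# Match a TODO or FIXME marker and capture whatever text follows it.
TASK_PATTERN = re.compile(r"\b(TODO|FIXME)\b:?\s*(.*)")

def find_tasks(source):
    """Yield (line_number, tag, text) for each TODO/FIXME in `source`."""
    for number, line in enumerate(source.splitlines(), start=1):
        match = TASK_PATTERN.search(line)
        if match:
            yield number, match.group(1), match.group(2).strip()

sample = """def area(r):
    # TODO: use math.pi
    return 3.14 * r * r  # FIXME rounding
"""
tasks = list(find_tasks(sample))
for number, tag, text in tasks:
    print("%d: %s %s" % (number, tag, text))
```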

In these times, the programmer knows his or her programming language more or less by interactively finding out what it can do. Therefore code completion (sometimes also called IntelliSense *ugh*) is a major feature. I have heard many people saying that this is where VIM fails - but luckily they are plain wrong(; In version 7, VIM introduced omni completion. Given it is configured to recognize Python (if not, this feature is only a plugin away), Ctrl+x Ctrl+o opens a drop down dialog like in any other IDE - even the whole Pydoc gets displayed in a split window.

[Screenshot: Omni Completion]

Probably the most wanted feature (besides code completion) is debugging graphically. VimPDB is a plugin that lets you do just that. I acknowledge it is no complete substitute for a full-fledged graphical debugger, but I honour the thought that having to rely on a debugger (often) is a hint of bad design.

[Screenshot: VimPDB]

From the eye-candy to the implementation. Don’t worry, it’s no sorcery.

First of all, make sure you have VIM version 7.x installed, compiled with Python support. To check for the second, enter :python print "hello, world" into VIM. If you see an error message like “E319: Sorry, the command is not available in this version”, then it’s time to get a new one. If you’re on a Mac, just install MacVIM (there’s also a binary for the console in /Applications/MacVim.app/Contents/MacOS/). If you’re on Windows, GVIM will suffice (for versions != 2.4 search for the right plugin). If you’re on any other machine, you will probably know how to compile your very own VIM with Python support.

Second, check if you have a plugin directory. In Unix it would typically be located in $HOME/.vim/plugin, in Windows in the Program Files directory. If it doesn’t exist, create it.

Now, let’s start with the MiniBufExplorer. Get it and copy it into your plugin directory. To start it automatically when needed and be able to use it with keyboard and mouse commands, append these lines in your vimrc configuration:

let g:miniBufExplMapWindowNavVim = 1
let g:miniBufExplMapWindowNavArrows = 1
let g:miniBufExplMapCTabSwitchBufs = 1
let g:miniBufExplModSelTarget = 1

For a project view, get TagList and Exuberant Ctags. To install Ctags, unpack it, go into the directory and do a compile/install via:

./configure && sudo make install

Ctags will then be installed in /usr/local/bin. When using a Windows machine, I recommend Cygwin with GCC and Make; it’ll work just fine. If you don’t want to tamper with your original ctags installation, you can tell VIM where the new one lives by appending the following line to vimrc:

let Tlist_Ctags_Cmd='/usr/local/bin/ctags'
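If you are unsure which ctags your shell actually picks up, a quick check can save some head scratching. The following Python 3 sketch relies on the fact that Exuberant Ctags names itself in its --version output; the function names are made up, and note that ctags forks which mention Exuberant Ctags in their banner would pass this simple test as well.

```python
import shutil
import subprocess

def is_exuberant(version_output):
    """True if `ctags --version` output identifies Exuberant Ctags."""
    return "Exuberant Ctags" in version_output

def check_ctags(command="ctags"):
    """Return True/False for the ctags found on PATH, or None if absent."""
    path = shutil.which(command)
    if path is None:
        return None
    # Other ctags flavours may reject --version; that simply yields False.
    result = subprocess.run(
        [path, "--version"],
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
    )
    return is_exuberant(result.stdout)

status = check_ctags()
print("ctags check:", status)
```

Passing '/usr/local/bin/ctags' instead of the default checks the freshly compiled binary directly.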

To install TagList, just drop it into VIM’s plugin directory. You will now be able to use the project view by typing the command :TlistToggle.

Tasklist is a simple plugin, too. Copying it into the plugin directory will suffice. I like to have shortcuts and have added

map T :TaskList<CR>
map P :TlistToggle<CR>

to vimrc. Pressing T will then open the TaskList if there are any tasks to process. q quits the TaskList again.

VimPDB is a plugin, as well. Install as before and see the readme for documentation. If it doesn’t work out of the box, watch for the known issues.

To enable code (omni) completion, add this line to your vimrc:

autocmd FileType python set omnifunc=pythoncomplete#Complete

If it doesn’t work then, you’ll need this plugin.

My last two recommendations are setting these lines to comply with PEP 8 (Python’s style guide) and to have decent eye candy:

set expandtab
set textwidth=79
set tabstop=8
set softtabstop=4
set shiftwidth=4
set autoindent
:syntax on
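These settings only affect what you type from now on; existing files may still contain tabs or overlong lines. As a rough sketch (Python 3, with a made-up function name and sample input), the two rules enforced above can be checked like this:

```python
def style_problems(source, max_width=79):
    """Return (line_number, message) pairs for tabs and overlong lines."""
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        if "\t" in line:
            problems.append((number, "contains a tab character"))
        if len(line) > max_width:
            problems.append((number, "longer than %d characters" % max_width))
    return problems

# Line 2 is indented with a tab, line 3 is 84 characters long.
sample = "def f():\n\treturn 1\n" + "x = " + "1" * 80 + "\n"
found = style_problems(sample)
for number, message in found:
    print("line %d %s" % (number, message))
```

This covers only a small part of PEP 8, of course.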

There are certainly a lot more flags to help productivity, but those will probably be more user specific.

Have fun coding Python while not being bound to a specific IDE, but having all the benefits of VIM bundled with a few helping hands. Enjoy, everyone.


Juno on Solaris 10

Juno is an incredibly lightweight webframework. Using Python as backend, it fulfills my every need for just about every small application I want to deploy on the web. It has no need for big runtimes on the server, no great many files to configure and most importantly: there’s no coding overhead - the programmer defines only the distinctly wanted features.
However, installing Juno on Solaris 10 isn’t quite as easy as described in Juno’s documentation. Solaris ships with Python 2.4, but Juno depends on Jinja2 (a templating engine) which itself depends on Python 2.5+. Even installing Blastwave’s or Sunfreeware’s version won’t help. But that’s no biggie since compiling your own Python is incredibly easy.

  1. Get, compile and install Python (I have used version 2.5.4)
  2. Get, compile and install Setuptools
  3. Get, compile and install pysqlite
  4. easy_install sqlalchemy
  5. easy_install jinja2
  6. Get, compile and install Juno
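After working through the steps above, a short check tells you whether everything ended up where the interpreter can find it. Mind that this sketch is written for Python 3 (importlib.util does not exist in 2.5), and that the import names sqlalchemy, jinja2 and juno are assumptions about how the packages are importable.

```python
import importlib.util
import sys

def python_is_at_least(major, minor):
    """True if the running interpreter is at least version major.minor."""
    return sys.version_info >= (major, minor)

def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

python_ok = python_is_at_least(2, 5)
missing = missing_modules(["sqlalchemy", "jinja2", "juno"])
print("Python new enough:", python_ok)
print("Missing modules:", ", ".join(missing) or "none")
```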

Enjoy.



Webscraping with Python and BeautifulSoup

Recently my life has been a hype; partly due to my growing Python addiction. There’s simply no way around it, so I had better confess it in public. I’m in love with Python. It’s not only mature, business-proof and performant, but also sleek and just so much fun to write. It’s as if I were in Star Trek and only had to tell the computer what I wanted, never minding how the job actually gets done. Even my favourite comic artist (besides Scott Adams, of course..) took up on it; so my feelings have to be honest.

In this short tutorial, I’m going to show you how to scrape a website with the 3rd party html-parsing module BeautifulSoup in a practical example. We will search the wonderful translation engine dict.cc, which holds the key to over 700k translations from English to German and vice versa. Note that BeautifulSoup is licensed just like Python while dict.cc allows for external searching.

First off, place BeautifulSoup.py in your modules directory. Alternatively, if you just want to do a quick test, put it in the same directory where you will be writing your program. Then start your favourite text editor/Python IDE (for quick prototyping like we are about to do, I highly recommend a combination of IDLE and VIM) and begin coding. In this tutorial we won’t be doing any design; we won’t even encapsulate in a class. How to do that, later on, is up to your needs.

What we will do:

  1. go to dict.cc
  2. enter a search word into the webform
  3. submit the form
  4. read the result
  5. parse the html code
  6. save all translations
  7. print them

You can either read the needed code on the fly or download it.
Now let’s begin the magic. Those are our needed imports.

import urllib
import urllib2
import string
import sys
from BeautifulSoup import BeautifulSoup

urllib and urllib2 are both modules offering the possibility to read data from various URLs; they will be needed to open the connection and retrieve the website. BeautifulSoup is, as mentioned, an html parser.

Since we are going to fetch our data from a website, we have to behave like a browser. That’s why we will need to fake a user agent. For our program, I chose to push the webstatistics a little in favour of Firefox and Solaris.

user_agent = 'Mozilla/5 (Solaris 10) Gecko'
headers = { 'User-Agent' : user_agent }
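For readers on Python 3, where urllib2 has been merged into urllib.request, attaching the faked user agent looks roughly like this. It is only a sketch; nothing is sent over the network here, because merely building a Request object does not open a connection.

```python
from urllib.request import Request

user_agent = "Mozilla/5 (Solaris 10) Gecko"
headers = {"User-Agent": user_agent}

# Building a Request does not touch the network; urlopen() would.
request = Request("http://www.dict.cc/", headers=headers)

# Request normalises header names with str.capitalize(), hence "User-agent".
sent_agent = request.get_header("User-agent")
print(sent_agent)
```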

Now let’s take a look at the code of dict.cc. We need to know how the webform is constructed if we want to query it.

...
<form style="margin:0px" action="http://www.dict.cc/" method="get">
  <table>
    <tr>
      <td>
        <input id="sinp" maxlength="100" name="s" size="25" type="text"
               style="padding:2px;width:340px" value="">
      ...</td>
    </tr>
  </table>
</form>
...

The relevant parts are action, method and the name inside the input tag. The action is the webapplication that will get called when the form is submitted. The method shows us how we need to encode the data for the form while the name is our query variable.
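Those three pieces can also be pulled out of the page programmatically instead of by reading the source. Here is a minimal sketch using html.parser from the Python 3 standard library (not BeautifulSoup); the class name FormInspector is invented and the form markup is a shortened copy of the excerpt above.

```python
from html.parser import HTMLParser

class FormInspector(HTMLParser):
    """Collect the action, method and input names of the form it sees."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = None
        self.input_names = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self.action = attributes.get("action")
            # A form without a method attribute defaults to GET.
            self.method = (attributes.get("method") or "get").lower()
        elif tag == "input" and attributes.get("name"):
            self.input_names.append(attributes["name"])

form_html = (
    '<form style="margin:0px" action="http://www.dict.cc/" method="get">'
    '<input id="sinp" maxlength="100" name="s" size="25" type="text" value="">'
    '</form>'
)
inspector = FormInspector()
inspector.feed(form_html)
print(inspector.action, inspector.method, inspector.input_names)
```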

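values = {'s' : sys.argv[1] }
data = urllib.urlencode(values)
# the form uses method="get", so the query string belongs in the URL;
# passing data as the second argument would turn this into a POST
request = urllib2.Request("http://www.dict.cc/?" + data, None, headers)
response = urllib2.urlopen(request)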

Here the data gets URL-encoded and packed into a request for the webform. Notice that values is a dictionary, which makes handling more complex forms a charm. Then the form gets submitted by urlopen() - i.e. we virtually pressed the “Search”-button.
See how easy it is? These are only a couple lines of code, but we already have searched on dict.cc for a completely arbitrary word from the commandline. The response has also been retrieved. All that is left, is to extract the relevant information.
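One detail is worth spelling out: whether urllib sends a GET or a POST depends solely on whether a data argument is passed. The sketch below shows this with the Python 3 names (urllib.parse and urllib.request); the search word is hard-coded for the example and no connection is opened.

```python
from urllib.parse import urlencode
from urllib.request import Request

values = {"s": "kinky"}
query = urlencode(values)                      # 's=kinky'

# Without a data argument the request is a GET; the query goes in the URL.
get_request = Request("http://www.dict.cc/?" + query)

# With a data argument (bytes in Python 3) the same call becomes a POST.
post_request = Request("http://www.dict.cc/", data=query.encode("ascii"))

methods = (get_request.get_method(), post_request.get_method())
print(get_request.full_url, methods)
```

Since the form on dict.cc declares method="get", the first variant is the one that mimics the browser.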

the_page = response.read()
pool = BeautifulSoup(the_page)

The response is read and saved as regular html code. This code could now be analyzed via regular string.find() or re.findall() methods, but this implies hard-coding in reference to a lot of the underlying logic of the page. Besides, it would require a lot of reverse engineering of the positional parameters, setting up several potentially recursive methods. This would ultimately produce ugly (i.e. not very pythonic) code. Lucky for us, there already is a full-fledged html parser which allows us to ask just about any generic question. Let’s take a look at the resulting html code, first. If you are not yet familiar with the tool that can be seen in the screenshot: I’m using Firefox with the Firebug addon. This one is very helpful if you ever need to debug a website.

[Screenshot: dict.cc // search for "web"]

Let me show an excerpt of the code.

<table>..
  <td class="td7nl" style="background-color: rgb(233, 233, 233);">
    <a href="/englisch-deutsch/web.html">
      <b>web</b>
    </a>
  </td>
<td class="td7nl" ... /td>
</table>..

The results are displayed in a table. The two interesting columns share the class td7nl. The most efficient way would seem to just sweep all the data from inside the cells of these two columns. Fortunately for us, BeautifulSoup implemented just that feature.

results = pool.findAll('td', attrs={'class' : 'td7nl'})
source = ''
translations = []

for result in results:
    word = ''
    for tmp in result.findAll(text=True):
        word = word + " " + unicode(tmp).encode("utf-8")
    if source == '':
        source = word
    else:
        translations.append((source, word))

for translation in translations:
    print "%s => %s" % (translation[0], translation[1])

results will be a BeautifulSoup.ResultSet, which behaves like a list. Each member of it is the html code of one cell of the class td7nl. Notice that you can access each element like you would expect in a list. result.findAll(text=True) will return each embedded textual element of the cell. All we have to do is merge the different pieces together.
source keeps the text of the first cell (the word we searched for), while word is a temporary variable holding the text of the current cell. Each translation will be saved as a pair (tuple) inside the translations list.
Finally we iterate over the found translations and write them to the screen.

$ python webscraping_demo.py kinky
 kinky   {adj} =>  9 kraus   [Haar]
 kinky   {adj} =>  nappy   {adj}   [Am.]
 kinky   {adj} =>  6 kraus   [Haar]
 kinky   {adj} =>  crinkly   {adj}
 kinky   {adj} =>  kraus
 kinky   {adj} =>  curly   {adj}
 kinky   {adj} =>  kraus
 kinky   {adj} =>  frizzily   {adv}
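If you have no BeautifulSoup at hand, the cell-collecting step can be approximated with the standard library alone. Mind that this is a different technique than the one used above: a sketch built on Python 3’s html.parser, with an invented class name and made-up sample markup that only imitates the structure of the result table. It also pairs the cells two by two (left column, right column), which is an assumption about the page layout and not what the loop above does.

```python
from html.parser import HTMLParser

class CellText(HTMLParser):
    """Collect the text of every <td> whose class attribute is `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.cells = []
        self._depth = 0      # > 0 while inside a wanted cell

    def handle_starttag(self, tag, attrs):
        if tag != "td":
            return
        if self._depth:
            self._depth += 1             # nested cell inside a wanted one
        elif dict(attrs).get("class") == self.wanted:
            self._depth = 1
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.cells[-1] += data       # merge all text pieces of the cell

page = (
    '<table><tr>'
    '<td class="td7nl"><a href="#"><b>kinky</b></a> {adj}</td>'
    '<td class="td7nl"><a href="#">kraus</a> [Haar]</td>'
    '</tr><tr>'
    '<td class="other">ignored</td>'
    '<td class="td7nl">kinky {adj}</td>'
    '<td class="td7nl">curly {adj}</td>'
    '</tr></table>'
)
parser = CellText("td7nl")
parser.feed(page)

# Pair the cells two by two: left column, right column.
pairs = list(zip(parser.cells[0::2], parser.cells[1::2]))
for left, right in pairs:
    print("%s => %s" % (left, right))
```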

In a regular application those results would need a little lexing, of course. The most important thing, however, is that we just wrote a translation wrapper onto a webapplication - in only 28 lines of code. Did I mention that I’m in love with Python?
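As for the little lexing mentioned above, a first pass could look like the following Python 3 sketch. It assumes that the leading numbers seen in some results are noise that can be dropped; the function name tidy is made up and the sample lines are copied from the output above.

```python
import re

def tidy(entry):
    """Collapse whitespace and drop a leading counter such as '9 '."""
    entry = " ".join(entry.split())
    return re.sub(r"^\d+\s+", "", entry)

raw_lines = [
    " kinky   {adj} =>  9 kraus   [Haar]",
    " kinky   {adj} =>  curly   {adj}",
]
cleaned = [tuple(tidy(part) for part in line.split("=>")) for line in raw_lines]
for source, translation in cleaned:
    print("%s => %s" % (source, translation))
```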

All that is left is for me to recommend the BeautifulSoup documentation. What we did here really didn’t cover what this module is capable of.

I wish you all the best.


