<?xml version="1.0" encoding="UTF-8"?> <rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
> <channel><title>Alain M. Lafon &#187; tutorial</title> <atom:link href="http://blog.dispatched.ch/tag/tutorial/feed/" rel="self" type="application/rss+xml" /><link>http://blog.dispatched.ch</link> <description>code, life and struggles thereof</description> <lastBuildDate>Mon, 16 Jan 2012 13:44:17 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <item><title>Disable Windows auto desktop lock</title><link>http://blog.dispatched.ch/2010/07/06/disable-windows-auto-desktop-lock/</link> <comments>http://blog.dispatched.ch/2010/07/06/disable-windows-auto-desktop-lock/#comments</comments> <pubDate>Tue, 06 Jul 2010 11:51:31 +0000</pubDate> <dc:creator>Alain M. Lafon</dc:creator> <category><![CDATA[personal]]></category> <category><![CDATA[productivity]]></category> <category><![CDATA[tutorial]]></category> <category><![CDATA[windows]]></category> <category><![CDATA[work]]></category> <guid
isPermaLink="false">http://blog.dispatched.ch/2010/07/06/disable-windows-auto-desktop-lock/</guid> <description><![CDATA[Let me start with: &#8220;I hate Windows from the bottom of my heart.&#8221; But there&#8217;s another level of hate. I hate pseudo business proof settings in Windows even more. One of those would be the &#8220;Auto Lock&#8221;. Anyway, if you think that your (clients) VPN &#8211; or bureau for that case &#8211; can keep up [...]]]></description> <content:encoded><![CDATA[<div
class='posterous_autopost'> Let me start with: &#8220;I hate Windows from the bottom of my heart.&#8221;<p
/> But there&#8217;s another level of hate. I hate pseudo business proof settings in Windows even more. One of those would be the &#8220;Auto Lock&#8221;.<p
/> Anyway, if you think that your (client&#8217;s) VPN &#8211; or office, for that matter &#8211; can keep up its security without the dreaded auto lock &#8211; here&#8217;s what you do:<p
/> <b>1. Hate the auto lock</b><p
/> <img
src="http://posterous.com/getfile/files.posterous.com/dispatched/W3rOXgSlCtZiMNl16Gxx33w6NAedryirBXHOptJQYlu9VgIlAtKKiElLvFgH/2010-07-06_13h33_50.png" width="309" height="262"/><p
/> <b><br
/> 2. Right click your desktop</b><p
/> <img
src="http://posterous.com/getfile/files.posterous.com/dispatched/W6645yqvrTDEL6tvkyJSeldl5rY9g6kegmEy0BKlUjmNU9TaX7VfZG6EKQhQ/2010-07-06_13h30_26.png" width="177" height="239"/><p
/> <b><br
/> 3. Choose screensaver</b><p
/> <a
href='http://posterous.com/getfile/files.posterous.com/dispatched/XQbaeMXFjRKfgUp9Sv5njMWgKjeVpaxO27QAAYXbFeeZFVVMrOJ8VFLu1eBS/2010-07-06_13h30_38.png' rel="lightbox[1127]"><img
src="http://posterous.com/getfile/files.posterous.com/dispatched/3b8q8TJPpMjcbz8FVkmXb5qsNAR2qCRUi1K2KEM1XMRUpDryyg83cvP9j2oR/2010-07-06_13h30_38.png.scaled.500.jpg" width="500" height="51"/></a><p
/> <b>4. Uncheck &#8220;On resume, display logon screen&#8221;</b><p
/> <img
src="http://posterous.com/getfile/files.posterous.com/dispatched/33jwgx2fPD9A9xj0bbVCNTFfmQUbkvQF1oZC5W9dtoaDzneHkyPS1FF75gxV/moz-screenshot-13.jpg" width="468" height="503"/> <br
/> <b><br
/> 5. Be <img
src="http://posterous.com/getfile/files.posterous.com/dispatched/xNUqPV3h163A370P9j1Rqctje8zk1wFTuzDotAQ62t59Xpoid0LM62fHfkez/moz-screenshot-14.jpg" width="128" height="86"/> <br
/> </b><br
/> At least a little less unhappy. You&#8217;re still with Windows. You&#8217;re still doomed. But the apocalypse is a little farther away for now.</div> ]]></content:encoded> <wfw:commentRss>http://blog.dispatched.ch/2010/07/06/disable-windows-auto-desktop-lock/feed/</wfw:commentRss> <slash:comments>2</slash:comments> </item> <item><title>VIM to write mails in Thunderbird</title><link>http://blog.dispatched.ch/2010/06/30/vim-to-write-mails-in-thunderbird/</link> <comments>http://blog.dispatched.ch/2010/06/30/vim-to-write-mails-in-thunderbird/#comments</comments> <pubDate>Wed, 30 Jun 2010 16:12:20 +0000</pubDate> <dc:creator>Alain M. Lafon</dc:creator> <category><![CDATA[articles]]></category> <category><![CDATA[tunderbird]]></category> <category><![CDATA[tutorial]]></category> <category><![CDATA[vim]]></category> <guid
isPermaLink="false">http://blog.dispatched.ch/2010/06/30/vim-to-write-mails-in-thunderbird/</guid> <description><![CDATA[While at work, I have to use Windows. Since Windows doesn&#8217;t ship with a decent mail/calendar solution (nope, Outlook doesn&#8217;t qualify &#8211; keywords &#8220;winmail.dat&#8221; and &#8220;ics support&#8221; should jog your memory), I had to build a custom setup. Thunderbird is a good basis and does the job well. It&#8217;s sleek and has good IMAP support. [...]]]></description> <content:encoded><![CDATA[<div
class='posterous_autopost'> While at work, I have to use Windows. Since Windows doesn&#8217;t ship with a decent mail/calendar solution (nope, Outlook doesn&#8217;t qualify &#8211; keywords &#8220;winmail.dat&#8221; and &#8220;ics support&#8221; should jog your memory), I had to build a custom setup.<p
/> Thunderbird is a good basis and does the job well. It&#8217;s sleek and has good IMAP support. Combine that with plugins for <a
href="http://vcssupport.blogspot.com/">VCS support</a>, <a
href="http://www.mozilla.org/projects/calendar/lightning/">Lightning </a>for an integrated calendar and the <a
href="http://www.mozilla.org/projects/calendar/lightning/">provider for Google Calendar</a>, you find yourself with a decent toolset. What kept bugging me is editing the mails.&nbsp; Coming from mutt/VIM, I might be biased on that one. Heck, I&#8217;m even using the <a
href="http://vimperator.org/">Vimperator </a>plugin in Firefox and find that it brightens each and every day.<p
/> Anyway, there is relief from that pain! There&#8217;s a plugin called &#8220;External Editor&#8221; &#8211; it works on Windows just as it does on real OSs and, as you can see, it works like a charm. You can find all you need on <a
href="http://globs.org/download.php?lng=en">globs.org</a>. Just follow the instructions and you&#8217;ll be happy.<p
/> <a
href='http://posterous.com/getfile/files.posterous.com/dispatched/NuR1HzF9r1gvsyVSVuOdpMSUe4UoG0rcyhymt11132DFQzJJmoEwr7BqFv2z/moz-screenshot-10.jpg' rel="lightbox[1122]"><img
src="http://posterous.com/getfile/files.posterous.com/dispatched/qNrvPsKDkNvVpp3PRwuoGX2qWZwQGfdWh5k0hSrM6KHQ56ASxBsyKli7I48Y/moz-screenshot-10.jpg.scaled.500.jpg" width="500" height="401"/></a><p
/> Tips: Customize your mail view to show the feature &#8220;External Editor&#8221; or use the pre-defined shortcut CTRL+e to open your custom editor (that is VIM for me^^).<p
/> <img
src="http://posterous.com/getfile/files.posterous.com/dispatched/MEZcC7QEYQ4Ge692LLWHt7oKr1ye88nlcc0ERh3CUAjDcGWvy7PqJgimC8xm/moz-screenshot-11.jpg" width="411" height="50"/><p
/> Have fun and enjoy the sweet life(; &nbsp;</div> ]]></content:encoded> <wfw:commentRss>http://blog.dispatched.ch/2010/06/30/vim-to-write-mails-in-thunderbird/feed/</wfw:commentRss> <slash:comments>1</slash:comments> </item> <item><title>VIM as Python IDE</title><link>http://blog.dispatched.ch/2009/05/24/vim-as-python-ide/</link> <comments>http://blog.dispatched.ch/2009/05/24/vim-as-python-ide/#comments</comments> <pubDate>Sat, 23 May 2009 23:04:59 +0000</pubDate> <dc:creator>Alain M. Lafon</dc:creator> <category><![CDATA[articles]]></category> <category><![CDATA[coding]]></category> <category><![CDATA[ctags]]></category> <category><![CDATA[exuberant ctags]]></category> <category><![CDATA[ide]]></category> <category><![CDATA[minibuf]]></category> <category><![CDATA[omni completion]]></category> <category><![CDATA[pep 8]]></category> <category><![CDATA[programming]]></category> <category><![CDATA[python]]></category> <category><![CDATA[python ide]]></category> <category><![CDATA[taglist]]></category> <category><![CDATA[tasklist]]></category> <category><![CDATA[tutorial]]></category> <category><![CDATA[vi]]></category> <category><![CDATA[vim]]></category> <category><![CDATA[vimpdb]]></category> <category><![CDATA[walkthrough]]></category> <guid
isPermaLink="false">http://blog.dispatched.ch/?p=777</guid> <description><![CDATA[Finding the perfect IDE for Python isn&#8217;t an easy feat. There are a great many to choose from, but even though some of them offer really nifty features, I can&#8217;t help but feel attracted to VIM anyway. I feel that no IDE accomplishes the task of giving the comfort of complete power over the [...]]]></description> <content:encoded><![CDATA[<p>Finding the perfect IDE for Python isn&#8217;t an easy feat. There are a great many to choose from, but even though some of them offer really nifty features, I can&#8217;t help but feel attracted to VIM anyway. I feel that no IDE accomplishes the task of giving the comfort of complete power over the code &#8211; something is always missing. This is why I always come back to using IDLE and VIM. Those two seem to be the best companions for some quick and agile hacking &#8211; but when it comes to managing bigger and longer term projects, this combo needs some tweaking. Once that&#8217;s done, VIM will be a powerful IDE for Python &#8211; including code completion (with pydoc display), graphical debugging, task-management and a project view.</p><p>This is where we are going:</p><p
style="text-align: center;"><a
href="http://blog.dispatched.ch/wp-content/uploads/2009/05/vim-as-python-ide.png" rel="lightbox[777]"><img
class="size-full wp-image-799 aligncenter" title="vim-as-python-ide" src="http://blog.dispatched.ch/wp-content/uploads/2009/05/vim-as-python-ide.png" alt="VIM as Python IDE" width="491" height="401" /></a></p><p>So, these are my thoughts on a VIM setup for coding (Python).</p><p>Modern GUI VIM implementations like GVIM or MacVIM give the user the opportunity to organize their open files in tabs. This might look convenient, but to me it is rather bad practice, because a second tab will not be in the same buffer scope as the first one, which takes away from future interaction options between the two. Using <a
title="MiniBuf" href="http://www.vim.org/scripts/script.php?script_id=159" target="_blank">MiniBufExplorer</a>, however, gives the user tabs(not only in the GUI, but also in command line) and leaves the classic buffer interaction intact.</p><p
style="text-align: center;"><a
href="http://blog.dispatched.ch/wp-content/uploads/2009/05/minibuf.png" rel="lightbox[777]"><img
class="size-full wp-image-784 aligncenter" title="minibuf" src="http://blog.dispatched.ch/wp-content/uploads/2009/05/minibuf.png" alt="MiniBuf Explorer" width="484" height="87" /></a></p><p>Being able to neatly work on multiple files, the user still misses the potential his favourite IDE gives him in visualizing classes, functions and variables. Luckily there are quite a few plugins around to accomplish this task just as well. My favourite one would be <a
title="TagList" href="http://vim-taglist.sourceforge.net/" target="_blank">TagList</a>. TagList uses <a
title="Exuberant CTags" href="http://ctags.sourceforge.net/" target="_blank">Exuberant Ctags</a> for actually generating the tags(note: it really relies on this specific version of ctags &#8211; preinstalled implementations on UNIX systems won&#8217;t work).</p><p
style="text-align: center;"><a
href="http://blog.dispatched.ch/wp-content/uploads/2009/05/taglist.png" rel="lightbox[777]"><img
class="size-full wp-image-787 aligncenter" title="taglist" src="http://blog.dispatched.ch/wp-content/uploads/2009/05/taglist.png" alt="TagList" width="481" height="260" /></a></p><p>A lot of coders have the habit of using TODO or FIXME statements in their code. Other IDEs often rely on having good third party project management software, but not VIM. There are great plugins like <a
title="TaskList" href="http://www.vim.org/scripts/script.php?script_id=2607" target="_blank">Tasklist</a> reminding the programmer of those lines of code. Tasklist even implements custom lists &#8211; to me that&#8217;s an incredible productivity gain.</p><p
style="text-align: center;"><a
href="http://blog.dispatched.ch/wp-content/uploads/2009/05/tasklist.png" rel="lightbox[777]"><img
class="size-full wp-image-781 aligncenter" title="tasklist" src="http://blog.dispatched.ch/wp-content/uploads/2009/05/tasklist.png" alt="TaskList" width="491" height="163" /></a></p><p>These days, programmers get to know their programming language more or less by interactively finding out what it can do. Therefore code completion (sometimes also called IntelliSense *ugh*) is a major feature. I have heard many people say that this is where VIM fails &#8211; but luckily they are plain wrong(; In version 7, VIM introduced <a
title="Omni Completion" href="http://vim.wikia.com/wiki/Omni_completion" target="_blank">omni completion</a> &#8211; given it is configured to recognize Python (if not, this feature is only a <a
title="Python Omni Completion" href="http://www.vim.org/scripts/script.php?script_id=1542" target="_blank">plugin</a> away) Ctrl+x Ctrl+o opens a drop down dialog like any other IDE &#8211; even the whole Pydoc gets to be displayed in a split window.</p><p
style="text-align: center;"><a
href="http://blog.dispatched.ch/wp-content/uploads/2009/05/omnicompletion.png" rel="lightbox[777]"><img
class="size-full wp-image-791 aligncenter" title="omnicompletion" src="http://blog.dispatched.ch/wp-content/uploads/2009/05/omnicompletion.png" alt="Omni Completion" width="436" height="312" /></a></p><p>Probably the most wanted feature(besides code completion) is debugging graphically. <a
title="VimPDB" href="http://code.google.com/p/vimpdb/" target="_blank">VimPDB</a> is a plugin that lets you do just that. I acknowledge it is not a complete substitute for a full-fledged graphical debugger, but I honour the thought that having to rely on a debugger (often) is a hint of bad design.</p><p
style="text-align: center;"><a
href="http://blog.dispatched.ch/wp-content/uploads/2009/05/vimpdb.png" rel="lightbox[777]"><img
class="size-full wp-image-794 aligncenter" title="vimpdb" src="http://blog.dispatched.ch/wp-content/uploads/2009/05/vimpdb.png" alt="VimPDB" width="498" height="134" /></a></p><p>&#8211;</p><p
style="text-align: center;"><p
style="text-align: left;">From the eye-candy to the implementation. Don&#8217;t worry, it&#8217;s no sorcery.</p><p
style="text-align: left;">First of all, make sure you have VIM version 7.x installed, compiled with Python support. To check for the latter, enter <em>:python print &#8220;hello, world&#8221;</em> into VIM. If you see an error message like <em>&#8220;E319: Sorry, the command is not available in this version&#8221;</em>, then it&#8217;s time to get a new one. If you&#8217;re on a Mac, just install MacVIM (there&#8217;s also a binary for the console in /Applications/MacVim.app/Contents/MacOS/). If you&#8217;re on Windows, GVIM will suffice (for Python versions other than 2.4, search for the right <a
title="Vim for Windows32" href="http://www.gooli.org/blog/gvim-72-with-python-2526-support-windows-binaries/" target="_blank">build</a>). If you&#8217;re on any other machine, you will probably know how to compile your very own VIM with Python support.</p><p
style="text-align: left;">Second, check if you have a plugin directory. In Unix it would typically be located in <em>$HOME/.vim/plugin</em>, in Windows in the <em>vimfiles\plugin</em> folder of your VIM installation (usually under <em>Program Files</em>). If it doesn&#8217;t exist, create it.</p><p
style="text-align: left;">Now, let&#8217;s start with the MiniBufExplorer. <a
title="MiniBuf Explorer" href="http://www.vim.org/scripts/script.php?script_id=159" target="_blank">Get</a> it and copy it into your plugin directory. To start it automatically when needed and be able to use it with keyboard and mouse commands, append these lines in your vimrc configuration:</p><p><code>let g:miniBufExplMapWindowNavVim = 1<br
/> let g:miniBufExplMapWindowNavArrows = 1<br
/> let g:miniBufExplMapCTabSwitchBufs = 1<br
/> let g:miniBufExplModSelTarget = 1</code></p><p
style="text-align: left;">For a project view, get <a
title="TagList" href="http://vim-taglist.sourceforge.net/" target="_blank">TagList</a> and <a
title="Exuberant CTags" href="http://ctags.sourceforge.net/" target="_blank">Exuberant Ctags</a>. To install Ctags, unpack it, go into the directory and do a compile/install via:</p><p><code>./configure &amp;&amp; sudo make install</code></p><p>Ctags will then be installed in /usr/local/bin. When using a Windows machine, I recommend <a
href="http://cygwin.com/">Cygwin</a> with GCC and Make; it&#8217;ll work just fine. If you don&#8217;t want to tamper with your original ctags installation, you can tell VIM where the new binary lives by appending the following line to vimrc:</p><p><code>let Tlist_Ctags_Cmd='/usr/local/bin/ctags'</code></p><p>To install TagList, just drop it into VIM&#8217;s plugin directory. You will now be able to use the project view by typing the command <em>:TlistToggle</em>.</p><p><a
title="TaskList" href="http://www.vim.org/scripts/script.php?script_id=2607" target="_blank">Tasklist</a> is a simple plugin, too. Copying it into the plugin directory will suffice. I like to have shortcuts and have added<br
/> <code><br
/> map T :TaskList&lt;CR&gt;<br
/> map P :TlistToggle&lt;CR&gt;<br
/> </code></p><p>to vimrc. Pressing <em>T </em>will then open the TaskList if there are any tasks to process. <em>q </em>quits the TaskList again.</p><p><a
title="VimPDB" href="http://code.google.com/p/vimpdb/" target="_blank">VimPDB</a> is a plugin, as well. Install as before and see the readme for documentation. If it doesn&#8217;t work out of the box, watch for the known <a
title="Issuses VimPDB" href="http://code.google.com/p/vimpdb/issues/list" target="_blank">issues</a>.</p><p>To enable code(omni) completion, add this line to your vimrc:</p><p><code>autocmd FileType python set omnifunc=pythoncomplete#Complete</code></p><p>If it doesn&#8217;t work then, you&#8217;ll need this <a
title="Python Omni Completion" href="http://www.vim.org/scripts/script.php?script_id=1542" target="_blank">plugin</a>.</p><p
style="text-align: left;">My last two recommendations are setting these lines to comply with <a
title="PEP 8" href="http://www.python.org/dev/peps/pep-0008/" target="_blank">PEP 8</a> (Python&#8217;s style guide) and to have decent eye candy:</p><p><code>set expandtab<br
/> set textwidth=79<br
/> set tabstop=8<br
/> set softtabstop=4<br
/> set shiftwidth=4<br
/> set autoindent<br
/> :syntax on</code></p><p>There are certainly a lot more flags to help productivity, but those will probably be more user specific.</p><p>Have fun coding Python while not being bound to a specific IDE, but having all the benefits of VIM bundled with a few helping hands. Enjoy, everyone.</p> ]]></content:encoded> <wfw:commentRss>http://blog.dispatched.ch/2009/05/24/vim-as-python-ide/feed/</wfw:commentRss> <slash:comments>99</slash:comments> </item> <item><title>Juno on Solaris 10</title><link>http://blog.dispatched.ch/2009/05/18/juno-on-solaris-10/</link> <comments>http://blog.dispatched.ch/2009/05/18/juno-on-solaris-10/#comments</comments> <pubDate>Mon, 18 May 2009 13:23:30 +0000</pubDate> <dc:creator>Alain M. Lafon</dc:creator> <category><![CDATA[articles]]></category> <category><![CDATA[Compile Python]]></category> <category><![CDATA[howto]]></category> <category><![CDATA[Juno]]></category> <category><![CDATA[lightweight]]></category> <category><![CDATA[python]]></category> <category><![CDATA[Solaris 10]]></category> <category><![CDATA[tutorial]]></category> <category><![CDATA[webframework]]></category> <guid
isPermaLink="false">http://blog.dispatched.ch/?p=753</guid> <description><![CDATA[Juno is an incredibly lightweight webframework. Using Python as backend, it fulfills my every need for just about every small application I want to deploy on the web. It has no need for big runtimes on the server, no great number of files to configure and most importantly: there&#8217;s no coding overhead &#8211; the [...]]]></description> <content:encoded><![CDATA[<p><a
href="http://brianreily.com/project/juno" class="broken_link">Juno</a> is an incredibly lightweight webframework. Using Python as backend, it fulfills my every need for just about every small application I want to deploy on the web. It has no need for big runtimes on the server, no great number of files to configure and most importantly: there&#8217;s no coding overhead &#8211; the programmer defines only the features that are actually wanted.<br
/> However, installing Juno on Solaris 10 isn&#8217;t quite as easy as described in Juno&#8217;s documentation. Solaris ships with Python 2.4, but Juno depends on Jinja2 (a templating engine) which itself depends on Python 2.5+. Even installing Blastwave&#8217;s or Sunfreeware&#8217;s version won&#8217;t help. But that&#8217;s no biggie since compiling your own Python is incredibly easy.</p><ol><li>Get, compile and install Python (I have used version 2.5.4)<ul><li><a
href="http://www.python.org/download/releases/" target="_blank">http://www.python.org/download/releases/</a></li><li>unpack</li><li>make sure you have a recent version of GCC installed</li><li>./configure &amp;&amp; make &amp;&amp; make install</li><li>as a result Python will be installed in /usr/local</li></ul></li><p></p><li>Get, compile and install Setuptools<ul><li><a
href="http://pypi.python.org/pypi/setuptools" target="_self">http://pypi.python.org/pypi/setuptools</a></li><li>unpack</li><li>python setup.py install</li></ul><p></li><li> Get, compile and install  pysqlite<ul><li><a
href="http://oss.itsystementwicklung.de/trac/pysqlite/wiki/WikiStart#Downloads" target="_blank">http://oss.itsystementwicklung.de/trac/pysqlite/wiki/WikiStart#Downloads</a></li><li>unpack</li><li>add the line &#8220;library_dirs=/usr/local/lib&#8221; to pysqlite-x.y.z/setup.cfg</li><li>globally export your library paths:<br /> LD_LIBRARY_PATH=/opt/csw/lib/:/usr/lib/:/lib/:/usr/local/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH</li><li>python setup.py install</li></ul></li><li>easy_install sqlalchemy</li><p></p><li>easy_install jinja2</li><p></p><li>Get, compile and install Juno<ul><li><a
href="http://brianreily.com/project/juno" target="_blank" class="broken_link"> http://brianreily.com/project/juno</a></li><li>python setup.py install</li></ul><p></li></ol><p>Enjoy.</p> ]]></content:encoded> <wfw:commentRss>http://blog.dispatched.ch/2009/05/18/juno-on-solaris-10/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Webscraping with Python and BeautifulSoup</title><link>http://blog.dispatched.ch/2009/03/15/webscraping-with-python-and-beautifulsoup/</link> <comments>http://blog.dispatched.ch/2009/03/15/webscraping-with-python-and-beautifulsoup/#comments</comments> <pubDate>Sun, 15 Mar 2009 10:05:08 +0000</pubDate> <dc:creator>Alain M. Lafon</dc:creator> <category><![CDATA[articles]]></category> <category><![CDATA[beautifulsoup]]></category> <category><![CDATA[howto]]></category> <category><![CDATA[python]]></category> <category><![CDATA[scraping]]></category> <category><![CDATA[tutorial]]></category> <category><![CDATA[web scraping]]></category> <category><![CDATA[webscraping]]></category> <guid
isPermaLink="false">http://gefechtsdienst.de/?p=567</guid> <description><![CDATA[Recently my life has been a hype; partly due to my growing Python addiction. There&#8217;s simply no way around it; so I had better confess it in public. I&#8217;m in love with Python. It&#8217;s not only mature, business-proof and performant, but also sleek and just so much fun to write. [...]]]></description> <content:encoded><![CDATA[<p>Recently my life has been a hype; partly due to my growing Python addiction. There&#8217;s simply no way around it; so I had better confess it in public. I&#8217;m in love with Python. It&#8217;s not only mature, business-proof and performant, but also sleek and just so much fun to write. It&#8217;s as if I were in Star Trek and only had to tell the computer what I wanted; never minding how the job actually gets done. Even my favourite comic artist (besides Scott Adams, of course..) <a
href="http://xkcd.com/353/" target="_blank">picked up</a> on it; so my feelings have to be honest.</p><p>In this short tutorial, I&#8217;m going to show you how to scrape a website with the third-party HTML parsing module <a
href="http://www.crummy.com/software/BeautifulSoup/" target="_blank">BeautifulSoup</a> in a practical example. We will search the wonderful translation engine <a
href="http://www.dict.cc/" target="_blank">dict.cc</a>, which holds the key to over 700k translations from English to German and vice versa. Note that BeautifulSoup is <a
href="http://www.crummy.com/software/BeautifulSoup/#Download" target="_blank">licensed</a> just like Python while dict.cc allows for <a
href="http://www.dict.cc/?s=about%3Afaq#faq15" target="_blank">external searching</a>.</p><p>First off, place BeautifulSoup.py in your modules directory. Alternatively, if you just want to do a quick test, put it in the same directory where you will be writing your program. Then start your favourite text editor/Python IDE (for quick prototyping like we are about to do, I highly recommend a combination of IDLE and VIM) and begin coding. In this tutorial we won&#8217;t be doing any design; we won&#8217;t even encapsulate the code in a class. How to do that, later on, is up to your needs.</p><p>What we will do:</p><ol><li>go to dict.cc</li><li>enter a search word into the webform</li><li>submit the form</li><li>read the result</li><li>parse the html code</li><li>save all translations</li><li>print them</li></ol><p>You can either read the needed code as we go or <a
href='http://blog.dispatched.ch/wp-content/uploads/2009/03/webscraping_demo.py'>download </a>it.<br
/> Now let&#8217;s begin the magic. Those are our needed imports.</p><pre class="brush: python; title: ; notranslate">
import urllib
import urllib2
import string
import sys
from BeautifulSoup import BeautifulSoup
</pre><p><a
href="http://docs.python.org/library/urllib.html" target="_blank">urllib</a> and <a
href="http://docs.python.org/library/urllib2.html" target="_blank">urllib2</a> are both modules offering the possibility to read data from various URLs; they will be needed to open the connection and retrieve the website. BeautifulSoup is, as mentioned, an HTML parser.</p><p>Since we are going to fetch our data from a website, we have to behave like a browser. That&#8217;s why we will need to fake a <a
href="http://de.wikipedia.org/wiki/User_Agent" target="_blank">user agent</a>. For our program, I chose to push the webstatistics a little in favour of Firefox and Solaris.</p><pre class="brush: python; title: ; notranslate">
user_agent = 'Mozilla/5 (Solaris 10) Gecko'
headers = { 'User-Agent' : user_agent }
</pre><p>Now let&#8217;s take a look at the code of dict.cc. We need to know how the webform is constructed if we want to query it.</p><pre class="brush: xml; title: ; notranslate">
...
&lt;form style=&quot;margin:0px&quot; action=&quot;http://www.dict.cc/&quot; method=&quot;get&quot;&gt;
  &lt;table&gt;
    &lt;tr&gt;
      &lt;td&gt;
        &lt;input id=&quot;sinp&quot; maxlength=&quot;100&quot; name=&quot;s&quot; size=&quot;25&quot; type=&quot;text&quot;
        style=&quot;padding:2px;width:340px&quot; value=&quot;&quot; /&gt;
      ...&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/table&gt;
&lt;/form&gt;
...
</pre><p>The relevant parts are <em>action</em>, <em>method</em> and the <em>name</em> inside the <em>input</em> tag. The action is the webapplication that will get called when the form is submitted. The method shows us how we need to encode the data for the form while the <em>name</em> is our query variable.</p><pre class="brush: python; title: ; notranslate">
values = {'s' : sys.argv[1] }
data = urllib.urlencode(values)
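# The form uses method=&quot;get&quot;, so the query goes into the URL; passing data
# as the second argument would make urllib2 send a POST instead.
request = urllib2.Request(&quot;http://www.dict.cc/?&quot; + data, headers=headers)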
response = urllib2.urlopen(request)
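
# --- Sketch, not part of the original script -------------------------------
# How urlencode() turns form fields into a query string. The second field
# name 'lang' is made up for illustration; the dict.cc form shown above is
# only known to use 's'. A list of pairs keeps the field order predictable.
try:
    from urllib import urlencode        # Python 2, as used in this tutorial
except ImportError:
    from urllib.parse import urlencode  # same function on Python 3

def encode_form(fields):
    # fields: list of (name, value) pairs; spaces in values become '+'
    return urlencode(fields)

example_query = encode_form([('s', 'kinky hair'), ('lang', 'en')])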
</pre><p>Here the data gets encapsulated in a <a
href="http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol" target="_blank">GET request</a> and packed into the webform. Notice that <em>values</em> is a dictionary, which makes handling more complex forms a charm. Then the form gets submitted by urlopen() &#8211; i.e. we virtually pressed the &#8220;Search&#8221; button.<br
/> See how easy it is? These are only a couple lines of code, but we already have searched on dict.cc for a completely arbitrary word from the commandline. The <em>response</em> has also been retrieved. All that is left, is to extract the relevant information.</p><pre class="brush: python; title: ; notranslate">
the_page = response.read()
pool = BeautifulSoup(the_page)
</pre><p>The <em>response</em> is read and saved as a regular HTML string. This string could now be analyzed via regular string.find() or re.findall() calls, but that implies hard-coding a lot of the page&#8217;s underlying layout. Besides, it would require a lot of reverse engineering of the positional parameters and setting up several potentially recursive methods. This would ultimately produce ugly (i.e. not very pythonic) code. Luckily for us, there already is a full-fledged HTML parser which allows us to ask just about any generic question. Let&#8217;s take a look at the resulting HTML code first. In case you are not yet familiar with the tool that can be seen in the screenshot: I&#8217;m using Firefox with the <a
href="https://addons.mozilla.org/de/firefox/addon/1843" target="_blank">Firebug</a> addon. This one is very helpful if you ever need to debug a website.</p><dl
id="attachment_606" class="wp-caption aligncenter" style="width: 449px;"><dt
class="wp-caption-dt"><a
href="http://blog.dispatched.ch/wp-content/uploads/2009/03/picture-2.png" rel="lightbox[567]"><img
class="size-full wp-image-606" title="dict_cc_search_for_web" src="http://blog.dispatched.ch/wp-content/uploads/2009/03/picture-2.png" alt="dict.cc // search for &quot;web&quot;" width="439" height="334" /></a></dt></dl><p>Let me show an excerpt of the code.</p><pre class="brush: xml; title: ; notranslate">
&lt;table&gt;..
  &lt;td class=&quot;td7nl&quot; style=&quot;background-color: rgb(233, 233, 233);&quot;&gt;
    &lt;a href=&quot;/englisch-deutsch/web.html&quot;&gt;
      &lt;b&gt;web&lt;/b&gt;
    &lt;/a&gt;
  &lt;/td&gt;
&lt;td class=&quot;td7nl&quot; ... /td&gt;
&lt;/table&gt;..
</pre><p>The results are displayed in a table. The two interesting columns share the class <em>td7nl</em>. The most efficient way would seem to just sweep all the data from inside the cells of these two columns. Fortunately for us, BeautifulSoup implemented just that feature.</p><pre class="brush: python; title: ; notranslate">
results = pool.findAll('td', attrs={'class' : 'td7nl'})
source = ''
translations = []
for result in results:
    word = ''
    for tmp in result.findAll(text=True):
        word = word + &quot; &quot; + unicode(tmp).encode(&quot;utf-8&quot;)
    if source == '':
        source = word
    else:
        translations.append((source, word))
for translation in translations:
    print &quot;%s =&gt; %s&quot; % (translation[0], translation[1])
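
# --- Sketch, not part of the original script -------------------------------
# The loop above pairs every cell with the very first one. Assuming the
# td7nl cells alternate between the two languages (left column, right
# column, left column, ...), a hypothetical helper such as pair_cells()
# would group them two at a time instead.
def pair_cells(words):
    # [en1, de1, en2, de2] becomes [(en1, de1), (en2, de2)];
    # a trailing cell without a partner is dropped
    return list(zip(words[0::2], words[1::2]))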
</pre><p><em>results</em> will be a BeautifulSoup.ResultSet, which behaves like a list. Each member is the HTML code of one cell of the class <em>td7nl</em>, and you can access each element like you would expect in a list. <em>result.findAll(text=True)</em> will return each embedded textual element of the cell. All we have to do is merge the different text pieces together.<br
/> <em>source</em> keeps the text of the first cell (the search term), while <em>word</em> holds the text of the cell currently being processed. Each translation is saved as a pair (a tuple) inside the <em>translations</em> list.<br
/> Finally we iterate over the found translations and write them to the screen.</p><pre class="box">
$ python webscraping_demo.py kinky
 kinky   {adj} =>  9 kraus   [Haar]
 kinky   {adj} =>  nappy   {adj}   [Am.]
 kinky   {adj} =>  6 kraus   [Haar]
 kinky   {adj} =>  crinkly   {adj}
 kinky   {adj} =>  kraus
 kinky   {adj} =>  curly   {adj}
 kinky   {adj} =>  kraus
 kinky   {adj} =>  frizzily   {adv}
</pre><p>In a regular application those results would need a little lexing, of course. The most important thing, however, is that we just wrote a translation wrapper onto a webapplication &#8211; in only 28 lines of code. Did I mention that I&#8217;m in love with Python?</p><p>All that is left is for me to recommend the <a
href="http://www.crummy.com/software/BeautifulSoup/documentation.html">BeautifulSoup documentation</a>. What we did here really didn&#8217;t cover what this module is capable of.</p><p>I wish you all the best.</p> ]]></content:encoded> <wfw:commentRss>http://blog.dispatched.ch/2009/03/15/webscraping-with-python-and-beautifulsoup/feed/</wfw:commentRss> <slash:comments>19</slash:comments> </item> </channel> </rss>


