Google App Engine (GAE) is great for hosting Python (or Java) web applications. It offers 1.3 million hits per day and 1 GB of upstream and downstream traffic per day for free. Considering that you get access to Google infrastructure that lets you crawl the web as fast as Google itself does, choosing GAE is a no-brainer for applications that do a lot of web crawling, screen scraping or web indexing. You can even set up cron jobs to run your work periodically.
I won’t elaborate on how to get an account, download the SDK and get started, because Google itself hosts great tutorials on all of this. If you are already familiar with Python web development, they will get you started in a matter of minutes.
I personally chose not to use the Google webapp framework, because I’m quite familiar with CherryPy. I fell in love with it because it feels very sleek, very Zen-like. This comes as no surprise, since it was a deliberate design decision, as can be read in The Zen of CherryPy.
Getting started with CherryPy on GAE is no trouble, either. GAE supports any Python framework that is WSGI-compliant, including Django, CherryPy, web.py and Pylons. Google doesn’t host these frameworks itself, so all you have to do is copy the whole framework into your GAE project to get the import to work. That’s it. The same goes for any third-party module. Need BeautifulSoup? Just copy the .py file into your project. Piece of cake.
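To make "WSGI-compliant" a bit more concrete, here is a minimal sketch of what every such framework ultimately hands to the server: a callable that takes environ and start_response. The names hello_app and call_app are made up for this illustration and are not part of CherryPy or GAE; only the standard library's wsgiref is used.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A bare WSGI application: a callable taking environ and start_response."""
    body = b"Hello from a plain WSGI app"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    # WSGI requires an iterable of byte strings as the response body.
    return [body]


def call_app(app, path="/"):
    """Invoke a WSGI app once, without a server (hypothetical helper)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal valid WSGI environ
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Anything that exposes a callable of this shape, as CherryPy does once an application is mounted, can be run by a WSGI handler such as the CGI handler from wsgiref.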
Now, if you want to serve images dynamically, you don’t have to store them on a hard disk to link to them. Just save them in the Google Datastore and serve them whenever needed.
Using the following snippet, you will be able to serve images dynamically under URLs of the form application/handler_name/index/[0-9]*:
    import wsgiref.handlers

    import cherrypy
    from cherrypy import expose

    # Note: DynamicImage is just a mockup. There is no such module.
    import DynamicImage


    class Root:
        @expose
        def index(self):
            return ""


    class GetImage:
        """GetImage provides a handler for dynamic images."""

        def __init__(self):
            """Mockup for getting some images.

            Datastore or live scraping could be done here.
            """
            dynamic_image = DynamicImage.DynamicImage()
            self.pictures = dynamic_image.getImages()

        @expose
        def index(self, num=None):
            """Provides the handler for URLs: application/handler_name/index/[0-9]*"""
            try:
                picture = self.pictures[int(num)]
            except (TypeError, ValueError, IndexError):
                # Missing, non-numeric or out-of-range image number
                raise cherrypy.NotFound()
            return self._convert_to_image(picture)

        def _convert_to_image(self, picture):
            # "image/jpeg" is the registered MIME type; "image/jpg" is not
            cherrypy.response.headers['Content-Type'] = "image/jpeg"
            return picture


    # Root() doesn't do anything here. It normally serves your index page.
    root = Root()
    # Generate route http://app/img/
    root.img = GetImage()
    # Start CherryPy app in WSGI mode
    app = cherrypy.tree.mount(root, "/")
    wsgiref.handlers.CGIHandler().run(app)
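Since DynamicImage is only a mockup, you need something in its place to try the handler out. The following is one possible stand-in, not the original module: it assumes a file named DynamicImage.py next to your handler and an "images" directory containing JPEG files, both of which are inventions for this example. A Datastore-backed version would expose the same getImages() method.

```python
# DynamicImage.py -- hypothetical stand-in for the mockup module.
import glob
import os


class DynamicImage(object):
    """Hypothetical image source: loads every *.jpg file from one directory."""

    def __init__(self, directory="images"):
        # "images" is an assumed default; point it wherever your files live.
        self.directory = directory

    def getImages(self):
        """Return the raw bytes of each JPEG, ordered by file name."""
        images = []
        pattern = os.path.join(self.directory, "*.jpg")
        for path in sorted(glob.glob(pattern)):
            with open(path, "rb") as handle:
                images.append(handle.read())
        return images
```

The handler then picks an entry of the returned list by its position, so the number in the URL corresponds to the alphabetical order of the file names.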
One last note: Processes running longer than 15-30 s will be cut off by GAE with a DeadlineExceededError exception. You can catch this exception and try to divide your workload into smaller pieces.
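As a rough sketch of that idea, the function below works through a list until a self-imposed time budget is used up or the runtime raises DeadlineExceededError, and returns whatever is left. The name process_in_batches, the 10-second default budget and the clock parameter are assumptions made for this example; the import fallback only exists so that the code also runs outside App Engine.

```python
import time

try:
    # Available only inside the App Engine Python runtime.
    from google.appengine.runtime import DeadlineExceededError
except ImportError:
    class DeadlineExceededError(Exception):
        """Local stand-in so the sketch also runs outside App Engine."""


def process_in_batches(items, handle, budget_seconds=10.0, clock=time.time):
    """Process items from a list until the time budget runs out.

    Returns (done_count, remaining_items).
    """
    started = clock()
    done = 0
    try:
        for item in items:
            # Stop voluntarily before the platform's deadline hits.
            if clock() - started >= budget_seconds:
                break
            handle(item)
            done += 1
    except DeadlineExceededError:
        # The runtime cut us off mid-item; that item stays in the remainder.
        pass
    return done, list(items[done:])
```

The remaining items can then be picked up by a later request or by the next cron run, so that no single request has to finish the whole job.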