Monday, 23 May 2011

Image annotation with libvips and Python

Update: There's a newer post on this subject with some updated timings and code.

I had a mail asking how to replace a bit of ImageMagick with libvips and Python. Here it is, wrapped up as a blog post.

First, here's the ImageMagick:
#!/bin/sh

convert "$1" \
    -background Red -density 300 \
    -font /usr/share/fonts/truetype/msttcorefonts/Arial.ttf \
    -pointsize 12 -gravity south -splice 0x150 \
    -gravity southwest -annotate +50+50 "left corner" \
    -gravity southeast -annotate +50+50 'right corner' \
    +repage \
    "$2"
This adds a 150-pixel-high red banner to the bottom of an image, with a caption in each bottom corner.
And here's the same thing in Python using libvips. It is rather fiddly, unfortunately:
#!/usr/bin/python

import sys
from vipsCC import *

im = VImage.VImage(sys.argv[1])

zero = VImage.VImage.black(im.Xsize(), 150, 3)
Next we make a red image from the black (zero) image. im.lin(a, b) calculates (im * a + b), that is, it does a linear transform (hence the name). You can pass a number for a and b, or a list. If you pass a list, one list element is used for each image band, so this call adds 255 to the first band and leaves the other two at zero. The result of lin() is always float, so you need to cast back to 8-bit afterwards.
red = zero.lin([1, 1, 1], [255, 0, 0]).clip2fmt(VImage.VImage.FMTUCHAR)
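If the arithmetic isn't obvious, here's a plain-Python sketch of what lin() plus the cast does to one pixel. This is not how libvips does it internally: lin_pixel and to_uchar are names I've made up for illustration, and the clamp-and-truncate behaviour of the cast is an assumption.

```python
def lin_pixel(pixel, a, b):
    """Apply pixel * a + b band by band; the result is always float."""
    # A single number for a or b applies to every band.
    if not isinstance(a, (list, tuple)):
        a = [a] * len(pixel)
    if not isinstance(b, (list, tuple)):
        b = [b] * len(pixel)
    return [float(v) * ai + bi for v, ai, bi in zip(pixel, a, b)]


def to_uchar(pixel):
    """Cast back to 8-bit, clamping to 0..255 (assumed cast behaviour)."""
    return [max(0, min(255, int(v))) for v in pixel]


black = [0, 0, 0]
as_float = lin_pixel(black, [1, 1, 1], [255, 0, 0])
red_pixel = to_uchar(as_float)
print(as_float, red_pixel)
```

Only the first band changes, which is why the result is pure red.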
text(string, font, width, alignment, dpi) renders text with Pango. The text is rendered to fit a maximum width (-1 means no width limit). The image is always one band, 8-bit, with 0 meaning background, 255 meaning text, and intermediate values used for anti-aliasing.
txt = VImage.VImage.text("left corner", "sans 12", -1, 0, 300)
im.embed(type, left, top, width, height) 'embeds' im within a larger image of size (width, height) at position (left, top). The 'type' field sets how the new background pixels are created: 0 means 'fill with zero'.
txt = txt.embed(0, 50, 50, im.Xsize(), 150)
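Here's a rough plain-Python picture of what embed does with type 0, using nested lists as a stand-in for a one-band image. The function below is only a sketch of the idea, and dropping pixels that fall outside the canvas is my assumption.

```python
def embed(image, left, top, width, height, fill=0):
    """Place a 2D list inside a larger fill-valued canvas at (left, top)."""
    canvas = [[fill] * width for _ in range(height)]
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            # Pixels that land outside the canvas are dropped.
            if 0 <= top + y < height and 0 <= left + x < width:
                canvas[top + y][left + x] = value
    return canvas


mask = [[255, 255],
        [255, 255]]
banner = embed(mask, left=1, top=1, width=5, height=3)
for row in banner:
    print(row)
```

In the real script the canvas is the full banner, im.Xsize() wide and 150 pixels high, and the small image is the rendered text.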

txt2 = VImage.VImage.text("right corner", "sans 12", -1, 0, 300)
txt2 = txt2.embed(0, im.Xsize() - txt2.Xsize() - 50, 50, im.Xsize(), 150)

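a.orimage(b) ORs two images together bitwise. Both masks are zero wherever there is no text, so this merges the two captions into a single mask.
txt = txt.orimage(txt2)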
condition.blend(a, b) uses the condition image to blend between images a and b: where the condition is 255 you get a, where it is 0 you get b, and values in between mix the two. Here the text mask blends between black (zero) for the text and the red background.
txt = txt.blend(zero, red)
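For a single pixel, the blend works out roughly like this. It's a sketch assuming the usual linear formula, (cond / 255) * a + (1 - cond / 255) * b; blend_pixel is a made-up helper, not a libvips call.

```python
def blend_pixel(cond, a, b):
    """Blend two pixels band by band: cond=255 gives a, cond=0 gives b."""
    weight = cond / 255.0
    return [weight * av + (1.0 - weight) * bv for av, bv in zip(a, b)]


black = [0, 0, 0]
red = [255, 0, 0]

inside_glyph = blend_pixel(255, black, red)   # fully text
background = blend_pixel(0, black, red)       # fully banner
soft_edge = blend_pixel(51, black, red)       # anti-aliased edge pixel
print(inside_glyph, background, soft_edge)
```

The in-between mask values are what keep the glyph edges smooth against the red.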
Finally, big.insert(small, left, top) pastes image small into image big at position (left, top), expanding the image as necessary.
im = im.insert(txt, 0, im.Ysize())
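Pasting at top = im.Ysize() is what appends the banner underneath the photo. Here's the same idea on nested lists; it's a sketch only, the insert function below is hypothetical, and it assumes left and top are not negative.

```python
def insert(big, small, left, top, fill=0):
    """Paste small into big at (left, top), growing the canvas if needed."""
    width = max(len(big[0]), left + len(small[0]))
    height = max(len(big), top + len(small))
    out = [[fill] * width for _ in range(height)]
    # Copy the big image into the top-left corner of the new canvas.
    for y, row in enumerate(big):
        out[y][:len(row)] = row
    # Then paste the small image over it at the requested position.
    for y, row in enumerate(small):
        out[top + y][left:left + len(row)] = row
    return out


photo = [[1, 1, 1],
         [1, 1, 1]]
strip = [[9, 9, 9]]
# top equals the photo height, so the strip lands just below the photo.
result = insert(photo, strip, 0, len(photo))
for row in result:
    print(row)
```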

im.write(sys.argv[2])
And we're done.

On the plus side, because it's done with a 'real' programming language, you have a lot more flexibility in the way you do the layout. You could write a set of little functions to do the layout for you, and perhaps do a better job of centering the text.
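As an example of the sort of little layout function I mean, here's a sketch that works out where a caption should go inside the banner. The name caption_position, the alignment keywords and the 50-pixel default margin are all just for illustration.

```python
def caption_position(canvas_width, canvas_height, text_width, text_height,
                     align="left", margin=50):
    """Return (left, top) for a caption, vertically centred in the banner."""
    if align == "left":
        left = margin
    elif align == "right":
        left = canvas_width - text_width - margin
    elif align == "centre":
        left = (canvas_width - text_width) // 2
    else:
        raise ValueError("unknown alignment: %r" % (align,))
    top = (canvas_height - text_height) // 2
    return left, top


# A 400 x 60 caption in a 5000 x 150 banner.
print(caption_position(5000, 150, 400, 60, "left"))
print(caption_position(5000, 150, 400, 60, "right"))
print(caption_position(5000, 150, 400, 60, "centre"))
```

You'd pass the result as the left and top arguments to embed(), with txt.Xsize() and txt.Ysize() supplying the text size.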

Benchmarking with a 5,000 by 5,000 pixel RGB image on my desktop machine, I get:
$ time  ./try70.sh wtc_small.tif test2.tif
real    0m1.179s
user    0m0.880s
sys     0m0.650s
peak RSS 400m
$ time  ./try71.py wtc_small.tif test2.tif
real    0m0.549s
user    0m0.290s
sys     0m0.330s
peak RSS 25m

So libvips is about twice as fast in this case (0.55s against 1.18s) and needs about a sixteenth of the memory (25 MB against 400 MB). Repeating with a 10,000 x 10,000 pixel RGB TIFF image, I get:
$ time ./try70.sh wtc.tif wtc2.tif
real    0m4.050s
user    0m2.770s
sys     0m2.200s
peak RSS 1.3g
$ time ./try71.py wtc.tif wtc2.tif
real    0m3.206s
user    0m0.970s
sys     0m1.530s
peak RSS 25m

So libvips needs roughly constant memory regardless of image size. The speed gain shrinks as the images get larger, since less of the time is spent processing and more is spent reading and writing TIFF.
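If you want the ratios spelled out, this little script works them out from the numbers above. The figures are copied from the two runs; treating 1.3g as 1300 MB is my approximation.

```python
# (image size, ImageMagick value, libvips value), copied from the runs above.
real_seconds = [
    ("5,000 x 5,000", 1.179, 0.549),
    ("10,000 x 10,000", 4.050, 3.206),
]
# Peak RSS in MB; 1.3g is taken as roughly 1300 MB.
peak_rss_mb = [
    ("5,000 x 5,000", 400.0, 25.0),
    ("10,000 x 10,000", 1300.0, 25.0),
]


def ratios(rows):
    """Return (label, ImageMagick / libvips) for each row."""
    return [(label, magick / vips) for label, magick, vips in rows]


for label, ratio in ratios(real_seconds):
    print("%s: libvips is %.2fx faster" % (label, ratio))
for label, ratio in ratios(peak_rss_mb):
    print("%s: ImageMagick uses %.0fx the memory" % (label, ratio))
```

The wall-clock advantage drops from a bit over 2x to about 1.3x, while the memory gap grows from 16x to around 52x.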
