Saturday, 21 November 2015

Fancy transforms

The development version of libvips, 8.2, has just gained a nice new feature: a fancy transform operation.

The idea is that you make an index image where each pixel holds a pair of numbers: the coordinates of a point in another image. The transform operation, vips_mapim(), takes the index image and a source image, and generates the output by looking up each point from the index in the source. This is a pretty standard feature of image processing libraries and libvips should have had it years ago, ah well.
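
In plain Python, the semantics can be sketched like this (a toy nearest-neighbour version; map_image is my name, and the real operation interpolates and has configurable edge behaviour, here simplified to "out-of-range reads give 0"):

```python
def map_image(source, index):
    """Toy vips_mapim(): source is a 2D list of pixels, index is a 2D list
    of (x, y) pairs saying where each output pixel comes from."""
    height = len(source)
    width = len(source[0])
    out = []
    for row in index:
        out_row = []
        for x, y in row:
            xi = int(round(x))
            yi = int(round(y))
            if 0 <= xi < width and 0 <= yi < height:
                out_row.append(source[yi][xi])
            else:
                # points outside the source read as black
                out_row.append(0)
        out.append(out_row)
    return out

# a flip-horizontal index for a 3x1 image
source = [[10, 20, 30]]
index = [[(2, 0), (1, 0), (0, 0)]]
print(map_image(source, index))  # [[30, 20, 10]]
```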

Here's a simple Python example to show this in action. It uses vips_xyz() to make an image where each pixel has the value of its own coordinates, then uses pixel arithmetic to displace each pixel by roughly sin(distance) / distance. When you apply this index image to a photo, you get a wobble effect that fades towards the edges of the image.

#!/usr/bin/python

import sys

import gi
gi.require_version('Vips', '8.0')
from gi.repository import Vips

def wobble(image):
    # this makes an image where pixel (0, 0) (at the top-left) has value [0, 0],
    # and pixel (image.width - 1, image.height - 1) at the bottom-right has
    # value [image.width - 1, image.height - 1]
    index = Vips.Image.xyz(image.width, image.height)

    # make a version with (0, 0) at the centre, negative values up and left,
    # positive down and right
    centre = index - [image.width / 2, image.height / 2]

    # to polar space, so each pixel is now distance and angle in degrees
    polar = centre.polar()

    # sin(3 * distance), scaled by 1 / (1 + distance) to make a wavy pattern
    # that fades with distance
    d = 10000 * (polar[0] * 3).sin() / (1 + polar[0])

    # and back to rectangular coordinates again to make a set of vectors we can
    # apply to the original index image
    index += d.ibandjoin(polar[1]).rect()

    # finally, use our modified index image to distort the input!
    return image.mapim(index)

image = Vips.Image.new_from_file(sys.argv[1])
image = wobble(image)
image.write_to_file(sys.argv[2])


Here's what the Bilbao Guggenheim looks like through this program:


This is doing the complete calculation for every pixel, but often that level of accuracy is not really necessary. We could generate a quarter-resolution index and scale it up: then we'd only need to do 1/16th of the maths.

This is a simple change to make to the program. We swap the xyz line for:

    index = 4 * Vips.Image.xyz(image.width / 4, image.height / 4)

that is, generate the index at a quarter of the resolution, but with the values scaled up by 4. And we change the way we apply the index:

    return image.mapim(index.similarity(scale = 4))

Now we enlarge the index to full size before doing the lookup.
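
The saving comes from only evaluating the trig at the low-resolution points and letting interpolation fill in the rest. Here's a 1-D sketch of that idea in plain Python (pixel replication stands in for similarity(), which would really interpolate; in 2-D the saving is 1/16 rather than 1/4):

```python
import math

def displacement(r):
    # the wobble displacement from the example above
    return 10000 * math.sin(3 * r) / (1 + r)

width = 16

# full resolution: evaluate the maths at every pixel
full = [displacement(x) for x in range(width)]

# quarter resolution: evaluate at 1/4 of the points, with the coordinates
# scaled up by 4, then enlarge by pixel replication
quarter = [displacement(4 * x) for x in range(width // 4)]
enlarged = [quarter[x // 4] for x in range(width)]

print(len(quarter), "evaluations instead of", len(full))
# 4 evaluations instead of 16
```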

It won't make much difference on a small image like this, but on a larger file it starts to become noticeable. With a 10,000 by 10,000 pixel JPEG I see:

$ time ./wobble.py ~/pics/wtc.jpg x.jpg
memory: high-water mark 56.43 MB
real    0m7.644s
user    0m14.320s
sys    0m0.517s
$ time ./wobble4.py ~/pics/wtc.jpg x.jpg
memory: high-water mark 45.24 MB
real    0m5.205s
user    0m7.676s
sys    0m0.413s

A nice improvement. For comparison, ImageMagick's swirl operation, which should be roughly comparable, takes about 3x longer and needs a lot more memory:

$ time convert ~/pics/wtc.jpg -swirl 1080 x.jpg
peak memory: 1.5 GB
real    0m16.746s
user    0m16.079s
sys    0m0.621s

nip2 is nice for playing about with this stuff: you can watch the index image change as you adjust the equations. Here's nip2 doing the same thing, with some sliders to adjust the constants:


Workspace plus sample image here.

You can use vips_mapim() for useful things too. Here are a pair of functions which map images to and from polar coordinate space:

def to_polar(image):
    # start with an image where each pixel has the value of its own coordinates
    xy = Vips.Image.xyz(image.width, image.height)

    # move the origin to the centre of the image
    xy -= [image.width / 2.0, image.height / 2.0]

    # scale, compensating for the aspect ratio
    scale = min(image.width, image.height) / float(image.width)
    xy *= 2.0 / scale

    # each pixel is now [distance from centre, angle in degrees]
    index = xy.polar()

    # scale the angle axis (0 - 360 degrees) to the full image height
    index *= [1, image.height / 360.0]

    return image.mapim(index)

def to_rectangular(image):
    # start with an image where each pixel has the value of its own coordinates
    xy = Vips.Image.xyz(image.width, image.height)

    # interpret the y axis as angle: scale 0 - height up to 0 - 360 degrees
    xy *= [1, 360.0 / image.height]

    # back to rectangular coordinates
    index = xy.rect()

    # undo the scaling and centring done by to_polar()
    scale = min(image.width, image.height) / float(image.width)
    index *= scale / 2.0
    index += [image.width / 2.0, image.height / 2.0]

    return image.mapim(index)

Converting to polar wraps an image around a vertical axis positioned at the origin. With the Guggenheim image you get this strange thing:


This has the nice property that vertical lines in the input become circles (or segments of circles) in polar space, and horizontal lines become radial spokes.
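
This is easy to check numerically. Here's the coordinate arithmetic from to_polar() redone in plain Python for a single output pixel (polar_index is my name for it; the exact angle convention libvips uses for polar() doesn't matter for the point being made):

```python
import math

def polar_index(x, y, width, height):
    """Where in the source does output pixel (x, y) sample from?
    Mirrors the index arithmetic in to_polar() above."""
    cx = x - width / 2.0
    cy = y - height / 2.0
    scale = min(width, height) / float(width)
    cx *= 2.0 / scale
    cy *= 2.0 / scale
    distance = math.hypot(cx, cy)
    angle = math.degrees(math.atan2(cy, cx)) % 360.0
    return distance, angle * height / 360.0

width = height = 100

# output pixels on a circle around the centre all have the same distance,
# so they all sample the same vertical line (column) of the source, just
# at different rows (angles)
points = [(60, 50), (50, 60), (40, 50), (50, 40)]
columns = [polar_index(x, y, width, height)[0] for x, y in points]
print(columns)  # [20.0, 20.0, 20.0, 20.0]
```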

Here's an example of using this transform taken from this stackoverflow answer. This image is a scanned page from a book, but it's annoyingly not quite straight:

If we take the FFT of this, the angle of the page shows up as a tilted line in the transform, caused by the recurring shapes of the characters:

Now, if we turn this to rectangular coordinates, we'll see radial lines as horizontals:


Now just sum each row to get a set of peaks:


And the position of the largest peak is the angle the image is rotated by. Applying that rotation to the original gives:


which looks pretty straight. There's a nip2 workspace plus a sample image if you'd like to try it out. You'll need git master libvips and nip2.
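
The sum-the-rows-and-find-the-peak step at the heart of this is simple enough to sketch in plain Python (a toy array stands in for the unwrapped FFT, and an explicit loop stands in for the real libvips calls):

```python
# toy "rectangular coordinates" image: one row per angle, so after
# unwrapping, the radial line from the FFT becomes one bright row
height = 180  # one row per degree in this toy version
skew_angle = 2

image = [[0.0] * 64 for _ in range(height)]
image[skew_angle] = [1.0] * 64  # the bright row at the skew angle

# sum each row, then the position of the largest sum is the angle
row_sums = [sum(row) for row in image]
angle = max(range(height), key=lambda y: row_sums[y])
print(angle)  # 2
```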

11 comments:

  1. Very interesting tool, I see a lot of applications for it... good job!

    Talking about transforms, I can offer a perspective correction tool that is already adapted to chunk-based processing, and therefore could be implemented as a VIPS operation quite easily. It works by taking an input image and the coordinates of a quadrilateral shape, and outputs an image in which the quadrilateral region is transformed into a regular rectangle.

    If that is of some interest for you, I can try to write a first version of the VIPS operation...

    Replies
    1. That sounds interesting! vips already has a thing that almost does that:

      https://github.com/jcupitt/libvips/blob/master/libvips/resample/quadratic.c

      Unfortunately it's broken for second-order transforms, but it works fine for first-order perspective warps. It's extremely quick, though sadly it doesn't thread. The way it finds the differential of the transform might be useful?

    2. There might be indeed some overlap between the two implementations... mine is here:

      https://github.com/aferrero2707/PhotoFlow/blob/master/src/vips/perspective.cc
      https://github.com/aferrero2707/PhotoFlow/blob/master/src/dt/iop/clipping.cc

      I've been reusing code from darktable, in particular the part that computes the matrix starting from the keystones (the corners of the quadrangular region), and the one that uses the matrix to compute the direct and inverse transforms.

      To be honest, I've not yet entirely understood the code I've derived from DT... and I have no idea if the matrix formalism is the same in the two cases.
      However, the use of keystones is very handy when it comes to GUI applications, because one can easily set and drag the keystones with the mouse, until the deformed rectangle matches the perspective in the image (see http://photoflowblog.blogspot.fr/2015/10/new-photoflow-version-023-released.html).

      A bit off-topic: when you say that your code is not threaded, is that due to the call to vips_image_copy_memory()?

      I suspect I should introduce the same in my code, but then I would lose the advantage of the low memory footprint...

    3. That's right, it renders the whole of the input image to memory so it can do random access to the pixels.

      It has to do this because it supports second order transforms, things of the form:

      x' = a * x^2 + b * y^2 + c * xy + d * x + e * y + f

      And given a rectangle of pixels to generate, there isn't an easy way to get the bounding box of the pixels you might need. So it just gets all of them at the start.

      Keystoning is a first-order transform though, you just need to transform the corners of the output rect backwards to get the bounding box of the input rect. So you don't need to get everything, you can just get the pixels you need.

      vips_mapim() is the same: because the transform has already been sampled discretely, before making an output rect it can just scan for the max and min values and get the correct bounding box. Probably the best solution would be to rework vips_quadratic() into vips_perspective() (something like that?) and just drop support for second order transforms. Leave that to vips_mapim(). Then you could make it lazy and low memory.

      The matrix formulation should be the same as DT for first order. The clever bit is taking the differential of the matrix, so for each output pixel you can reduce the inner loop to:

      for (x = left; x < right; x++) {
          interpolate(x_pixel, y_pixel);
          x_pixel += dx;
          y_pixel += dy;
      }

      nip2 has keystoning done this way: load an image and click Image / Transform / Perspective.

    4. "Keystoning is a first-order transform though, you just need to transform the corners of the output rect backwards to get the bounding box of the input rect. So you don't need to get everything, you can just get the pixels you need."

      Thanks, now I understand the difference!

      I will look into the nip2 implementation, if it's after than what I have at the moment then I'll see how to use it. Indeed in my case I'm computing the inverse transform for each output pixel, and your idea with the first-order derivatives looks really smart and efficient. Could you point me to some specific nip2 source file where you prepare the call to vips_quadratic (or whatever vips operation you use for the linear transform)?

      Thanks!

    5. "...if it's after than what I have..." -> ...if it's faster than what I have...

    6. The transform matrix is calculated here I think:

      https://github.com/jcupitt/nip2/blob/master/share/nip2/start/_joe_utilities.def#L603

      If that helps much. Yes, you can certainly avoid computing the whole transform for each pixel.

    7. Thanks! I'm not sure I understand the relation between the input parameters of "perspective_transform" and the corners of the quadrilateral shape... maybe we can continue the discussion privately, so as not to spam your blog comments too much.

    8. Yes, open an issue on the vips tracker, it sounds like there's something useful we could make from these bits of code.

      I had a go at a benchmark vs. gmic:

      http://www.rollthepotato.net/~john/wobble/warp-gmic.sh
      http://www.rollthepotato.net/~john/wobble/warp-vips.py

      I think the programs are equivalent. The gmic one is obviously much smaller, simpler and more flexible. vips is a bit more than twice as quick, on this machine anyway.

      $ time ./warp-gmic.sh /data/john/pics/wtc.jpg x.jpg
      peak mem: 2.7gb
      real 0m6.315s
      user 0m49.552s
      sys 0m0.432s
      $ time ./warp-vips.py /data/john/pics/wtc.jpg x.jpg
      peak mem: 90mb
      real 0m2.719s
      user 0m25.188s
      sys 0m0.364s

      vips seems to be about 2x faster in terms of number of instructions executed (user time), and gets about another 20% from better threading. This machine has 6 cores, and each core is hyperthreaded: gmic gets about a 4.8x speedup vs. OMP_NUM_THREADS=1, vips gets 6.2x vs. VIPS_CONCURRENCY=1.

  2. Need help..!

    I have a problem when I run the import below:
    gi.require_version('Vips', '8.0')

    Then error message is:
    ValueError: Namespace Vips not available

    Thanks in advance!

    Replies
    1. Hi, sounds like it can't find the VIPS typelib. There are some notes in the vips python guide on this:

      http://www.vips.ecs.soton.ac.uk/supported/current/doc/html/libvips/using-from-python.html

      Check the first section.
