Thursday, 14 April 2011

Gegl with libvips as the back end (updated)

(Since first making this post I've done a bit more work on gegl-vips, so this is version 2.0.)

I've put up a quick hack of gegl-0.1.6 which uses libvips to process images. It might be a way to put numbers on various possible optimisations of gegl.

https://github.com/jcupitt/gegl-vips

It has some severe limitations. First, it will not work efficiently with interactive destructive operations, like "paint a line". That would need area cache invalidation in vips, which is still some way off. Second, I've only implemented a few operations (load / crop / affine / unsharp / save / process), so all you can do is some very basic batch processing. It should work for dynamic graphs (change the parameters on a node and only the downstream nodes recalculate), but it'd need a "display" node to be able to test that.

Test program
Here's the test program I've been benchmarking with:
#include <stdio.h>
#include <stdlib.h>

#include <gegl.h>
#include <vips/vips.h>

int
main (int argc, char **argv)
{
  GeglNode *gegl, *load, *crop, *scale, *sharp, *save;

  /* Needed with the GLib of the time; deprecated and unnecessary since GLib 2.32. */
  g_thread_init (NULL);
  gegl_init (&argc, &argv);

  if (argc != 3) 
    {           
      fprintf (stderr, "usage: %s file-in file-out\n", argv[0]);
      exit (1);
    }
        
  gegl = gegl_node_new ();
        
  load = gegl_node_new_child (gegl,
                              "operation", "gegl:load",
                              "path", argv[1], 
                              NULL);
  crop = gegl_node_new_child (gegl, 
                              "operation", "gegl:crop",
                              "x", 100.0,
                              "y", 100.0,
                              "width", 4800.0, 
                              "height", 4800.0, 
                              NULL);
  scale = gegl_node_new_child (gegl,
                               "operation", "gegl:scale",
                               "x", 0.9,
                               "y", 0.9,
                               "filter", "linear", 
                               "hard-edges", FALSE, 
                               NULL);
  sharp = gegl_node_new_child (gegl,
                               "operation", "gegl:unsharp-mask",
                               "std-dev", 1.2, // diameter 7 mask in vips
                               //"std-dev", 1.0, // diameter 7 mask in gegl
                               NULL);
  save = gegl_node_new_child (gegl,
                              "operation", "gegl:save",
                              "path", argv[2], 
                              NULL);
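  /* Chain the nodes: load -> crop -> scale -> sharpen -> save. */
  gegl_node_link_many (load, crop, scale, sharp, save, NULL);

  /* Processing the sink node pulls pixels through the whole graph. */
  gegl_node_process (save);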
                
  g_object_unref (gegl);

  gegl_exit ();

  return (0);
}
That is: load, crop 100 pixels off each edge (so you need to give it a 5,000 x 5,000 pixel RGB image), shrink by 10% with bilinear interpolation, sharpen, and save.

See http://www.vips.ecs.soton.ac.uk/index.php?title=Speed_and_Memory_Use for results with other libraries.

Compile with:
gcc -g -Wall gegl.c `pkg-config gegl vips-7.25 --cflags --libs` 
and run with something like:
$ time ./a.out wtc_small.png wtc2.png
Results
If I run the test program linked against gegl-0.1.6 on a 5,000 x 5,000 pixel RGB PNG image on my laptop (a 2.4 GHz Core 2 Duo), I get 96s real, 44s user. I experimented with various settings for GEGL_SWAP and friends but couldn't get it to go any faster; I've probably missed something. Perhaps gegl's disk cache plus my slow laptop hard drive are slowing it down.

Linked against gegl-vips, with the operations set to exactly match gegl's processing, the same program runs in 27s real, 38s user. So it looks like some tuning of the disk cache, or perhaps turning it off entirely for batch processing (where you seldom need pixels more than once), could give gegl a very useful speedup here. libvips also has a threading system, on by default, that does double-buffered write-behind, which helps too.

I investigated some other optimisations:

  • Saving as uncompressed TIFF instead of PNG takes a further 15s off the runtime. libpng compression is slow, and even with compression turned off, libpng file writing is sluggish.
  • The alpha channel isn't needed in this case; dropping it saves about 5s real time.
  • babl converts to linear float and back with exp() and log(). Using lookup tables instead saves 12s.
  • The gegl unsharp operator is implemented as gblur/sub/mul/add. These are all linear operations, so you can fold the maths into a single convolution. Redoing unsharp as a separable convolution saves 1s.
  • Finally, we don't really need 16-bit output here; 8-bit is fine. This saves only 0.5s for TIFF, but 8s for PNG.
Putting all these together, the same program runs in 2.3s real, 4s user. This still uses linear-light float internally; switching to a full 8-bit path gives 1s real, 1.5s user. Gegl is committed to float, but it's interesting to put a number on the cost.

TODO
A screen output node would be fun to experiment with.

The tests ought to be repeated on a faster machine, especially one with a faster hard disk.

1 comment:

  1. I've repeated the tests on a fast desktop machine, see the gegl-vips README for notes.
