Loading JPEG-compressed pyramidal TIFF files in the browser

Quick bit of backstory. In the world of digital preservation and archiving, there’s a standard called IIIF (pronounced triple-eye-eff). It’s designed to provide media interoperability (initially for images, but now other types of media as well) between different archival platforms. So, if you’ve got two high resolution images, stored in two different archives, IIIF would let you load both side by side in a viewer for comparison. That’s pretty cool! In theory!

The reality of IIIF in practice is a little more boring – it’s become shorthand for “zooming image viewer” and there seem to be relatively few interesting examples of interoperability. Because IIIF is basically only used by nonprofit archives and museums, the tooling isn’t necessarily the latest and greatest. For zooming images (the focus of this article), one option is to precompute all of your image pyramids and write them out as individual tiles on a server, with a JSON manifest. However, storing tens of thousands of tiny JPEGs on a storage platform like S3 also has its downsides in terms of cost, complexity and performance.

Alternatively, you can run a IIIF server which does tiling on-the-fly. On-the-fly tiling can be really CPU and IO intensive, so these servers often rely on healthy caches to deliver good performance. One workaround is to store your source images in a pre-tiled, single-file format like a pyramidal TIFF. It’s basically what it sounds like – a single .tif file which internally stores your image at a bunch of different zoom levels, broken up into small (usually 256×256 pixel) tiles. These individual tiles can be JPEG compressed to minimize storage size. A tile server can serve these tiles very quickly in response to a request, because no actual image processing is required – just disk IO. The disadvantage of this approach is that, in a cloud-first world, a tile server adds a lot of cost and complexity.

Our Elevator app has supported zooming images for years, for both standard high resolution images and specialized use cases like SVS and CZI files from microscopes. Initially, our backend processors did this using the many-files approach described above. We used vips to generate deepzoom pyramids, which are essentially a set of folders, each holding JPEG tiles (sometimes tens of thousands). These were copied to S3, and could be loaded directly from S3 by our custom Leaflet plugin. Because the images follow a predictable naming convention (zoom_level/x_y.jpeg), there’s no translation needed to find the right tiles for a given image coordinate. This had some definite downsides though – writing all those files was disk intensive, copying them to S3 was slow and sometimes flaky (hitting rate limits, for example), and deleting a pyramid was an O(n) operation – one call per image.
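
For a sense of how simple that lookup is, here’s a rough sketch of how a tile URL gets built in that scheme (the base URL here is illustrative, not our actual bucket layout):

// rough sketch of the deepzoom-style lookup: a tile for a given zoom level and
// coordinate maps directly to a predictable key, so no index file is needed.
// baseUrl is a hypothetical placeholder.
function deepzoomTileUrl(baseUrl, z, x, y) {
    return `${baseUrl}/${z}/${x}_${y}.jpeg`;
}

// e.g. deepzoomTileUrl("https://example-bucket.s3.amazonaws.com/pyramid", 12, 40, 17)
// -> "https://example-bucket.s3.amazonaws.com/pyramid/12/40_17.jpeg"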

As a quick fix to deal with some specific rate limit issues, we first moved to a tar+index approach using an in-house tool called Tarrific (thanks James!). After using vips to make the deepzoom pyramid, we tarred the files (without gzip compression) and copied the single tar file to the server. Tarrific then produced a JSON index of the tar, with byte offsets for each image, which was stored on the server as well. We were able to update our Leaflet plugin to read the index file (itself gzip encoded), then do range requests into the tar to access the files we needed, which were still just JPEGs. This solved the S3 side of the equation, giving us a single file to upload or delete. However, I didn’t love having a totally proprietary format, and it still involved a bunch of extra disk operations during encoding.
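
I won’t go into Tarrific’s details here, but conceptually the client side looked something like this (the index field names are illustrative, not Tarrific’s actual format):

// conceptual sketch of the tar+index approach: read a JSON index of byte
// offsets, then issue a range request into the tar for the tile we want.
async function fetchTileFromTar(tarUrl, indexUrl, tileName) {
    const index = await (await fetch(indexUrl)).json();
    const entry = index[tileName]; // e.g. { offset: 123456, size: 20480 } -- hypothetical shape
    const response = await fetch(tarUrl, {
        headers: { Range: `bytes=${entry.offset}-${entry.offset + entry.size - 1}` },
    });
    return response.arrayBuffer(); // still just a plain JPEG
}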

That brings us up to the present-ish day. Now, I should mention, Elevator doesn’t support IIIF. That’s mostly because I run it as a dictatorship and I don’t find IIIF that useful. But it’s also because the IIIF format imposes some costs that wouldn’t fit well into our AWS attribution model. That said, I’ve always tried to keep IIIF in mind as something we could support in the future, if there was a good use case. To that end, I recently spent some time looking at different file formats which would be well suited for serving via a IIIF server, and did a deep dive on pyramidal TIFFs. As I dug in, it seemed like in principle they should be able to be served directly to the browser using range requests, much like we’d done with tar files. In fact, there are related formats like GeoTIFF that seem to do exactly that. Those other formats (or specifically, their tooling) didn’t seem well suited to the massive scale of images we deal with though (think tens or hundreds of gigapixels).

I recently had a long train ride across Sri Lanka (#humblebrag) and decided to see if I could make a pyramidal tiff file work directly in Leaflet without any server side processing. Turns out the answer is yes, hence this post!

Beginning at the beginning, we’re still using vips to do our image processing, via the tiffsave command:

vips tiffsave sourceFile --tile --pyramid --compression jpeg -Q 90 --tile-width 256 --tile-height 256 --bigtiff --depth onepixel outputFile.tiff

(The onepixel flag is there to maintain compatibility with our existing Leaflet plugin, which assumes that pyramids start from a 1×1 scale instead of a single-tile scale.)

The trick with pyramidal TIFF files is that even though they support JPEG compression for the image tiles, they don’t actually store full JPEGs internally. Each zoom level shares a single set of quantization tables, and the individual tiles don’t have a full set of JPEG headers. Fortunately, JPEG is a super flexible format when it comes to manipulation, as it’s a linear sequence of sections, each with a start and end marker. Nothing is based on specific byte offsets, which would have made this a nightmare.
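
For reference, these are the JPEG markers that come up in what follows (standard values from the JPEG spec):

// every JPEG segment starts with 0xFF followed by one of these marker codes;
// most segments then carry a two-byte length, which is what makes it possible
// to splice segments together without recomputing any offsets.
const JPEG_MARKERS = {
    SOI: 0xffd8,   // start of image
    APP14: 0xffee, // Adobe application segment (used below to force RGB)
    DQT: 0xffdb,   // quantization table
    DHT: 0xffc4,   // Huffman table
    SOF0: 0xffc0,  // start of frame (baseline)
    SOS: 0xffda,   // start of scan, followed by the entropy-coded tile data
    EOI: 0xffd9,   // end of image
};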

Rather than writing a full tiff parser from scratch, we’re able to leverage geotiff.js to do a lot of the heavy lifting. It handles reading the tiff header and determining the number of zoom levels (image file directories, or IFDs, in tiff parlance). From there, we can determine the offsets for specific tiles and get the raw data for each tile. The basic code for doing that is below (without the Leaflet-specific bits), though you can check out the (very poorly organized) git repo for the full Leaflet plugin and parser.

// first, a method to fetch the TIFF image headers
// PathToYourTiffFile holds the URL of your pyramidal TIFF
var tiff;
var image;
const loadIndex = async function() {
    tiff = await GeoTIFF.fromUrl(PathToYourTiffFile);
    image = await tiff.getImage();
}

var subimages = {};

// get the subimage for a given zoom level from the overall image.
// abuse globals because we've already broken the seal on that.
const getSubimage = async function getSubimage(coords) {
    // the headers for each zoom level need to be fetched once.
    // maxZoom is the deepest zoom level of the pyramid, set elsewhere in the plugin;
    // IFD 0 is the full resolution image, so we count backwards from it.
    if(subimages[coords.z] == undefined) {
        subimages[coords.z] = await tiff.getImage(maxZoom - coords.z);
    }
    const subimage = subimages[coords.z];
    return subimage;
};
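
The maxZoom used above is set by our Leaflet layer configuration; a minimal sketch of deriving it from the file itself, using geotiff.js’s getImageCount() and assuming the --depth onepixel pyramid described earlier, might look like this:

// sketch: derive maxZoom from the number of IFDs in the file. With a onepixel
// pyramid there is one IFD per zoom level, IFD 0 being full resolution, so
// zoom level 0 (the 1×1 level) maps to the last IFD.
var maxZoom;
const setMaxZoom = async function() {
    const imageCount = await tiff.getImageCount();
    maxZoom = imageCount - 1;
};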


// fetch the raw JPEG tile data. Note that this isn't a valid jpeg.
// coords is an object with z, x, and y properties. 
const fetchRawJPEGTile = async function(coords) {

    const subimage = await getSubimage(coords);
    const numTilesPerRow = Math.ceil(subimage.getWidth() / subimage.getTileWidth());
    const index = (coords.y * numTilesPerRow) + coords.x;
    // do this with our own fetch instead of geotiff so that we can get parallel requests
    // we need to trick the browser into thinking we're making the request
    // against different files or it won't parallelize the requests
    const offset = subimage.fileDirectory.TileOffsets[index];
    const byteCount = subimage.fileDirectory.TileByteCounts[index];
    const response = await fetch(PathToYourTiffFile + "?random=" + Math.random(), {
        headers: {
            // range ends are inclusive, so the last byte is offset + byteCount - 1
            Range: `bytes=${offset}-${offset + byteCount - 1}`,
        },
    });
    const buffer = await response.arrayBuffer();
    return buffer;
}

Using the above code, we can fetch the contents of a tile like this.

await loadIndex();
let myTile = await fetchRawJPEGTile({x: 10, y: 10, z:10});

There’s one big catch here though – as I mentioned, the contents of a pyramidal tiff file are jpeg compressed, but they’re not actual JPEGs.

Geotiff.js has the ability to return tiles as raster (RGB) images, but that wasn’t a great option for us for a couple of reasons. First, our existing Leaflet plugin counts on being able to work with <img> tags, and getting the raster data back into an <img> tag (presumably roundtripping through a <canvas>?) would have been clunky. Second, Geotiff.js uses its own internal JPEG decoder, which seems like a waste given that every modern chip can do that decoding in hardware. Instead, I went down the path of trying to turn the jpeg-compressed tiles into actual jpeg files.
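
(For the curious, the canvas roundtrip would look roughly like the sketch below, assuming an 8-bit RGB image and geotiff.js’s readRasters. It works, it’s just a lot of copying compared to handing the browser a JPEG it can decode itself.)

// sketch of the rejected approach: decode a tile to raw RGB with geotiff.js,
// expand it to RGBA, paint it onto a canvas, and pull a blob URL back out.
// Assumes an 8-bit RGB image; real code would need to handle other layouts.
async function rasterTileToObjectURL(subimage, x0, y0, x1, y1) {
    const raster = await subimage.readRasters({
        window: [x0, y0, x1, y1],
        interleave: true, // gives us [r, g, b, r, g, b, ...]
    });
    const width = x1 - x0;
    const height = y1 - y0;
    const rgba = new Uint8ClampedArray(width * height * 4);
    for (let i = 0; i < width * height; i++) {
        rgba[i * 4] = raster[i * 3];
        rgba[i * 4 + 1] = raster[i * 3 + 1];
        rgba[i * 4 + 2] = raster[i * 3 + 2];
        rgba[i * 4 + 3] = 255; // opaque alpha
    }
    const canvas = document.createElement("canvas");
    canvas.width = width;
    canvas.height = height;
    canvas.getContext("2d").putImageData(new ImageData(rgba, width, height), 0, 0);
    const blob = await new Promise((resolve) => canvas.toBlob(resolve, "image/jpeg"));
    return URL.createObjectURL(blob);
}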

Through a mix of trial and error with a hex editor, reading various tiff and jpeg specifications, and the very readable libjpeg and jpegtran source, I arrived at the code below, which gloms together just enough data to make a valid jpeg file.

// helper to turn a hex string into bytes (not shown in the original snippet)
const hexStringToUint8Array = function(hexString) {
    return new Uint8Array(hexString.match(/.{2}/g).map((byte) => parseInt(byte, 16)));
};

const parseTileToJPEG = async function parseTileToJPEGBlob(data, coords) {
    const subimage = await getSubimage(coords);
    const uintRaw = new Uint8Array(data);
    // magic adobe header which forces the jpeg to be interpreted as RGB instead of YCbCr
    const rawAdobeHeader = hexStringToUint8Array("FFD8FFEE000E41646F626500640000000000");
    // allocate the output array
    var mergedArray = new Uint8Array(rawAdobeHeader.length + uintRaw.length + subimage.fileDirectory.JPEGTables.length - 2 - 2 - 2);
    mergedArray.set(rawAdobeHeader);
    // the first two bytes of the JPEG tables are a start-of-image marker and the last
    // two are an end-of-image marker, both of which have to be stripped
    mergedArray.set(subimage.fileDirectory.JPEGTables.slice(2, -2), rawAdobeHeader.length);
    // the tile data also starts with its own start-of-image marker, which gets stripped too
    mergedArray.set(uintRaw.slice(2), rawAdobeHeader.length + subimage.fileDirectory.JPEGTables.length - 2 - 2);
    const url = URL.createObjectURL(new Blob([mergedArray], {type: "image/jpeg"}));
    return url;
};

So, with all of those bits put together, we can arrive at the following sample invocation, which gives us back a blob URL that can be set as the src for an <img>. Obviously this could be refactored in a variety of ways – I’ve adapted our Leaflet code here to make it more readable, but running within Leaflet means we do some abusive things vis-à-vis async/await.

let coords = {x: 10, y: 10, z:10};
let myTile = await fetchRawJPEGTile(coords);
let tileImageSource = await parseTileToJPEG(myTile, coords);
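
From there, it’s just a matter of pointing an image at the blob URL (and, since blob URLs hold onto memory until released, revoking it once the tile has loaded):

// hand the blob URL to an <img>, and release it once the tile has decoded
// so blobs don't pile up as the user pans around.
const img = document.createElement("img");
img.onload = () => URL.revokeObjectURL(tileImageSource);
img.src = tileImageSource;
document.body.appendChild(img); // or hand it to your Leaflet layer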

Let’s review the advantages of this approach. First, we’re able to generate a single .tiff file in our image processing pipeline. On our EC2 instances with relatively slow disk IO, this is meaningfully faster than generating a deepzoom pyramid. Synchronizing it to S3 is orders of magnitude faster as well. Working with the file on S3 is now a single-file operation instead of O(n). And finally, the files we’re writing will be directly compatible with a IIIF image server if we decide we need to support that in the future.

In an ideal world, I’d love to see the IIIF spec roll in support for this approach to image handling, and just drop the requirement for a server. However, IIIF servers do lots of other stuff as well – dynamically rescaling images, rotating them, etc – so that’s probably not going to happen.
