This is a quick “howto” post to describe how to parse OMNIC Specta SPA files, in case anyone goes a-google’n for a similar solution in the future.
SPA files consist of some metadata followed by the data itself, stored as little-endian float32 values. Near the start of the file is a basic manifest that includes the offset and run length of the data. The start offset is a two-byte little-endian integer at byte 386, and the run length (in bytes) is another two-byte integer at byte 390. The data block is nothing but those little-endian floats: no start or stop markers, no control characters.
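For anyone not using R or PHP, here is a minimal Python sketch of the same layout using only the standard library's `struct` module. It assumes, as the PHP code in this post does with `unpack("v", ...)`, that both manifest fields are unsigned 16-bit little-endian integers. `read_spa_data` is a hypothetical helper name, not part of any SPA library.

```python
import struct


def read_spa_data(buf: bytes) -> list[float]:
    """Extract the float32 data block from the raw bytes of an SPA file.

    ASSUMPTION: start offset is an unsigned 16-bit little-endian int at
    byte 386, and the data length in bytes is another one at byte 390.
    """
    (start,) = struct.unpack_from("<H", buf, 386)
    (length,) = struct.unpack_from("<H", buf, 390)
    if start + length > len(buf):
        raise ValueError("data block runs past the end of the file")
    count = length // 4  # four bytes per float32
    # "<" = little-endian, "f" = 32-bit float
    return list(struct.unpack_from(f"<{count}f", buf, start))
```

Usage would look like `with open("sample.spa", "rb") as f: values = read_spa_data(f.read())`, with your own file path in place of `sample.spa`.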
These files are pretty easy to parse and plot, at least to get a simple display. Here’s some R code to read and plot an SPA:
library(ggplot2)
pathToSource <- "fill_in_your_path"
to.read <- file(pathToSource, "rb")
# Read the start offset (unsigned little-endian 16-bit integer at byte 386)
seek(to.read, 386, origin = "start")
startOffset <- readBin(to.read, "integer", n = 1, size = 2, signed = FALSE, endian = "little")
# Read the length of the data block in bytes (at byte 390)
seek(to.read, 390, origin = "start")
readLength <- readBin(to.read, "integer", n = 1, size = 2, signed = FALSE, endian = "little")
# Seek to the start of the data
seek(to.read, startOffset, origin = "start")
# Each value is a four-byte float
floatCount <- readLength %/% 4
# Read all the values as little-endian float32
floatData <- readBin(to.read, "double", n = floatCount, size = 4, endian = "little")
close(to.read)
floatDataFrame <- data.frame(floatData = floatData)
floatDataFrame$ID <- seq_len(nrow(floatDataFrame))
p.plot <- ggplot(data = floatDataFrame, aes(x = ID, y = floatData))
p.plot + geom_line() + theme_bw()
In my particular case, I need to plot them from PHP, and already have a pipeline that shells out to gnuplot to plot other types of data. So, in case it’s helpful to anyone, here’s the same plotting in PHP.
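<?php
// Extract the float32 data block from an SPA file and plot it to a PNG with gnuplot.
// Returns true if the PNG was written, false otherwise.
function generatePlotForSPA($source, $targetFile) {
    $fileSize = filesize($source);
    // The manifest fields end at byte 392, so a shorter file cannot hold them
    if ($fileSize === false || $fileSize < 392) {
        return false;
    }
    $sourceFile = fopen($source, "rb");
    if ($sourceFile === false) {
        return false;
    }
    // Data start offset: unsigned 16-bit little-endian integer at byte 386
    fseek($sourceFile, 386);
    $targetOffset = current(unpack("v", fread($sourceFile, 2)));
    // Data length in bytes: unsigned 16-bit little-endian integer at byte 390
    fseek($sourceFile, 390);
    $dataLength = current(unpack("v", fread($sourceFile, 2)));
    // Reject empty blocks and manifests that point past the end of the file
    if ($dataLength === 0 || $targetOffset + $dataLength > $fileSize) {
        fclose($sourceFile);
        return false;
    }
    // Copy the raw float32 block to a side file that gets piped into gnuplot
    fseek($sourceFile, $targetOffset);
    $rawData = fread($sourceFile, $dataLength);
    fclose($sourceFile);
    $rawDataOutputPath = $source . "_raw_data";
    file_put_contents($rawDataOutputPath, $rawData);
    $gnuScript = "set terminal png size {width},{height};
set output '{output}';
unset key;
unset border;
plot '<cat' binary filetype=bin format='%float32' endian=little array=1:0 with lines lt rgb 'black';";
    $targetScript = str_replace(
        array("{output}", "{width}", "{height}"),
        array($targetFile, 500, 400),
        $gnuScript
    );
    $gnuPath = "gnuplot";
    // plot '<cat' makes gnuplot read the data back from its own stdin (the pipe);
    // escapeshellarg() keeps the path and the script's single quotes intact
    $outputScript = "cat " . escapeshellarg($rawDataOutputPath) . " | " . $gnuPath . " -e " . escapeshellarg($targetScript);
    exec($outputScript);
    unlink($rawDataOutputPath);
    return file_exists($targetFile);
}
?>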
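If you would rather drive gnuplot from Python, here is a minimal sketch using `subprocess`. It reuses the gnuplot commands from the PHP function unchanged, but hands the raw bytes straight to gnuplot's stdin and passes the script as a separate argument (no shell), which sidesteps the quoting issues of building a command string and removes the need for a side file. It assumes a `gnuplot` binary on the PATH and an output path without single quotes; both function names are hypothetical.

```python
import os
import subprocess

# Same gnuplot commands as the PHP version; {width}, {height}, {output} get filled in
GNUPLOT_TEMPLATE = (
    "set terminal png size {width},{height}; "
    "set output '{output}'; "
    "unset key; unset border; "
    "plot '<cat' binary filetype=bin format='%float32' endian=little "
    "array=1:0 with lines lt rgb 'black';"
)


def build_gnuplot_script(output: str, width: int = 500, height: int = 400) -> str:
    # ASSUMPTION: output contains no single quote, since it sits inside '...'
    return GNUPLOT_TEMPLATE.format(output=output, width=width, height=height)


def plot_raw_floats(raw: bytes, output: str, gnuplot: str = "gnuplot") -> bool:
    """Pipe raw little-endian float32 bytes into gnuplot and render a PNG."""
    script = build_gnuplot_script(output)
    # '<cat' inside the script reads gnuplot's stdin, which is fed from `raw`
    result = subprocess.run([gnuplot, "-e", script], input=raw)
    return result.returncode == 0 and os.path.exists(output)
```

Here `raw` would be the float32 block sliced out of the file at the offset and length given in the manifest.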
Any advice on finding the header information (namely the timestamp)?
The offset of the metadata block is stored at byte 370, and its length at byte 374. The beginning of the metadata includes that header, so I would read the whole chunk (from the offset to offset + length) and then parse it with ordinary string handling.
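A rough Python sketch of that approach follows. The widths of the two fields at bytes 370 and 374 aren't stated above, so the `int_fmt` parameter is an assumption: it defaults to an unsigned 16-bit little-endian integer to mirror the data fields at 386/390, and `"<I"` (32-bit) is worth trying if the numbers look wrong. `read_metadata_strings` is a hypothetical helper; it only splits the chunk into readable text runs and does not know which one holds the timestamp.

```python
import re
import struct


def read_metadata_strings(buf: bytes, int_fmt: str = "<H") -> list[str]:
    """Return the readable text runs in the SPA metadata block.

    ASSUMPTION: the metadata offset (byte 370) and length (byte 374) are
    encoded with int_fmt; the field width is not documented in the post.
    """
    (start,) = struct.unpack_from(int_fmt, buf, 370)
    (length,) = struct.unpack_from(int_fmt, buf, 374)
    chunk = buf[start:start + length]
    # latin-1 maps every byte to a character, so decoding never fails
    text = chunk.decode("latin-1")
    # Split on NULs, CR/LF and other control characters; drop empty pieces
    pieces = re.split(r"[\x00-\x1f]+", text)
    return [p.strip() for p in pieces if p.strip()]
```

From there, finding the timestamp is a matter of scanning the returned strings for whatever date format your files use.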
Thanks for your help on this. It looks like, to plot the spectra properly, I'm going to have to find the start wavenumber, the end wavenumber, and the spectral resolution (the step size between wavenumbers). Then I can replace the ID column with the wavenumber. It also looks like the data is stored backwards (wavelength vs. wavenumber)? Way to go, Thermo!
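Once those numbers are known, building the x axis is simple arithmetic. Here is a minimal Python sketch, assuming you already have the first and last wavenumber and the number of points (where those live in the file isn't covered here, so obtaining them is left open). `wavenumber_axis` and `pair_with_axis` are hypothetical helpers.

```python
def wavenumber_axis(first: float, last: float, n_points: int) -> list[float]:
    """Return n_points evenly spaced wavenumbers from first to last, inclusive.

    The step size is (last - first) / (n_points - 1); it is negative
    when the data runs from high to low wavenumber.
    """
    if n_points < 2:
        return [first] * n_points
    step = (last - first) / (n_points - 1)
    # Compute each value from the start rather than by repeated addition,
    # so rounding errors don't accumulate along the axis
    return [first + i * step for i in range(n_points)]


def pair_with_axis(values: list[float], first: float, last: float) -> list[tuple[float, float]]:
    """Replace the 1..N index with wavenumbers: returns (wavenumber, value) pairs."""
    axis = wavenumber_axis(first, last, len(values))
    return list(zip(axis, values))
```

If the stored order turns out to be reversed relative to your endpoints, swapping `first` and `last` flips the axis without touching the data.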