Sneak preview inside the 5D Mark IV Dual Pixel RAW files

                                                          -- a1ex, September 2016

The 5D Mark IV is now shipping, and sample images are already available on both camera review sites and blogs. In particular, this sample from kamerabild.se was picked up by quite a few others who wanted to look at dual pixel raw data:

In [1]:
%%shell
wget -c http://www.kamerabild.se/sites/kamerabild.se/files/_91A0045.CR2
--2016-09-11 02:12:43--  http://www.kamerabild.se/sites/kamerabild.se/files/_91A0045.CR2
Resolving www.kamerabild.se (www.kamerabild.se)... 81.201.217.68
Connecting to www.kamerabild.se (www.kamerabild.se)|81.201.217.68|:80... connected.
HTTP request sent, awaiting response... 200 OK

    The file is already fully retrieved; nothing to do.

Intro

I first noticed Laurent Clévy (you know him from http://lclevy.free.fr/cr2/) releasing a command-line tool that could read the two image streams from a dual pixel CR2 (craw2tool.exe). Then, shortly after, Hirax attempted to create depth maps from some dual pixel CR2 images, using Laurent's tool to decode the files [1] [2] [3]. These experiments motivated me to look inside the dual pixel raw. I downloaded the above sample, and after a bit of fiddling with octave, I got this animation, and also noticed the secondary image from the dual pixel raw contains one extra stop of highlights.

After browsing around, it turns out others were even faster. For example, Iliah Borg from RawDigger already wrote about dual pixel raw and found the extra stop of highlights as well, before I did. His findings confirmed my theories, although I did not agree 100% with what he wrote. Planet Mitch also did a nice summary of the initial findings around dual pixel. And - guess what - although most third party raw converters did not open the 5D4 CR2 files, dcraw did!

When reading all this, I thought "Oh boy, I'm way too late to the party", and went back to my usual stuff.

After doing some more experiments with this sample raw file, I realized there's still a lot left to discover about dual pixel raw, which is why I started writing this.

Enough chit-chat, let's look at the files.

Loading dual pixel images in octave

You will need Octave 4.0 with 16-bit image support. The default installation on most Linux distros only opens 8-bit images, so it won't work. If you are on Ubuntu, you may use ppa:aalpo/octave-q1; otherwise, you may have to compile it from source. I'm not sure what the situation is on other operating systems - you tell me.

In [2]:
%%shell
# -E is undocumented; it outputs the raw image as with -D, also including optical black areas
dcraw -4 -E -s all -v _91A0045.CR2
Loading Canon EOS 5D Mark IV image from _91A0045.CR2 ...
Building histograms...
Writing data to _91A0045_0.pgm ...
Loading Canon EOS 5D Mark IV image from _91A0045.CR2 ...
Building histograms...
Writing data to _91A0045_1.pgm ...

As expected, we've got two PGM images from a dual pixel CR2. Let's open them in octave:

In [3]:
pkg load image
format compact
im0 = double(imread('_91A0045_0.pgm'));
im1 = double(imread('_91A0045_1.pgm'));
warning: your version of GraphicsMagick limits images to 16 bits per pixel
warning: called from
    imformats>default_formats at line 256 column 11
    imformats at line 79 column 3
    imageIO at line 106 column 11
    imread at line 106 column 30

Let's look at them. The images are huge, so I'll show a downsampled version. For simplicity, I'll show each Bayer channel separately.

Why sqrt? The image looks too dark without it.

I know, purists will scream. It's just a quick preview, nothing more.

In [4]:
% subtract black level and apply a "gamma" curve
black = 2048;
im0_disp = sqrt(max(im0-black, 0));
im1_disp = sqrt(max(im1-black, 0));

% show each Bayer channel for each sub-image
im0_channels = [ im0_disp(1:32:end,1:32:end) im0_disp(1:32:end,2:32:end);
                 im0_disp(2:32:end,1:32:end) im0_disp(2:32:end,2:32:end); ];
im1_channels = [ im1_disp(1:32:end,1:32:end) im1_disp(1:32:end,2:32:end);
                 im1_disp(2:32:end,1:32:end) im1_disp(2:32:end,2:32:end); ];

% join both images in a single figure
imshow([im0_channels im1_channels], []);

You have probably already noticed:

  • the RGGB layout of the Bayer grid (same as with other Canon cameras)
  • the first subimage has clipped sky details in the green and blue channels
  • the second subimage looks darker and there are no clipped highlights

Let's try a color preview:

In [5]:
function imshow_bayer(im)
  % half-res debayer: turn each RGGB cell into a RGB pixel
  % not color-correct, but enough for a quick preview
  rgb(:,:,1) =  im(1:2:end,1:2:end);
  rgb(:,:,2) = (im(1:2:end,2:2:end) + im(2:2:end,1:2:end)) / 2;  % average the two green sites
  rgb(:,:,3) =  im(2:2:end,2:2:end);

  # black level, white balance and gamma correction (hardcoded)
  # octave also expects floating point (double) image data to be in the 0..1 range
  black = 2048;
  white = 16383;
  rgb = max(rgb-black, 0);
  rgb(:,:,1) *= 2.1;
  rgb(:,:,3) *= 1.4;
  rgb = rgb / (white-black);
  rgb = rgb.^(1/2.2);

  # show a downsampled version
  imshow(rgb(1:8:end,1:8:end,:))
end

% join both images in a single figure
imshow_bayer([im0 im1]);

Not that easy in octave, right? Of course, it's not a raw image processor; it's just a math-oriented programming language.

The preview is not color-correct because I did not apply any color matrix, and the gamma correction is not exactly the one required by sRGB. These two are left as an exercise for the reader ;)
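Still, for the curious, here's a minimal sketch of that exercise - note the camera-to-sRGB matrix below is a made-up placeholder (the real one would come from the CR2 metadata or from dcraw), while the sRGB transfer curve is the standard one:

% hypothetical camera-to-sRGB matrix - placeholder values, NOT the 5D4's
cam2srgb = [ 1.8 -0.7 -0.1 ;
            -0.2  1.6 -0.4 ;
             0.0 -0.6  1.6 ];

function out = srgb_curve(x)
  % exact sRGB transfer function: linear toe, then a 2.4 exponent
  out = (x <= 0.0031308) .* (12.92 * x) + ...
        (x >  0.0031308) .* (1.055 * x .^ (1/2.4) - 0.055);
end

% usage, with rgb being the linear, white-balanced 0..1 data from
% imshow_bayer (instead of the rgb.^(1/2.2) step):
% flat = reshape(rgb, [], 3) * cam2srgb';
% rgb  = srgb_curve(max(min(reshape(flat, size(rgb)), 1), 0));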

Checking image levels

Notice I've assumed the black level is 2048. Let's check the optical black area:

In [6]:
imshow(im0(1:150,1:150), [])
In [7]:
median(im0(:, 1:130)(:))
median(im1(:, 1:130)(:))
ans =  2048
ans =  2049

Now let's check clipping points and relative brightness between the two subimages:

In [8]:
p0 = [prctile(im0(:), [50 90 99 99.9 99.99])' max(im0(:))]
p1 = [prctile(im1(:), [50 90 99 99.9 99.99])' max(im1(:))]
p0 =
    2980    7614   16383   16383   16383   16383
p1 =
    2512    4747   10413   12090   13036   16383
In [9]:
median_ratio = (p0(1)-black) / (p1(1)-black)
median_ratio =  2.0086

The second sub-image is almost exactly half as bright as the main image. Both images clip at 16383.

Therefore, we have exactly 1 stop of additional highlight detail in the dual pixel raw, compared to a regular raw.
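Just to double-check that stop, from the median ratio computed above:

log2(median_ratio)   % ans = 1.0062 - one stop, within measurement error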

Hypothesis: the first sub-image may be the sum of the two images from each half-pixel, and the second image may be one of the half-pixel images:

In [10]:
A = im0 - im1 + black;
B = im1;
M = im0;
In [11]:
imshow_bayer([A B]);

Seems legit.

One little problem: where the main image is overexposed, the missing half-pixel image can't be estimated correctly. Let's zero out those areas, and try not to get mad at Canon for throwing away this important data.

In [12]:
A = im0 - im1 + black;
A(im0 == 16383 | im1 == 16383) = 0;
imshow_bayer([A B]);

Highlight recovery - one extra stop

If my hypothesis is correct, the overexposed data is not caused by analog saturation - it happened when the two half-pixel images were added and the result was clipped to 16383 (because the CR2 file format has 14 bits).

There is a problem: we only have a half-pixel image with the highlight details. This image is shifted from the main image, so we must be very careful when doing the transition. A smooth blending would probably be simple enough and give good results. A thresholding approach will give artifacts.
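Here is a minimal sketch of the smooth blending idea - just a quick illustration, not anybody's actual algorithm; in particular, it ignores the half-pixel shift, which a serious implementation would have to compensate for:

black = 2048; white = 16383;
M  = im0 - black;                 % main image, clipped at white-black
B2 = (im1 - black) * 2;           % doubled half-pixel image: one extra stop
% blending weight: 0 below 80% of the clipping point, 1 at the clipping point
t = (M - 0.8*(white-black)) / (0.2*(white-black));
t = max(min(t, 1), 0);
rec = M .* (1-t) + B2 .* t;       % highlights now extend up to 2*(white-black)
imshow_bayer(rec + black);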

You may ask: why did Canon implement it this way? Why did they not average the two half-pixel images, rather than just summing them?

My guesses:

  • ISO is defined from the clipping point, so they had to clip it before the sensor saturated
  • If that is true, they should have offered a true ISO 50. They didn't. Why? No idea. It would have been as simple as dividing the main image data by 2 before clipping it to 16383 (see the sketch after this list).
  • Compatibility: they probably wanted the main image data in the dual raw to be the same as a regular raw image.
  • Shadow detail: dividing by 2 would steal one bit from shadows. I'm not sure that bit actually contains useful data, since the noise floor is about 3 DN. Please prove me wrong.
  • Iliah Borg also says they did this to avoid banding (fixed pattern noise). I disagree - dividing by 2 does not introduce nor amplify banding. Proof on request.
  • He also mentions other reasons here, none of which seem - to me - more important than preserving one full stop of highlight detail.
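To illustrate the ISO 50 point with some made-up numbers near the clipping point:

a = 10000; b = 9000;               % hypothetical unclipped half-pixel values
iso100 = min(a + b, 16383)         % 16383 - the sum clips, highlights lost
iso50  = min((a + b) / 2, 16383)   % 9500  - halve first and nothing clips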

As an engineer, I would have saved the two half-pixel raw images and been done with it. I doubt their sum is done in the analog domain, and I don't see any benefit in doing that, so this approach would not lose any useful signal. Iliah, if you are reading this, please prove me wrong.

Depth map - can we get a 3D image?

With Python and OpenCV 3.x, you can get a simple depth map, but the results are far from impressive:

However, I doubt this algorithm is the state of the art in stereo matching, and I'm sure the research community is already looking into it. I wouldn't be surprised to see a stereo matching algorithm fine-tuned for 5D Mark IV dual pixel images.
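For reference, here's roughly what such a matcher does under the hood - a naive octave sketch of horizontal block matching (OpenCV's StereoBM is a far more refined take on the same idea; block_match is just a name I made up). Dual pixel disparities are mostly sub-pixel, so expect a noisy result without sub-pixel refinement:

function D = block_match(g0, g1, max_disp, r)
  % brute-force block matching: for each pixel, pick the horizontal shift
  % of g1 that minimizes the absolute difference over a (2r+1)x(2r+1) window
  best = inf(size(g0));
  D    = zeros(size(g0));
  box  = ones(2*r+1) / (2*r+1)^2;
  for d = -max_disp:max_disp
    cost = imfilter(abs(g0 - circshift(g1, [0 d])), box);
    better = cost < best;
    best(better) = cost(better);
    D(better) = d;
  end
end

% example: match one green channel from the two sub-images
D = block_match(im0(1:2:end,2:2:end), im1(1:2:end,2:2:end), 3, 8);
imagesc(D); colorbar;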

Focus adjustment in postprocessing

This one gets interesting. Canon's DPP can do some slight adjustments of the focus point, which could be great for portraits with slightly missed focus.

  • How does this work?
  • Can we push it even further?

Notice the two sub-images are slightly misaligned everywhere except at the focus point, where they align perfectly.

The simplest way to re-adjust focus would be to shift one or both sub-images horizontally, then add (or average) them. How well does this work?
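A minimal sketch of that idea, using the A and B images from earlier (refocus is a hypothetical helper; shifts are in multiples of 2 pixels to keep the Bayer pattern intact, and real adjustments would require sub-pixel shifts):

function out = refocus(A, B, shift)
  % shift the two half-pixel images in opposite directions, then average;
  % circshift wraps around at the borders - good enough for a quick test
  out = (circshift(A, [0 shift]) + circshift(B, [0 -shift])) / 2;
end

% compare moving the focus plane one way vs. the other
imshow_bayer([refocus(A, B, -2) refocus(A, B, 2)]);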

Certainly, there is a lot of room for improvement, but we are probably on the right track.

Simulation - undoing lens blur

Let's think about something else - can we recover any details from a severely out-of-focus background?

I don't have any test images for this one, so I'll do a simulation. In the ideal case, an out-of-focus background is obtained by filtering the linear image with a disk kernel. Want proof? Just look at the bokeh shape of your favorite lens. On a real-world image, we would have to identify that kernel (also called the PSF) - it depends on lens, aperture, focus distance, subject distance, distance from the image center... others? Check the crappy lens paper for details.

Since the sum of the two half-pixel images is a regular out-of-focus image, we can imagine our kernel being divided into two asymmetrical halves, somewhat like this:

In [13]:
f = fspecial('disk', 5);
k = ones(11,1) * linspace(0, 1, 11);
f1 = f .* (1-k);
f2 = f .* k;
imshow([f f1 f2], [])

Do the half-pixel highlights look like these? If yes, my model should be realistic. If no... please send me a sample.

Anyway, let's load an image and blur it.

In [14]:
%%shell
wget -c http://www.ece.rice.edu/~wakin/images/lena512.pgm
--2016-09-11 02:13:45--  http://www.ece.rice.edu/~wakin/images/lena512.pgm
Resolving www.ece.rice.edu (www.ece.rice.edu)... 128.42.246.177
Connecting to www.ece.rice.edu (www.ece.rice.edu)|128.42.246.177|:80... connected.
HTTP request sent, awaiting response... 416 Requested Range Not Satisfiable

    The file is already fully retrieved; nothing to do.

In [15]:
a = double(imread('lena512.pgm'));          % load and convert to double
a = a(101:end-100, 101:end-100);            % crop it
imshow(a, [0 255])                          % and display it
In [16]:
af  = imfilter(a, f);
af1 = imfilter(a, f1);
af2 = imfilter(a, f2);
imshow([a af; af1 af2], [0 255])

Now let's try to undo the blur.

To my knowledge, this task is best done using deconvolution. I'll admit I don't understand the math behind it, so I'm going to use a simpler tool - a FIR filter whose coefficients are computed with a linear regression. I've described the technique here. If you know the proper name for this, please let me know.
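Since undo_filter is not listed in this notebook, here's a rough sketch of the idea as described above - solve, by least squares, for the FIR kernel that maps the blurred image back to the sharp one (the actual function apparently also prints resid_stdev, the residual standard deviations, and undo_filter_2f presumably stacks the columns built from both blurred images into the same regression):

function [f, rec] = undo_filter(a, af, r, lambda)
  % find the (2r+1)x(2r+1) FIR kernel f minimizing, in the least squares
  % sense, || imfilter(af, f) - a ||, optionally ridge-regularized by lambda
  n = 2*r + 1;
  [h, w] = size(a);
  X = zeros((h-2*r)*(w-2*r), n*n);   % one column per kernel tap
  col = 0;                           % (for large images, subsample the rows)
  for dx = -r:r
    for dy = -r:r
      col = col + 1;
      s = af(r+1+dy : h-r+dy, r+1+dx : w-r+dx);
      X(:, col) = s(:);
    end
  end
  t = a(r+1 : h-r, r+1 : w-r);       % target: the sharp image, cropped
  c = (X'*X + lambda * eye(n*n)) \ (X' * t(:));
  f = reshape(c, n, n);
  rec = reshape(X * c, h-2*r, w-2*r);
end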

In [17]:
[f,rec0] = undo_filter(a, af, 7, 0);
imshow([a(11:end-10,11:end-10) rec0], [0 255])
resid_stdev =
   16.5350    9.5096

Got it - so it's pretty hard to undo the blur from a single blurred image.

But two images blurred in different ways should contain more information, right?

In [18]:
[f,rec] = undo_filter_2f(a, af1, af2, 7, 0);
imshow([a(11:end-10,11:end-10) rec], [0 255])
resid_stdev =
   30.9166   30.8591    1.7777

Wait, is that for real?!

To emphasize, the image on the right was recovered from the two blurred images below:

In [19]:
imshow([af1 af2], [0 127])

Difference between ideal and recovered:

In [20]:
imshow([a(11:end-10,11:end-10) - rec], [-20 20])

Let's add a bit of noise. Real cameras have plenty of it ;)

In [21]:
an1 = af1 + randn(size(a));
an2 = af2 + randn(size(a));
imshow([af1 an1], [0 128])
In [22]:
[f,recn] = undo_filter_2f(a, an1, an2, 7, 0);
figure,imshow([a(11:end-10,11:end-10) recn], [0 255])
figure,imshow([a(11:end-10,11:end-10) - recn], [-20 20])
resid_stdev =
   30.9354   30.8746    7.7740

Okay, so the noise limits how much we can recover from the blurred image. Still, the two half-pixel images do contain a lot more information than their average!

Just for kicks, let's try a smaller amount of noise:

In [23]:
an1 = af1 + randn(size(a))/5;
an2 = af2 + randn(size(a))/5;
[f,recn] = undo_filter_2f(a, an1, an2, 7, 0);
figure, imshow([a(11:end-10,11:end-10) recn], [0 255])
figure, imshow([a(11:end-10,11:end-10) - recn], [-20 20])
resid_stdev =
   30.9165   30.8582    4.7729

Much better.

In any case, I think this dual pixel raw is definitely worth researching.