## 64-Bit Color... And 16-Bit Floats

categories: *programming* · originally posted: *2006-01-24 21:37:38*

Have you seen the HDR rendering in *Half-Life 2: Lost Coast*?
That is possible because of the next big thing in graphics: going from 32-bit color to 64-bit color. But this revolution in color isn't as
simple as just making everything twice as big.

32-bit color uses four 8-bit integers to represent *red*, *green*, *blue*, and *alpha*.
(Alpha means whatever you need it to mean; most commonly it's used for opacity.) Thus, each of the four
values can go from 0 to 255, where 0 is "no brightness" and 255 is "full brightness".
The color `(red=0, green=0, blue=0)` is black,
the color `(red=255, green=0, blue=0)` is bright red, and
the color `(red=128, green=128, blue=128)` is 50% gray.
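To make the 8-bit layout concrete, here is a minimal Python sketch that packs the four components into a single 32-bit integer. The RGBA component order and the helper names `pack_rgba8` and `unpack_rgba8` are assumptions for illustration; real graphics APIs differ in component order and byte order.

```python
def pack_rgba8(red: int, green: int, blue: int, alpha: int = 255) -> int:
    """Pack four 0-255 components into one 32-bit integer (red in the high byte).

    The RGBA order is an assumption for this sketch, not a standard.
    """
    for value in (red, green, blue, alpha):
        if not 0 <= value <= 255:
            raise ValueError("each component must be in 0..255")
    return (red << 24) | (green << 16) | (blue << 8) | alpha


def unpack_rgba8(pixel: int) -> tuple[int, int, int, int]:
    """Split a packed 32-bit pixel back into its four 8-bit components."""
    return ((pixel >> 24) & 0xFF, (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF)


black = pack_rgba8(0, 0, 0)
bright_red = pack_rgba8(255, 0, 0)
gray_32 = pack_rgba8(128, 128, 128)

print(hex(bright_red))       # 0xff0000ff
print(unpack_rgba8(gray_32))  # (128, 128, 128, 255)
print(128 / 255)             # how close 128 comes to "50%" of full brightness
```

Note that 128 is only approximately half of 255, which is why the gray above is "50%" in spirit rather than exactly.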

But the new standard for 64-bit color uses 16-bit *floating-point* numbers for red, green, blue, and alpha.
For each component, 0 is "no brightness", and 1 is "full brightness". 50% gray in 64-bit color is
represented as `(red=0.5, green=0.5, blue=0.5)`.
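Python's standard `struct` module can pack IEEE half-precision floats with its `e` format code (Python 3.6 and later), which makes it easy to sketch a 64-bit pixel: four 16-bit floats, eight bytes in all. The helpers `pack_rgba16f` and `unpack_rgba16f` are hypothetical names, and little-endian RGBA order is an assumption, not a description of any particular GPU format.

```python
import struct


def pack_rgba16f(red: float, green: float, blue: float, alpha: float = 1.0) -> bytes:
    """Pack four components as IEEE 754 half-precision floats: 4 x 16 bits = 64 bits."""
    # "<4e" means little-endian, four binary16 values (assumed layout).
    return struct.pack("<4e", red, green, blue, alpha)


def unpack_rgba16f(pixel: bytes) -> tuple:
    """Decode an 8-byte pixel back into four Python floats."""
    return struct.unpack("<4e", pixel)


gray_64 = pack_rgba16f(0.5, 0.5, 0.5)

print(len(gray_64) * 8)         # 64
print(unpack_rgba16f(gray_64))  # (0.5, 0.5, 0.5, 1.0)
print(gray_64.hex())            # 003800380038003c (0.5 is 0x3800, 1.0 is 0x3C00)
```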

At first glance, this might strike one as wasteful. After all, the range of a 16-bit float is much greater than just 0 through 1;
the maximum number you can represent in a 16-bit float is 65504. But numbers above 1 seem to be just wasted. And 16-bit floats
can represent negative numbers. What good is having numbers brighter than "full brightness", or dimmer than "absolute black"?
Meanwhile, you're only getting eleven or twelve bits of precision out of those sixteen bits.\* If they'd
used 16-bit integers, they could have preserved an extra five bits of precision... which means they could have represented colors
thirty-two times more precisely. Could it be this 16-bit float thing was a dumb move? In point of fact, no; the best
word to describe it is *brilliant*.
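The claims about range, sign, and precision can be checked directly. This sketch reinterprets raw 16-bit patterns as half floats and rounds ordinary Python floats to half precision, again using `struct`'s `e` format; `half_from_bits` and `round_to_half` are illustrative helpers, not library functions.

```python
import struct


def half_from_bits(bits: int) -> float:
    """Interpret a 16-bit pattern as an IEEE 754 half-precision float."""
    return struct.unpack("<e", bits.to_bytes(2, "little"))[0]


def round_to_half(x: float) -> float:
    """Round a Python float to the nearest half-precision value and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


print(half_from_bits(0x7BFF))     # 65504.0, the largest finite half float
print(half_from_bits(0xBC00))     # -1.0: the sign bit allows negative values
# 10 stored fraction bits + 1 implicit leading bit = 11 significant bits,
# so just above 1.0 the gap between neighboring values is 2**-10.
print(round_to_half(1 + 2**-10))  # 1.0009765625 (exactly representable)
print(round_to_half(1 + 2**-11))  # 1.0 (halfway case, rounded to the even neighbor)
```

The last value sits exactly halfway between two half floats, and `struct` rounds it to the even neighbor, 1.0; that is the eleven-bit limit showing through.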

Let's say that you're playing *Half-Life 2: Lost Coast*, and you're partaking in one of my favorite hobbies:
staring straight at the sun. In 32-bit color, the light from the sun would be the color `(red=255, green=255, blue=255)`.
Now let's say you put your gray sunglasses on. Let's say they're really *good* gray sunglasses, and they cut all
colors by exactly 50%. *Half-Life 2* would multiply the red, green, and blue by 50%, and now the color of
the sun is 50% gray.
Well, sir, that's just not realistic. If you put on 50% sunglasses, the sun would still be too dazzling to look at,
not some unobjectionable gray blob hanging in the sky.

In 64-bit color, you might well represent the sun as the brightest thing possible:
`(red=65504, green=65504, blue=65504)`. Multiply that by 50%, and you still have
`(red=32752, green=32752, blue=32752)`. So the sun is still rendered as being brighter than your
screen can represent... or your poor eyes can stand.
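Here is the sunglasses thought experiment as plain arithmetic, a rough sketch assuming the filter is a simple multiply and the display simply clips to the 0-to-1 range. Real HDR renderers use tone mapping rather than a bare clip, so treat the last step as a simplification; `to_half` and `SUNGLASSES` are illustrative names.

```python
import struct


def to_half(x: float) -> float:
    """Round x to the nearest IEEE half-precision value, returned as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


SUNGLASSES = 0.5  # the "exactly 50%" gray sunglasses from the example

# 32-bit color: the sun can be no brighter than 255, so halving it gives mid-gray.
sun_8bit = 255
filtered_8bit = int(sun_8bit * SUNGLASSES + 0.5)  # round to the nearest integer
print(filtered_8bit)  # 128

# 64-bit color: the sun is stored as the largest finite half float.
sun_half = to_half(65504.0)
filtered_half = to_half(sun_half * SUNGLASSES)
print(filtered_half)  # 32752.0

# Simplified display step (an assumption): clip to 0..1.
# The filtered sun still comes out at full brightness.
print(min(filtered_half, 1.0))  # 1.0
```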

16-bit floats, like all floats based on the IEEE model, are cunningly encoded. You can read up on IEEE floats here, and on 16-bit floats here.
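One way to see the encoding is to decode a bit pattern by hand: one sign bit, five exponent bits with a bias of 15, and ten fraction bits, with an implicit leading 1 for normalized numbers. The decoder below (`decode_half` is a hypothetical name) is a sketch, cross-checked against `struct`'s `e` format for all 65,536 patterns.

```python
import struct


def decode_half(bits: int) -> float:
    """Decode a 16-bit IEEE 754 half-precision pattern by hand.

    Layout: 1 sign bit | 5 exponent bits (bias 15) | 10 fraction bits.
    """
    sign = -1.0 if bits & 0x8000 else 1.0
    exponent = (bits >> 10) & 0x1F
    fraction = bits & 0x3FF
    if exponent == 0x1F:  # all ones: infinity or NaN
        return sign * float("inf") if fraction == 0 else float("nan")
    if exponent == 0:     # zero or denormalized: no implicit leading 1
        return sign * (fraction / 1024) * 2.0 ** -14
    # Normalized: implicit leading 1, exponent unbiased by subtracting 15.
    return sign * (1 + fraction / 1024) * 2.0 ** (exponent - 15)


# Cross-check the hand decoder against Python's built-in half-float support.
for pattern in range(0x10000):
    expected = struct.unpack("<e", pattern.to_bytes(2, "little"))[0]
    got = decode_half(pattern)
    assert got == expected or (got != got and expected != expected)  # NaN != NaN

print(decode_half(0x3800))  # 0.5
```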

*The actual precision of a floating-point number depends on the magnitude of the number you want to represent: the closer a number is to zero, the smaller the gap between it and its neighbors, so the more accurately it can be represented. A 16-bit floating-point number stores 10 fraction bits plus an implicit leading 1, for 11 significant bits. Among other things, this means that if you counted up from 0 to 1 by adding 1/2048 each time, you could represent every one of those numbers exactly in a 16-bit floating-point number. (Why 2048? That's 2 to the 11th power.) But you can't go twice as precisely, by steps of 1/4096: although most of those steps could be represented, the 1,024 odd multiples of 1/4096 between 0.5 and 1 could not.
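The footnote's claim can be tested exhaustively. This sketch uses exact `Fraction` arithmetic and checks which multiples of 1/2048 and 1/4096 survive a round trip through half precision; `is_exact_half` is an illustrative helper, not part of any library.

```python
import struct
from fractions import Fraction


def is_exact_half(x: Fraction) -> bool:
    """True if x survives a round trip through IEEE half precision unchanged."""
    stored = struct.unpack("<e", struct.pack("<e", float(x)))[0]
    return Fraction(stored) == x


steps_2048 = [Fraction(k, 2048) for k in range(2049)]  # 0, 1/2048, ..., 1
steps_4096 = [Fraction(k, 4096) for k in range(4097)]  # 0, 1/4096, ..., 1

all_2048_exact = all(is_exact_half(x) for x in steps_2048)
missing = [x for x in steps_4096 if not is_exact_half(x)]

print(all_2048_exact)              # True
print(len(missing))                # 1024
print(min(missing), max(missing))  # 2049/4096 4095/4096
```

The missing values all lie between 0.5 and 1, where neighboring half floats are 1/2048 apart.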

#### Calculating all minifloat values

**[Added: 2006/08/02]**

I was curious as to exactly how many values exist between 0 and 1 for a minifloat. So I wrote a Python program to generate all possible minifloat values and analyzed the results. One thing I discovered while researching this: when used as color values for computer graphics, the "denormalized" values are rounded down to zero. (The "denormalized" values are *very* close to zero, so close that they're not distinguishable from zero. So this is a speed optimization that loses you very little.) So I computed the values both *with* and *without* the denormalized values. The output of my Python program is as follows:

> Including all denormalized numbers, there are 15361 m10e5 minifloats in the range (0, 1).
>
> Rounding all denormalized numbers to zero, there are 14338 m10e5 minifloats in the range (0, 1).
>
> zeroToOneWithoutDenormalized contains all values from 0 to 1 stepping by 2048 (precision of 11 bits).
>
> zeroToOneWithoutDenormalized lacked 1024 values from 0 to 1 stepping by 4096 (precision of 12 bits).

("m10e5" means a 10-bit mantissa and a 5-bit exponent, the IEEE half-precision layout. Both counts include the endpoints 0 and 1, and "stepping by 2048" means steps of 1/2048.) And you can download the Python program here.
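The original program isn't reproduced here, but a short independent sketch arrives at the same two counts. It assumes, as the numbers imply, that both endpoints 0 and 1 are counted, and that "rounding denormalized numbers to zero" means mapping every value with a zero exponent field onto 0.0. The helper names are hypothetical.

```python
import struct


def half_from_bits(bits: int) -> float:
    """Interpret a 16-bit pattern as an IEEE 754 half-precision float."""
    return struct.unpack("<e", bits.to_bytes(2, "little"))[0]


def is_denormalized(bits: int) -> bool:
    """Denormalized halves have an all-zero exponent field and a nonzero fraction."""
    return ((bits >> 10) & 0x1F) == 0 and (bits & 0x3FF) != 0


# Non-negative finite patterns run from 0x0000 (zero) to 0x7BFF (65504);
# negative patterns and 0x7C00 upward (infinity, NaN) are skipped.
patterns = [b for b in range(0x7C00) if 0.0 <= half_from_bits(b) <= 1.0]

with_denormals = {half_from_bits(b) for b in patterns}
# Flushing to zero: every denormalized value collapses onto 0.0.
without_denormals = {0.0 if is_denormalized(b) else half_from_bits(b) for b in patterns}

print(len(with_denormals))     # 15361
print(len(without_denormals))  # 14338
```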

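So there you have it. 24-bit color (eight bits per color component, as integers; that's 32-bit color without its alpha channel) can represent 256 shades of red, green, and blue. 48-bit color (sixteen bits per color component, as minifloats; that's 64-bit color without alpha) can represent 14,338 shades of red, green, and blue between 0 and 1. With eleven bits of evenly spaced precision instead of eight, it is only eight times more precise, rather than the 256 times more precise that 16-bit integers would be, but it is far more flexible.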