Validate x-goog-hash from Google Cloud Storage

27 Aug 2016

Google Cloud Storage hosts static files, or, in Google's words, offers durable and highly available object storage.

However, when we download a file from Google Cloud Storage programmatically, many things can go wrong.

We need a way to verify that the downloaded file is intact, and a checksum is the right tool for that!

Google Cloud Storage returns the response header 'x-goog-hash' when you download a file. It looks like this:

x-goog-hash: crc32c=i67GjA==,md5=Ql15uc9uNKRlT8/uAEh95g==
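
A minimal sketch for pulling the individual values out of that header, assuming you already have the raw header string from your HTTP client. parse_goog_hash is a hypothetical helper name. GCS may also send the two hashes as separate x-goog-hash headers, which some HTTP libraries join with ', '; the strip() call tolerates that.

```python
def parse_goog_hash(header_value):
    """Split an x-goog-hash value such as 'crc32c=...,md5=...' into a dict."""
    hashes = {}
    for part in header_value.split(','):
        # partition() splits on the first '=' only, so the base64 padding
        # ('==') at the end of each value is kept intact.
        name, sep, value = part.strip().partition('=')
        if sep:
            hashes[name] = value
    return hashes
```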

We can simply compute CRC32C on the downloaded file, encode the value properly, and compare the computed value against the one in x-goog-hash. And that's it!

Well, not so fast.

It turns out that computing the value in Python is quite a journey. It took me 2 hours to figure this out.

The docs casually describe the value as 'The Base64 encoded CRC32c'. It's not as simple as that.

Here are the real steps in Python:

  1. Compute CRC32C using App Engine's crc32c. It turns out that this library returns an integer value.
  2. Format the integer as a zero-padded, 8-digit (32-bit) hex string with '%08x' % checksum.
  3. Turn the hex string into binary data with binascii.unhexlify(hexstring). This value is the checksum as 4 raw bytes in big-endian order; in Python 2 raw bytes are a plain str, which is why it looks like a valid string.
  4. Base64-encode with base64.b64encode(binary_data).
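
The steps above can be sketched as follows for Python 3. App Engine's crc32c is not available outside App Engine, so this sketch substitutes a small table-driven pure-Python CRC32C (Castagnoli polynomial, reflected form 0x82F63B78). The helpers crc32c, crc32c_base64 and crc32c_matches are hypothetical names, not App Engine's API.

```python
import base64
import binascii

# Hypothetical pure-Python stand-in for App Engine's crc32c: a table-driven
# CRC32C (Castagnoli) using the reflected polynomial 0x82F63B78.
_CRC32C_POLY = 0x82F63B78


def _make_table():
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ _CRC32C_POLY
            else:
                crc >>= 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_table()


def crc32c(data, crc=0):
    """Step 1: return the CRC32C of `data` as an unsigned integer.

    Pass a previous result as `crc` to checksum a file chunk by chunk.
    """
    crc ^= 0xFFFFFFFF
    for byte in data:  # iterating bytes yields ints in Python 3
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def crc32c_base64(data):
    checksum = crc32c(data)                      # step 1: an integer
    hexstring = '%08x' % checksum                # step 2: 8 hex digits, zero-padded
    binary_data = binascii.unhexlify(hexstring)  # step 3: 4 big-endian bytes
    return base64.b64encode(binary_data).decode('ascii')  # step 4


def crc32c_matches(data, expected_b64):
    """Compare against the crc32c value taken from x-goog-hash."""
    return crc32c_base64(data) == expected_b64
```

For example, crc32c(b'123456789') gives 0xE3069283, the standard CRC32C check value. An empty input gives 0, which step 2 turns into '00000000' and step 4 into 'AAAAAA=='.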

App Engine's crc32c returns an integer rather than raw bytes. That is what makes it difficult.
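
Since the integer is the root of the trouble, it may help to see that steps 2 and 3 together just produce the checksum's 4 bytes in big-endian order. This Python 3 sketch compares the hex round trip with struct.pack('>I', ...) from the standard library; encode_via_hex and encode_via_struct are hypothetical names.

```python
import base64
import binascii
import struct


def encode_via_hex(checksum):
    # Steps 2-4 from the list: zero-padded hex -> raw bytes -> base64.
    return base64.b64encode(binascii.unhexlify('%08x' % checksum)).decode('ascii')


def encode_via_struct(checksum):
    # '>I' packs an unsigned 32-bit integer in big-endian byte order,
    # producing the same 4 bytes as the hex round trip in one call.
    return base64.b64encode(struct.pack('>I', checksum)).decode('ascii')


# Both encodings agree, including for a checksum with a leading zero byte.
for value in (0, 0x00ABCDEF, 0x8BAEC68C, 0xFFFFFFFF):
    assert encode_via_hex(value) == encode_via_struct(value)
```

0x8BAEC68C encodes to i67GjA==, the crc32c value in the header example. The zero padding in '%08x' matters: plain '%x' % 0x00ABCDEF gives 'abcdef', which unhexlify turns into only 3 bytes.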

If you can use crcmod, it should be better, because it comes with a lot of helpers (e.g. hexdigest() and digest()). I can't use crcmod because it contains C extension code (*.c files).

You can also use gsutil to help debug the file: gsutil hash shows the base64-encoded values, and gsutil hash -h shows the hexadecimal values.
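
To line up the header value with the output of gsutil hash -h, a small Python 3 sketch that converts between the two encodings; b64_to_hex and hex_to_b64 are hypothetical helper names. Using the crc32c value from the header example, i67GjA== corresponds to hex 8baec68c.

```python
import base64
import binascii


def b64_to_hex(b64_value):
    """Convert a base64 hash (as in x-goog-hash) to lowercase hex."""
    return binascii.hexlify(base64.b64decode(b64_value)).decode('ascii')


def hex_to_b64(hex_value):
    """Convert a hex hash (e.g. copied from gsutil hash -h) to base64."""
    # unhexlify accepts both upper- and lowercase hex digits.
    return base64.b64encode(binascii.unhexlify(hex_value)).decode('ascii')
```

hexlify always produces lowercase digits, so if you compare hex strings directly, compare them case-insensitively in case the tool prints upper case.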

Update: Please use crcmod. It is 20x faster than App Engine's crc32c because crcmod is written in C.

Update 2: Don't use CRC32C at all if you don't have to, because it's difficult to install a compiled crcmod on Windows. Use MD5 instead. MD5 is not available for composite objects, but I don't have any composite objects.