Schedule a backup for all entities in your appengine

2 Dec 2016

App Engine normally has an admin UI for performing a backup here: https://ah-builtin-python-bundle-dot-<app-id>

And it's great. I can perform a backup on all entities with a few clicks. But I want more. I want to back up daily.

The article Scheduled Backups explains how we can achieve this. But there's one weird detail here:

You must specify at least one entity kind. In the Google Cloud Platform Console, the default is that all entity kinds are backed up. With a cron backup, there is no such default: if you don't specify a kind, it doesn't get backed up.

Wait, so we cannot back up all entities with scheduled backups :S

Here's one workaround:

  1. We create a cron to call our own handler.
  2. In our handler, we can get the list of all entity kinds.
  3. Then, we call the backup API to perform a backup.

The limitation here is that the URL cannot be longer than 2,000 characters. So, if you have too many kinds, you're doomed. Sorry.
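To see how close you are to that limit, here is a minimal sketch that builds the query string the backup request would carry and measures it. The helper names `backup_url` and `fits_in_url` are made up for illustration, and I'm assuming the 2,000-character limit applies to the path plus the query string and that each kind is sent as a repeated `kind` parameter; the task queue may encode things slightly differently.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

BACKUP_PATH = '/_ah/datastore_admin/backup.create'
URL_LIMIT = 2000  # the limit mentioned above


def backup_url(kinds, bucket):
    """Build the path and query string for a backup of the given kinds.

    Hypothetical helper: it only approximates what the task queue sends.
    """
    params = [('filesystem', 'gs'), ('gs_bucket_name', bucket)]
    # One 'kind' parameter per entity kind.
    params += [('kind', kind) for kind in kinds]
    return BACKUP_PATH + '?' + urlencode(params)


def fits_in_url(kinds, bucket):
    """Return True if the backup URL stays within URL_LIMIT characters."""
    return len(backup_url(kinds, bucket)) <= URL_LIMIT
```

Calling `fits_in_url(kinds, 'your_cloud_storage_bucket')` before adding the task at least tells you early that you have too many kinds, instead of finding out from a failed backup.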

Here's the handler code:

from google.appengine.api import taskqueue
from google.appengine.ext.db import metadata

# Collect every kind name, skipping the hidden kinds that start with '_'.
kinds = [kind.kind_name for kind in metadata.Kind.all()
         if not kind.kind_name.startswith('_')]

# Queue a request to the built-in backup handler.
taskqueue.add(
    url='/_ah/datastore_admin/backup.create',
    method='GET',
    target='ah-builtin-python-bundle',
    retry_options=taskqueue.TaskRetryOptions(
        task_retry_limit=3,
        min_backoff_seconds=60),  # This is a 1-minute backoff.
    params={
        'filesystem': 'gs',
        'gs_bucket_name': 'your_cloud_storage_bucket',
        'kind': kinds
    }
)

Note that we need to exclude kinds whose names start with '_' because metadata.Kind.all() returns some hidden kinds that don't exist in our datastore. I'm too lazy to find out why…
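The filter itself is easy to try without a datastore. Here is a small sketch on a plain list of names; `visible_kinds` is a hypothetical helper, and the sample names are only illustrative of what hidden kinds can look like, not a list taken from a real app.

```python
def visible_kinds(kind_names):
    """Keep only the kind names that do not start with an underscore."""
    return [name for name in kind_names if not name.startswith('_')]


# Illustrative names only; your app will return its own kinds.
sample = ['Article', '_AE_Backup_Information', 'User', '__Stat_Kind__']
print(visible_kinds(sample))  # prints ['Article', 'User']
```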