
The tale of Algolia's Scala client and Playframework

Yesterday I spent 5 hours debugging a strange issue that occurs when using Algolia’s Scala client with Playframework during local development, that is, when Playframework auto-reloads code after a change.

For a long time (~9 months), we had experienced a DNS error that seemed flaky; later, it turned out not to be flaky at all. But more on that later.

The error is an injection error. We can’t initialize AlgoliaClient because we can’t initialize DnsNameResolver.

Caused by: java.lang.NullPointerException
  at io.netty.resolver.dns.DnsNameResolver.<init>(DnsNameResolver.java:303)
  at io.netty.resolver.dns.DnsNameResolverBuilder.build(DnsNameResolverBuilder.java:379)
  at algolia.AlgoliaHttpClient.<init>(AlgoliaHttpClient.scala:56)
  at algolia.AlgoliaClient.<init>(AlgoliaClient.scala:64)
...

Please note that, with a newer version of Netty, the error is instead about FailedChannel cannot be cast to Channel, but it still occurs at the same place.

Well, ok, it was a network thingy. It could have been flaky, so I ignored it for several months.

Yesterday, I found out that this injection error happens almost exactly after Playframework has reloaded code around 23 times. It took me a long time to test this hypothesis because changing code 23 times was tedious.

I wrote a script that instantiates AlgoliaClient multiple times, and the exception was always raised at the 26th AlgoliaClient. The exception wasn’t exactly helpful, though. I did a lot of random things afterward with no progress for another hour.
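
For reference, the script looked roughly like the sketch below. The two constructor arguments are placeholder credentials, and the loop bound is arbitrary; the point is simply to instantiate many clients without closing them.

import algolia.AlgoliaClient

object TheScript {
  def main(args: Array[String]): Unit = {
    (1 to 30).foreach { i =>
      println(s"Instantiating AlgoliaClient #$i")
      // Intentionally never closed; this is what eventually exhausts the sockets.
      new AlgoliaClient("SOME_APP_ID", "SOME_API_KEY")
    }
  }
}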

What helped me progress was trying to instantiate the 27th AlgoliaClient, and there it was. The actual exception showed itself:

Caused by: java.net.SocketException: maximum number of DatagramSockets reached
  at sun.net.ResourceManager.beforeUdpCreate(ResourceManager.java:72)
  at java.net.AbstractPlainDatagramSocketImpl.create(AbstractPlainDatagramSocketImpl.java:69)
  at java.net.TwoStacksPlainDatagramSocketImpl.create(TwoStacksPlainDatagramSocketImpl.java:70)

After googling for ResourceManager, I found that the default limit on the number of datagram sockets is 25!

I tested this hypothesis by running sbt -Dsun.net.maxDatagramSockets=4 'runMain TheScript', and the script failed after the 4th AlgoliaClient.

Now I knew that the instantiated AlgoliaClients weren’t cleaned up properly, so I simply needed to close them.

Unfortunately, algoliaClient.close() doesn’t close its DNS name resolver. Fortunately, the DNS name resolver is a public field, so I can close it myself with algoliaClient.httpClient.dnsNameResolver.close().

Now I knew I needed to clean up AlgoliaClient before Playframework reloads code. Fortunately, Playframework offers a stop hook that is invoked before code reloading.
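
Here is a rough sketch of the wiring, assuming the client is provided through dependency injection; the wrapper class name and the credentials are made up for illustration.

import javax.inject.{Inject, Singleton}
import scala.concurrent.Future
import algolia.AlgoliaClient
import play.api.inject.ApplicationLifecycle

@Singleton
class AlgoliaClientProvider @Inject()(lifecycle: ApplicationLifecycle) {
  val client = new AlgoliaClient("SOME_APP_ID", "SOME_API_KEY")

  // Play invokes stop hooks before reloading code in development mode,
  // so both the client and its DNS name resolver get cleaned up.
  lifecycle.addStopHook { () =>
    client.httpClient.dnsNameResolver.close() // not closed by client.close()
    client.close()
    Future.successful(())
  }
}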

Here are some lessons for me:

Parallelize tests in SBT on Circle CI

The approach in my previous post on parallelising tests in SBT doesn’t work well in practice or, at least, not on CircleCI. The main disadvantage is that it doesn’t balance the tests by their run time, and balancing tests by run time would reduce the total time significantly.

CircleCI offers a command-line tool, named circleci, for splitting tests. One mode of splitting is based on how long individual tests take. If the file scala_test_classnames contains a list of test classes, we can split them using circleci tests split --split-by=timings --timings-type=classname scala_test_classnames. The output is the list of test classes that should be run on the machine numbered CIRCLE_NODE_INDEX out of the CIRCLE_NODE_TOTAL machines. circleci conveniently reads these two environment variables automatically.

At a high level, we want sbt to print out all the test classes. We feed those classes to circleci. Then, we feed the output of circleci to sbt testOnly. Finally, we use CircleCI’s store_test_results to store the test results, which include timings. circleci uses this info to split tests accordingly in subsequent runs.

Now it’s time to write an SBT task again. This task is straightforward because, when googling “sbt list all tests”, the answer is one of the first results. Here’s how I do it in my codebase:

val printTests = taskKey[Unit]("Print full class names of tests to the file `test-full-class-names.log`.")

printTests := {
  import java.io._

  println("Print full class names of tests to the file `test-full-class-names.log`.")

  // Write one fully qualified test class name per line, sorted for stable output.
  val pw = new PrintWriter(new File("test-full-class-names.log"))
  (definedTests in Test).value.sortBy(_.name).foreach { t =>
    pw.println(t.name)
  }
  pw.close()
}

Then, in .circleci/config.yml, we can use the below commands:

...
  - run: sbt printTests
  - run: sbt "testOnly  $(circleci tests split --split-by=timings --timings-type=classname test-full-class-names.log | tr '\n' ' ') -- -u ./test-results/junit"
...

Please notice that:

Finally, we need to store_test_results. It looks like this in our .circleci/config.yml:

...
  - store_test_results:
      path: ./test-results
...

Please note that store_test_results requires the JUnit XML report to be in a subdirectory of ./test-results (see reference).

And there you go! Now your SBT tests are parallelised on CircleCI with sensible balancing.

Parallelize tests in SBT with frustration

SBT, the official build tool for Scala, is a very complex tool. It’s one of those things that make me wonder whether I am stupid or the tool’s complexity surpasses average human intelligence.

I’ve done a few things with SBT (e.g. printing the list of all tests), usually using the trial-and-error approach, and this time I want to add test parallelism to my project.

The requirement is straightforward. Heroku CI or CircleCI can run our tests on multiple machines. On each machine, two environment variables, say, MACHINE_INDEX and MACHINE_NUM_TOTAL, are set. We can use these two environment variables to shard our tests, as sketched below.
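
Here is a minimal sketch of that idea as an SBT task. The environment variable names above are illustrative, and hashing the class name is just one way to assign tests to machines.

val shardedTests = taskKey[Seq[String]]("Test classes assigned to this machine.")

shardedTests := {
  // Which machine is this, and how many machines are there in total?
  val index = sys.env.getOrElse("MACHINE_INDEX", "0").toInt
  val total = sys.env.getOrElse("MACHINE_NUM_TOTAL", "1").toInt

  // Assign each test class to a machine by hashing its name (kept non-negative).
  (definedTests in Test).value
    .map(_.name)
    .sorted
    .filter { name => (name.hashCode & Int.MaxValue) % total == index }
}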
