The tale of Algolia's Scala client and Play Framework
Yesterday I've spent 5 hours debugging a strange issue when using the Algolia's Scala client with Playframework during local development. That is, the period where auto-reloading happens when code is changed.
For a long time (~9 months), we've experienced a DNS error "flakily"; later, it turned out to be a non-flaky issue. But more on that later.
The error is an injection error. We can't initialize AlgoliaClient
because we can't initialize DnsNameResolver
.
Caused by: java.lang.NullPointerException
at io.netty.resolver.dns.DnsNameResolver.<init>(DnsNameResolver.java:303)
at io.netty.resolver.dns.DnsNameResolverBuilder.build(DnsNameResolverBuilder.java:379)
at algolia.AlgoliaHttpClient.<init>(AlgoliaHttpClient.scala:56)
at algolia.AlgoliaClient.<init>(AlgoliaClient.scala:64)
...
Please note that, for a newer version of Netty, the error will be around FailedChannel
cannot be casted to Channel. But it still occurs at the same place.
Well, ok, it was a network thingy. It could be flaky, so I ignored it for several months.
Yesterday, I've found out that this injection error happens almost exactly when Playframework reloaded code around 23 times. It took me a lot of times to test this hypothesis because changing code 23 times was tedious.
I wrote a script that instantiating AlgoliaClient
multiple times, and the exception was always raised at the 26rd AlgoliaClient
. The exception wasn't exactly helpful though. I did a lot of random things afterward with no progress for another hour.
What helped me progress was that I tried to instantiate the 27st AlgoliaClient
, and there it was. The actual exception showed itself:
Caused by: java.net.SocketException: maximum number of DatagramSockets reached
at sun.net.ResourceManager.beforeUdpCreate(ResourceManager.java:72)
at java.net.AbstractPlainDatagramSocketImpl.create(AbstractPlainDatagramSocketImpl.java:69)
at java.net.TwoStacksPlainDatagramSocketImpl.create(TwoStacksPlainDatagramSocketImpl.java:70)
After googling for ResourceManager, it turned out that the limit to the number of datagram sockets was 25!
I tested this hypothesis by running sbt Dsun.net.maxDatagramSockets=4 'runMain TheScript'
, and the script failed after the 4st AlgoliaClient
.
Now I know that the instantiated AlgoliaClient
wasn't cleaned up properly, so I simply needed to close it.
Unfortunately, algoliaClient.close()
doesn't close its DNS name resolver. Fortunately, the DNS name resolver is public, and I can close it myself with algoliaClient.httpClient.dnsNameResolver.close()
.
Now I knew I needed to clean up AlgoliaClient
before Playframework reloads code. And, fortunately, Playframework offers a stop hook that is invoked before code reloading.
Here are some lessons for me:
- I should have wrote the script much earlier. Changing code 25 times was extremely tedious and time-wasting. Reproducing a bug easily would have saved me a lot of time.
- Biased to make class's members public. It'll be helpful for future users. Scala gets it right to default everything to public, while Java defaults everything to somewhat private. Making a member private should be a conscious and thoughtful decision. If DNS name resolver wasn't public, it would be tricky for me to progress.
- I should have not ignored the injection error during local development, no matter how flaky it is.