By tanin in tech — Jul 2, 2023

An open-sourced Ruby DSL for composing maintainable analytical SQL statements

Repository: https://github.com/tanin47/lilit-sql

At the company I work at, we are building an analytics product on top of Presto, which supports SQL as its interface. We have a lot of long SQL statements to power charts, tables, and CSVs. When I say long, a final SQL statement is likely >2,000 lines.

One major pain point arises. There are common parts that can be re-used by multiple SQL statements. It would be annoying to keep duplicate these common parts, so we currently solve this pain point by refactoring certain parts into Ruby's ERBs. However, this is extremely brittle because SQL is a declarative language. It requires the common part to be aware of the columns of the previous statements.

Consider the below example where there are 3 reports:

MRR report that has the following columns: month, mrr_amount, customer_id, customer_name
Payment report that has the following columns: charge_id, timestamp, amount, customer_id, customer_name, invoice_id, invoice_number
Unpaid invoice report that has the following columns: invoice_id, unpaid_amount, age, invoice_number

The common part that we can see from this report is the customer_name and invoice_number column where we can get them by joining with the customers and invoices table. You can imagine it could be implemented as below if SQL was more advanced:

select
  main.*,
  -- if main contains customer_id
    customers.name as customer_name
  -- end
  -- if main contains invoice_id
    invoices.number as invoice_number
  -- end
from
  main
  -- if main contains customer_id
    left join customers
    on main.customer_id = customers.id
  -- end
  -- if main contains invoice_id
    left join invoices
    on main.invoice_id = invoices.id
  -- end

The above pattern would have been fine if SQL supported the capability to check whether a certain column exists in the previous SQL statement before writing the next SQL.

But SQL doesn't support that!

This is why lilit-sql, the Ruby DSL for composing maintainable SQL statements, is built, and we can achieve the same thing with:

Customer = Struct.new(:id, :name)
Invoice = Struct.new(:id, :number)

def with_lookup_columns(query)
  if query.has?(:customer_id)
    customers = Query.from(Table.new(Customer, 'customers'))
    query = query.left_join(customers) do |*tables|
      left = tables.first
      invoice = tables.last

      left.customer_id == customer.id
    end
  end

  if query.has?(:invoice_id)
    invoices = Query.from(Table.new(Invoice, 'invoices'))
    query = query.left_join(invoices) do |*tables|
      left = tables.first
      invoice = tables.last

      left.invoice_id == invoice.id
    end
  end
end

You can see the full example in README.md

lilit-sql goes even further and supports "dynamic" columns as well where the final set of columns depends on user's input.

This is a requirement in one of our reports called "waterfall". For example, if a user selects the date range of June to Oct, then the columns are: person, june_amount, july_amount, aug_amount, sep_amount, and oct_amount. There's an example of the dynamic column SQL statement in the README.md.

Since lilit-sql provides these higher-abstraction-esque capabilities (e.g. reading which column exists and defining columns dynamically), we can re-use business logic and make our engineering team more productive and efficient in maintaining a lot of SQL statements.

lilit-sql is still in its nascent. If the pain points described here resonate with you, I'd love to work with you to iterate the library to be something that fits your needs. Please don't hesitate to reach out by creating an issue on the repository!

Subscribe to tanin