I’ve launched Superintendent.app 6 months ago. Initially, I built it to make me more productive at work since I work with
a lot of CSV files, and I loathe using Excel. After building it, I’ve realized that having a Desktop app that can handle GBs of
CSV files is really versatile. I can analyze CSV files from customers, cross-check (i.e. join) them with internal data, reformat them in
some ways, and etc.
Analyzing log files is another interesting use case. I’m sure there are tons of tools that can do the same thing, though I’m not sure
if those tools are backed by a real database. Superintendent.app is backed by Sqlite, a real database, which can result in a different
set of strengths and weaknesses e.g. medium initial load time (10s for 1GB file), quick query time, and powerful functionality.
To recap, Superintedent.app is an offline Desktop app that allows you to import CSV/TSV files (and several other tabular formats) and
write SQLs on those files. Here’s how the app looks like:
As a desktop app, Superintendent.app can handle GBs of files easily. Moreover, the privacy and security aspect of an offline
Desktop app yields better peace of mind.
Superintendent.app also enables you to cross-check (i.e. join) records between the log file and a database table (exported in CSV).
Since Superintendent.app is backed by Sqlite, which is powerful but still quite limited in SQL capabilities compared to other SQL-style databases),
we’ve added 3 convenient SQL functions to make our lives easier, namely, date_parse,
regex_extract, and regex_replace.
WITH sanitized AS (
SELECT
REGEX_EXTRACT(
'^(([^\s]+ ){0,8})', -- Get the first 8 words
REGEX_REPLACE('[0-9]+', REGEX_REPLACE('\s+', message, ' ', false), '0', false) -- Squeeze whitespaces and digits
) AS message,
REGEX_REPLACE('[0-9]+', REGEX_REPLACE('\s+', program, ' ', false), '0', false) AS program,
REGEX_REPLACE('[0-9]+', REGEX_REPLACE('\s+', source_name, ' ', false), '0', false) AS source_name
FROM "2021_11_24_15"
)
SELECT COUNT(*) AS occurence, message FROM sanitized GROUP BY message ORDER BY COUNT(*) DESC
The above SQL yields the result below:
I wish Sqlite supports define a user-defined function within a SQL statement,
so the code will be cleaner. But this is an improvement for another day.
Searching
Searching is the simplest one because you can use column LIKE '%something%' or regex_extract('regex', column) IS NOT NULL in order to search for a specific string.
For example, if you want to search for the log lines where write-iops is more than or equal to 0.087, you can use the SQL below:
SELECT
CAST(regex_extract('write-iops=([0-9\.]+)', message) AS decmial) AS write_iops,
regex_extract('(.{0,30}write-iops=[0-9\.]+.{0,30})', message) AS context, -- Show context
*
FROM "2021_11_24_15"
WHERE write_iops >= 0.087
The SQL yields the result as shown below:
Beautiful, isn’t it?
Wrapping up
If you are familiar with SQL, using SQL to analyze log files makes a lot of sense. You can visit https://superintendent.app to try it out.
Thank you for reading until the end. I’ll see you next time.
Since starting at a new company, I’ve been working on an accounting report that comes in the form of CSVs, so I’ve worked with a lot of CSVs.
As a person who knows SQL well but isn’t good at Excel, working with CSV files in Excel is such a nightmare.
Every time I was doing something, I felt that I would have spent 10x less time if I could use SQL.
VLOOKUP and pivot tables are absolutely difficult to use. It’s not just writing stuff. It’s a combination of coding and meddling with UI.
SQL is much more ergonomic. It’s just a piece of text. There are few benefits of being “a piece of text”:
You can save the SQL in a file and reuse it on different CSV files that have the same format.
It’s easier to check for correctness because the whole logic is contained within a single SQL.
Besides the above, there’s a blocker where Excel cannot load a CSV with more than 1M rows. Using SQL with an actual database won’t encounter this blocker.
In this blog, I’d like to show how VLOOKUP and pivot tables can be replaced with SQL, and hopefully this would inspire you to learn the basic of SQLs.
VLOOKUP
VLOOKUP is somewhat equivalent to JOIN in SQL.
Let’s say there are 2 CSV files that are associated to each other through employee_id.
employee_id
name
1
John
2
Jane
3
Mary
and
employee_id
salary
1
70000
2
80000
3
60000
We can write a SQL that will construct a table that contains name and salary on the same row:
SELECT
n.employee_id,
n.name,
s.salary
FROM names n
JOIN salaries s
ON n.employee_id = s.employee_id
The above SQL yields:
employee_id
name
salary
1
John
70000
2
Jane
80000
3
Mary
60000
You can see that it’s just a short SQL to achieve what VLOOKUP can do.
Pivot tables
Pivot tables is somewhat equivalent to GROUP BY.
Let’s say there’s one CSV that contains sales per location, and you want to get total sales per location. Here’s the CSV:
state
price
CA
10
CA
12
WA
17
WA
1
NY
9
We can write a SQL that will construct a table that group by state and compute the total sales of each state:
SELECT
state,
SUM(price) as price
FROM sales
GROUP BY state
The above SQL yields:
state
price
CA
22
WA
18
NY
9
If you are familiar with SQL, this would take less than a minute to write.
What next?
Since you read up until this point, I know the question that you are having now is: “ok, I’m convinced. But where can I write SQL on CSVs?”.
I had asked the very same question, and there are many options like:
However, I ended up building my own Desktop application for this purpose: Superintendent.app, which supports Windows, Mac, and Linux.
I intend for Superintendent.app to be an SQL-focused spreadsheet Desktop app that offers great UX like Excel does (e.g. sort columns, sum values on the selected cells).
It’s not there today, but we are moving toward that directly.
Since Superintendent.app is backed by SQLite, it is very fast (e.g. loading 1GB file within 10s on Macbook Pro 2020) and can handle a dataset as large as your memory allows.
While I want you to try Superintendent.app, the main gist is to show you how SQL can significantly improve your workflow, and hopefully you are inspired to start learning SQL.
Thank you for reading until the end, and I’ll see you next time :)
With this mechanism, Tip can be used with any Mac OS app!
Tip also only see the text when user explicitly triggers this feature by hitting the configured shortcut.
Tip gets the selected text, calls the user script, and renders the result in a tooltip as shown below:
Then, you can select an item to perform the defined action. Currently, Tip supports 2 actions: (1) copying to clipboard and (2) opening a URL, which can be used to trigger a Mac app that supports URL like opening a file in IntelliJ.
With a user script that you write yourself, this means you can customize Tip to do anything you want.
My favourite technical detail on Tip is that it runs in App Sandbox
without requesting for additional permissions. Tip can only execute (and read-only) a user script placed in ~/Library/Application\ Scripts/tanin.tip/.
Privacy-wise, this is a huge win.
I’ve been using Tip for 3-4 months now with many use cases. Surprisingly, the use case I use most frequently is:
selecting a file in an error stacktrace and opening that file in IntelliJ/Github. The user script uses the text to search through local file and offers a tooltip item to open matching files on IntelliJ/Github. It looks like below:
I started taking notes in workplace a few years back. What nudged me to start was that I occasionally forgot about my action items after a meeting; I felt forgetful. Since then, I’ve gone through different processes of note taking, and today I want to share what my current process looks like.
I carry a small physical notebook with me everywhere. There are 3 categories that I always note: (1) to-do items, (2) questions, and (3) notable info. Then, later on, I transfer these notes into a digital form.
Noting to-do items is the most important action. When someone asks me to do something, I’ll never want to forget about it. Noting the deadline is as important. I write it down.
Noting questions makes me more informed. If a question pops up in my mind, I note it immediately. No matter how dumb the question seems. Don’t even think about answering it. I write it down.
Though notable info sounds vague, I have a low threshold for it. If something surprises me or I didn’t know about it before, I write it down.
Here you might notice the theme, “just write it down”. I write it down without thinking. When I write down a to-do item, I don’t think about what it entails. When I write down a question, I don’t think about the answer. When I write down notable info, I don’t label it as good or bad. I want to avoid developing impression prematurely. Writing it down without judgement frees my mind from the worry (that I might forget) and allows me to focus on the conversation at hand.
One skill developed with the above process is “Suspended judgement”, the ability to acknowledge without judging. It’s something I’m looking forward to getting better at over time.
When I have free time, I would go through my notes. One by one, I would think about it deeply, transfer the item to a digital form, and cross it out. This is a good opportunity to spend time researching, thinking, and elaborating on each noted item.
The transferred items are crossed with the red ink.
Transferring notes from its physical form to the digital form is an extremely beneficial action. It forces me to review the notes; this yields one more occurrence of Spaced Repetition, which helps me retain important info. The solitude while transferring notes also allows me time to think more deeply about what I noted.
My note taking setup is rather simple. I use a mini Rocketbook with a 4-color Frixion pen. With this combination, I can reuse the notebook forever. I note in these 3 style of bullet points: · (dot) for tasks, ? (question mark) for question, and - (dash) for notable info. On the digital side, I store notes in plaintext and sync to Github using my homegrown app, Git Notes.
Taking notes is a very personal process. What works for me might not works for you. You’ll need to iterate at your own pace to figure out what you like and don’t like. My general advice is to approach it casually. Don’t stretch yourself too much. This is similar to practicing meditation, yoga, or anything alike. We want to deepen our practices, but we only move forward when we are comfortable.
Once we have this noting-thinking loop built into our daily routine, enriching our note taking practice becomes much easier. We can become more considerate by noting other people’s states of minds and later thinking about how that impacts what you do. We can become more aware of our mistakes by noting and later thinking about how to avoid them next time. The possibilities are endless.
It has been a few years already since I started taking note. I feel more grounded. I feel more thoughtful. I feel more confident retaining the info flowing through me. So, I encourage you to start taking notes in workplace, and I hope my process serves as one example that you can take and personalize to suit your style.
I’ve been thinking about my English pronunciation issue since I’ve moved to Bay Area (from Thailand) in 2012. I can’t seem to speak English properly.
There were many moments that made me feel bad about myself.
I can’t pronounce the word guitar. I don’t get it, and people don’t understand me when I pronounce this word. I once said guitar to a lady at the post office. She didn’t get it. I pronounced to her like several times in different stress patterns. She was very patient… but she still didn’t get. Eventually, I spelled it out for her. She got it and pronounced guitar back to me. I felt like that was exactly how I pronounced it.