commit 2ccafe8966d95becf96342817e69a598fb71a2ca
parent 17aaabd7b8e4236131830d90079965e1433af5e5
Author: Eamon Caddigan <eamon.caddigan@gmail.com>
Date: Sun, 10 Aug 2025 16:06:49 -0700
Add weeknote for 2025-W32
Diffstat:
1 file changed, 63 insertions(+), 0 deletions(-)
diff --git a/content/posts/weeknotes/2025-w32/index.md b/content/posts/weeknotes/2025-w32/index.md
@@ -0,0 +1,63 @@
+---
+title: "Weeknote for 2025-W32"
+description: "Catching up on the computational notebook discourse"
+date: 2025-08-10T14:31:11-07:00
+draft: false
+categories:
+- Weeknotes
+- Data Science
+---
+
+At work I’m helping my team transition a large code base (really, a collection
+of code bases) from a “legacy” closed-source data analysis tool to a popular
+“big data platform” which presents an interface based on [computational
+notebooks](https://en.wikipedia.org/wiki/Notebook_interface). I’ve had
+reservations about notebooks for a while, and this work has sharpened those
+frustrations, so much so that I considered submitting a talk to the [DataBS
+conference](https://www.counting-stuff.com/databs-conf-planning-updates-and-how-scrappy-conferences-are-made/)
+about pitfalls I’ve experienced while working with notebooks.
+
+Fortunately (for the sake of saving myself embarrassment and the reviewers'
+time), I had the good sense---for once---to see if anyone else had voiced
+similar critiques. I stumbled upon a seven-year-old campaign from Joel Grus
+against notebooks, culminating in [this talk at
+JupyterCon](https://conferences.oreilly.com/jupyter/jup-ny/public/schedule/detail/68282.html).
+
+I agree with Grus’s critiques (which have since been [corroborated by
+research](https://austinhenley.com/blog/notebookpainpoints.html)), but I don’t
+love his proposed alternative: essentially, “use a software engineering
+workflow like every other programming project”. I think that data analysis is
+distinct from software engineering; tools that are designed for the latter are
+awkward for the former, and we should hold out for something better.
+
+What do I suggest instead? Well, when my choice of platform and tooling is
+unconstrained, I adopt a hybrid approach. I’ve been making increasingly heavy
+use of {targets}[^interview], and modularizing pieces of my code into small,
+private packages whenever it makes sense. This means that the notebooks I do
+write don’t require heavy computation, because results are cached by {targets}
+and much of the code is tucked in a package. This way I’m less tempted to
+re-run or skip code chunks, or do anything else to clobber the global state. I
+also find that I prefer the RMarkdown/Quarto approach to notebooks, in which a
+“notebook” is a plain text file that [interleaves code and
+prose](http://literateprogramming.com/), to the Jupyter notebook approach,
+which stores the code and its output in a JSON blob that’s only supported by
+some editors[^quarto].
+
+But I don’t believe this is “the best” approach; there’s a lot of room for
+better tools for interactive, reproducible data analyses. I have hazy ideas of
+what I’d like to see, but I think they would require a language other than R or
+Python---I suspect they’d require changes at the level of the interpreter
+itself.
+
+So in the end, I didn’t submit anything to DataBS. I still look forward to
+checking out the conference!
+
+[^interview]: I enjoyed [this interview between David Keyes, host of R For the
+ Rest of Us, and Will Landau, creator of
+{targets}](https://rfortherestofus.com/2024/04/podcast-episode-14); it’s a nice
+introduction to the package.
+
+[^quarto]: The realization that Quarto can access Jupyter as a back-end,
+ allowing me to use Vim to edit Python and Julia “tangled” with Markdown,
+finally got me to pivot from RMarkdown. It actually feels like the best of both
+worlds.