www.eamoncaddigan.net

Content and configuration for https://www.eamoncaddigan.net
git clone https://git.eamoncaddigan.net/www.eamoncaddigan.net.git

commit 2ccafe8966d95becf96342817e69a598fb71a2ca
parent 17aaabd7b8e4236131830d90079965e1433af5e5
Author: Eamon Caddigan <eamon.caddigan@gmail.com>
Date:   Sun, 10 Aug 2025 16:06:49 -0700

Add weeknote for 2025-W32

Diffstat:
A content/posts/weeknotes/2025-w32/index.md | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 63 insertions(+), 0 deletions(-)

diff --git a/content/posts/weeknotes/2025-w32/index.md b/content/posts/weeknotes/2025-w32/index.md
@@ -0,0 +1,63 @@
+---
+title: "Weeknote for 2025-W32"
+description: "Catching up on the computational notebook discourse"
+date: 2025-08-10T14:31:11-07:00
+draft: false
+categories:
+- Weeknotes
+- Data Science
+---
+
+At work I’m helping my team transition a large code base (really, a collection
+of code bases) from a “legacy” closed-source data analysis tool to a popular
+“big data platform” which presents an interface based on [computational
+notebooks](https://en.wikipedia.org/wiki/Notebook_interface). I’ve had
+reservations about notebooks for a while, and this work has sharpened those
+frustrations. So much so that I considered submitting a talk to the [DataBS
+conference](https://www.counting-stuff.com/databs-conf-planning-updates-and-how-scrappy-conferences-are-made/)
+about pitfalls I’ve experienced while working with notebooks.
+
+Fortunately (for the sake of saving myself embarrassment and the reviewers’
+time), I had the good sense---for once---to see if anyone else had voiced
+similar critiques. I stumbled upon a seven-year-old campaign from Joel Grus
+against notebooks, culminating in [this talk at
+JupyterCon](https://conferences.oreilly.com/jupyter/jup-ny/public/schedule/detail/68282.html).
+
+I agree with Grus’s critiques (which have since been [corroborated by
+research](https://austinhenley.com/blog/notebookpainpoints.html)), but I don’t
+love his proposed alternative: essentially, “use a software engineering
+workflow like every other programming project”. I think that data analysis is
+distinct from software engineering; tools that are designed for the latter are
+awkward for the former and we should hold out for something better.
+
+What do I suggest instead? Well, when my choice of platform and tooling is
+unconstrained, I adopt a hybrid approach. I’ve been making increasingly heavy
+use of {targets}[^interview], and modularizing pieces of my code into small,
+private packages whenever it makes sense. This means that the notebooks I do
+write don’t require heavy computation, because results are cached by {targets}
+and much of the code is tucked in a package. This way I’m less tempted to
+re-run or skip code chunks, or do anything else to clobber the global state. I
+also find that I prefer the RMarkdown/Quarto approach to notebooks, in which a
+“notebook” is a plain text file that [interleaves code and
+prose](http://literateprogramming.com/), to the Jupyter notebooks approach,
+which stores the code and its output in a JSON blob that’s only supported by
+some editors[^quarto].
+
+But I don’t believe this is “the best” approach; there’s a lot of room for
+better tools for interactive, reproducible data analyses. I have hazy ideas of
+what I’d like to see, but I think they would require a language other than R or
+Python---I suspect they’d require changes at the level of the interpreter
+itself.
+
+So in the end, I didn’t submit anything to DataBS. I still look forward to
+checking out the conference!
+
+[^interview]: I enjoyed [this interview between David Keyes, host of R For the
+ Rest of Us, and Will Landau, creator of
+{targets}](https://rfortherestofus.com/2024/04/podcast-episode-14); it’s a nice
+introduction to the package.
+
+[^quarto]: The realization that Quarto can access Jupyter as a back-end,
+ allowing me to use Vim to edit Python and Julia “tangled” with Markdown,
+finally got me to pivot from RMarkdown. It actually feels like the best of both
+worlds.