index.md - www.eamoncaddigan.net - Content and configuration for https://www.eamoncaddigan.net

---
title: "Weeknote for 2025-W32"
description: "Catching up on the computational notebook discourse"
date: 2025-08-10T14:31:11-07:00
draft: false
categories:
- Weeknotes
- Data Science
---

At work I’m helping my team transition a large code base (really, a collection
of code bases) from a “legacy” closed-source data analysis tool to a popular
“big data platform” which presents an interface based on [computational
notebooks](https://en.wikipedia.org/wiki/Notebook_interface). I’ve had
reservations about notebooks for a while, and this work has sharpened them.
So much so that I considered submitting a talk to the [DataBS
conference](https://www.counting-stuff.com/databs-conf-planning-updates-and-how-scrappy-conferences-are-made/)
about pitfalls I’ve experienced while working with notebooks.

Fortunately (for the sake of saving myself embarrassment and the reviewers’
time), I had the good sense---for once---to see if anyone else had voiced
similar critiques. I stumbled upon a seven-year-old campaign from Joel Grus
against notebooks, culminating in [this talk at
JupyterCon](https://conferences.oreilly.com/jupyter/jup-ny/public/schedule/detail/68282.html).

I agree with Grus’s critiques (which have since been [corroborated by
research](https://austinhenley.com/blog/notebookpainpoints.html)), but I don’t
love his proposed alternative: essentially, “use a software engineering
workflow like every other programming project”. I think that data analysis is
distinct from software engineering; tools that are designed for the latter are
awkward for the former, and we should hold out for something better.

What do I suggest instead? Well, when my choice of platform and tooling is
unconstrained, I adopt a hybrid approach. I’ve been making increasingly heavy
use of {targets}[^interview], and modularizing pieces of my code into small,
private packages whenever it makes sense. This means that the notebooks I do
write don’t require heavy computation, because results are cached by {targets}
and much of the code is tucked in a package. This way I’m less tempted to
re-run or skip code chunks, or do anything else to clobber the global state. I
also find that I prefer the RMarkdown/Quarto approach to notebooks, in which a
“notebook” is a plain text file that [interleaves code and
prose](http://literateprogramming.com/), to the Jupyter notebook approach,
which stores the code and its output in a JSON blob that’s only supported by
some editors[^quarto].
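
To make the caching idea concrete, here is a rough Python sketch. It is not
{targets} and does not reflect how that package works internally; it only
illustrates the general pattern of storing an expensive result on disk, keyed
by a fingerprint of the code and its inputs, so that a notebook can load the
result rather than recompute it. The names `cached`, `slow_summary`, and
`CACHE_DIR` are made up for this example.

```python
import functools
import hashlib
import pickle
import tempfile
from pathlib import Path

# Hypothetical cache location; a real project would use a fixed directory.
CACHE_DIR = Path(tempfile.mkdtemp(prefix="analysis-cache-"))


def cached(func):
    """Store func's result on disk, keyed by its bytecode and arguments."""

    @functools.wraps(func)
    def wrapper(*args):
        # Rough fingerprint: the function's compiled bytecode plus its pickled
        # arguments. Real pipeline tools track dependencies far more carefully.
        fingerprint = func.__code__.co_code + pickle.dumps(args)
        key = hashlib.sha256(fingerprint).hexdigest()
        path = CACHE_DIR / f"{func.__name__}-{key}.pkl"
        if path.exists():
            # Cache hit: load the stored result instead of recomputing it.
            return pickle.loads(path.read_bytes())
        result = func(*args)
        path.write_bytes(pickle.dumps(result))
        return result

    return wrapper


calls = []  # records each time the expensive step actually runs


@cached
def slow_summary(values):
    """Stand-in for an expensive analysis step."""
    calls.append(values)
    return sum(values) / len(values)


first = slow_summary((1, 2, 3, 4))   # computed and written to disk
second = slow_summary((1, 2, 3, 4))  # loaded from the cache
print(first, second, len(calls))     # 2.5 2.5 1
```

With results stored this way, re-running a notebook chunk costs almost
nothing, which removes much of the temptation to skip chunks or run them out
of order.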

But I don’t believe this is “the best” approach; there’s a lot of room for
better tools for interactive, reproducible data analyses. I have hazy ideas of
what I’d like to see, but I think they would require a language other than R or
Python---I suspect they’d require changes at the level of the interpreter
itself.

So in the end, I didn’t submit anything to DataBS. I still look forward to
checking out the conference!

[^interview]: I enjoyed [this interview between David Keyes, host of R For the
    Rest of Us, and Will Landau, creator of
    {targets}](https://rfortherestofus.com/2024/04/podcast-episode-14); it’s a
    nice introduction to the package.

[^quarto]: The realization that Quarto can access Jupyter as a back-end,
    allowing me to use Vim to edit Python and Julia “tangled” with Markdown,
    finally got me to pivot from RMarkdown. It actually feels like the best of
    both worlds.