---
title: "Weeknote for 2025-W32"
description: "Catching up on the computational notebook discourse"
date: 2025-08-10T14:31:11-07:00
draft: false
categories:
  - Weeknotes
  - Data Science
---

At work I’m helping my team transition a large code base (really, a collection
of code bases) from a “legacy” closed-source data analysis tool to a popular
“big data platform” that presents an interface based on [computational
notebooks](https://en.wikipedia.org/wiki/Notebook_interface). I’ve had
reservations about notebooks for a while, and this work has sharpened those
frustrations. So much so that I considered submitting a talk to the [DataBS
conference](https://www.counting-stuff.com/databs-conf-planning-updates-and-how-scrappy-conferences-are-made/)
about pitfalls I’ve experienced while working with notebooks.

Fortunately (for the sake of saving myself embarrassment and the reviewers’
time), I had the good sense---for once---to check whether anyone else had
voiced similar critiques. I stumbled upon a seven-year-old campaign by Joel Grus
against notebooks, culminating in [this talk at
JupyterCon](https://conferences.oreilly.com/jupyter/jup-ny/public/schedule/detail/68282.html).

I agree with Grus’s critiques (which have since been [corroborated by
research](https://austinhenley.com/blog/notebookpainpoints.html)), but I don’t
love his proposed alternative: essentially, “use a software engineering
workflow like every other programming project”. I think that data analysis is
distinct from software engineering; tools designed for the latter are awkward
for the former, and we should hold out for something better.

What do I suggest instead? Well, when my choice of platform and tooling is
unconstrained, I adopt a hybrid approach. I’ve been making increasingly heavy
use of {targets}[^interview], and modularizing pieces of my code into small,
private packages whenever it makes sense. This means that the notebooks I do
write don’t require heavy computation, because results are cached by {targets}
and much of the code is tucked away in a package. This way I’m less tempted to
re-run or skip code chunks, or do anything else that might clobber the global
state. I also find that I prefer the RMarkdown/Quarto approach to notebooks, in
which a “notebook” is a plain text file that [interleaves code and
prose](http://literateprogramming.com/), to the Jupyter approach, which stores
the code and its output in a JSON blob that’s only supported by some
editors[^quarto].

But I don’t believe this is “the best” approach; there’s a lot of room for
better tools for interactive, reproducible data analyses. I have hazy ideas of
what I’d like to see, but I think they would require a language other than R or
Python---I suspect they’d require changes at the level of the interpreter
itself.

So in the end, I didn’t submit anything to DataBS. I still look forward to
checking out the conference!

[^interview]: I enjoyed [this interview between David Keyes, host of R for the
Rest of Us, and Will Landau, creator of
{targets}](https://rfortherestofus.com/2024/04/podcast-episode-14); it’s a nice
introduction to the package.

[^quarto]: The realization that Quarto can use Jupyter as a back-end,
allowing me to use Vim to edit Python and Julia “tangled” with Markdown,
finally got me to pivot from RMarkdown. It actually feels like the best of both
worlds.