---
title: "Programmers should reject LLM-based coding assistants"
description: "Even without the ethical issues present in the coding assistants that exist, these tools are fundamentally unfit for the job."
date: 2024-02-22T20:36:52-08:00
draft: false
categories:
  - Programming
  - Data Science
tags:
  - R
---

The complexity of our world is beyond the limits of human comprehension. In spite of this, we generally feel like we understand what’s going on around us. Each of us achieves this feat of self-deception by cobbling together an assortment of abstractions, mental models, schemas, and metaphors[^science]. When life confronts us with yet another task that demands attention, we select the most appropriate of these and ignore the universe of details that are (hopefully) irrelevant. This approach generally works surprisingly well!

Computers are less complex than the whole world, but they still resist human comprehension. Computer practitioners once again rely on abstractions, etc., in order to muddle their way through things—it’s the only hope we’ve got[^comprehend]. Programming languages are among the best tools in our arsenal, allowing us to transform human-written source code (which is a sort of mashup of human language and mathematical notation—another good tool for approximating the world) into the list of numbers comprising a CPU’s instructions. Sometimes our programs even work.

Some people truly love programming for its own sake, but there aren’t enough of them to fill all the jobs that require doing so. Further complicating matters, even these folks only _really_ like writing certain kinds of code, which generally represents a minority of the code employers need.
When taken together, these observations imply that most code is written begrudgingly—it is not exactly [contributing to self-discovery or spiritual growth](https://codeberg.org/oneirophage/practice-guide-for-computer/src/branch/main/guide.pdf).

This is probably one reason that large language model-based coding assistants (LLMCAs) are becoming popular with some programmers. The most well-known of these tools is GitHub Copilot, developed by Microsoft and OpenAI[^ai]. LLMs work by learning representations of language (including, in the case of LLM-based coding assistants, programming languages) that result in good performance at predicting the next token in a sequence. A programmer using an LLMCA to help with their work experiences it as “auto-complete for code”. In short, these tools speed up the process of writing programs, and “writing programs” is the thing that programmers are paid to do.

There are ethical issues with the use of the LLMCAs that currently exist. Copilot specifically was trained on code that was posted to GitHub, and the authors of this code were not asked for their informed consent to have their work used this way[^ethics]. LLM-based models are also particularly energy intensive, which should concern anybody who cares about climate change[^climate]. LLMCAs are also probably illegal[^illegal], as Copilot is known to have violated the licenses of most of the open source code posted to GitHub. Especially damning is the use of [“copyleft” code]({{< ref "use-the-gpl" >}}) in its training corpus. Such code was licensed in a manner that allows for its adaptation and reuse (which is what Copilot is ultimately doing—adapting and reusing code at scale), but _only_ when the resulting code is also shared under the same license.
Whether or not you’d like to see the proliferation of Copilot-generated code result in _all_ programs becoming copyleft, I don’t think that’s what its users (or their employers) intend to have happen.

But the above issues with LLMCAs are at least solvable in theory. Viz: a company as well-resourced as Microsoft _could_ train its model using code that was collected with the authors’ explicit consent, and advances in energy infrastructure and algorithmic efficiency _might_ bring the climate impact of coding assistants down to acceptable levels. However, there is an existential issue with LLMCAs that should inspire programmers to reject them out of hand: even though they address a real problem, they are the wrong tool for the job.

The real problem that LLMCAs attempt to address is that many programmers are ill-served by the rest of their tooling. I don’t have the personal experience with web programming to opine on the state of the JavaScript ecosystem, but there is an emerging recognition that the current status quo (which starts by reaching for the JavaScript framework du jour, and solves the problems that arise from using it by bolting on additional dependencies) is unpleasant and untenable. This approach to developing applications may generate a lot of code, but it isn’t really _programming_[^kids]; while bolting together disparate parts is sometimes an appropriate way to build something, it can’t be the only way we build things. As the early 20th century biologist Edward J. v. K. Menge noted[^menge]:

> Breeding homing pigeons that could cover a given space with ever
> increasing rapidity did not give us the laws of telegraphy, nor did
> breeding faster horses bring us the steam locomotive.

Sometimes people get the opportunity to apply cleverness and creativity to find new solutions to problems.
This usually starts by taking a step back and understanding the problem space in a holistic manner, and then finding a different way to think about it. Coders working with LLMCAs[^unlucky] won’t be able to do this very often.

So what does a good solution to this tooling problem look like? Here I’ll share an example from the R world, since it’s the primary language I’ve programmed in for the past ten years. I’ve been doing statistics and “data science” for longer than that, and programming longer still, but two important things happened ten years ago that turned me into an “R programmer”: I started a new job that was going to require more statistical programming than I’d done in academia, and Hadley Wickham was hard at work on a new R package called dplyr (which was to become the centerpiece of a family of packages collectively called the Tidyverse[^hadley]).

I used R before 2014, but I went to tremendous lengths to avoid actually programming in it. Instead, I would do all of my data wrangling in Python (version 2, which was the style at the time) and then load “tidy data” into R to perform t-tests and ANOVA. In my experiments with R as a programming language, I found its native interface for manipulating data frames[^data-frame] (now frequently called “base R” to distinguish it from Tidyverse-dependent approaches) to be clunky and unintuitive. The Tidyverse changed all that; dplyr introduced a suite of “pure” functions[^pure-function] for data transformation. They had easy-to-remember names (all verbs, since they performed actions on data frames), consistent argument ordering, and were designed to work well with [the forward pipe operator from the magrittr package]({{< ref "r-pipe-equality" >}}).

Data wrangling in the Tidyverse just _feels_ different (and better) than working with its predecessors.
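To make the contrast concrete, here is a small, hypothetical example: the same summary (mean horsepower by cylinder count among fuel-efficient cars, using the `mtcars` dataset that ships with R) written first in base R idioms and then as a dplyr pipeline. The variable names are my own, and the second half assumes dplyr is installed.

```r
library(dplyr)

# Base R: subset with logical indexing, then aggregate with a formula
efficient <- mtcars[mtcars$mpg > 20, ]
base_result <- aggregate(hp ~ cyl, data = efficient, FUN = mean)

# Tidyverse: the same computation as a pipeline of verbs,
# chained together with magrittr's forward pipe
tidy_result <- mtcars %>%
  filter(mpg > 20) %>%
  group_by(cyl) %>%
  summarise(mean_hp = mean(hp))
```

Both produce the same numbers, but the dplyr version reads as a sequence of actions performed on the data frame, which is exactly the mental-model shift described above.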
While doing a live coding exercise as I interviewed for a previous job, one of my then-future-colleagues—a die-hard Python user—commented on how “fluid” programming in the Tidyverse seemed. Compared to the syntax of Pandas, a Python data frame module that provides an interface not too different from base R’s, it’s a fundamentally different beast.

That stuff about metaphors and abstractions is relevant here, because it explains why the Tidyverse feels different. It operates on a different level of abstraction than base R’s data frame operations; i.e., it depends on a different mental model of the underlying data structures. Just to be clear: its advantages do come at some cost, and not everybody agrees that these trade-offs are justified. But based on the popularity of the Tidyverse, I am not alone in thinking they are. Almost everything we do on computers follows this pattern. Writing our data analyses in R and Python is much easier than using a “low-level” language like C, but this additional layer of abstraction can make our programs slower and less memory-efficient. For that matter, carefully optimized assembly code can outperform C, and I haven’t met anybody who analyzes data using assembly. Programming languages (and paradigms, libraries, frameworks, etc.) proliferate because they solve different problems, generally by working at different levels of abstraction.

LLMCAs also introduce trade-offs: for example, programmers can generate code more quickly, but they don’t understand it as deeply as if they had written it themselves. Rather than simply argue about when (if ever) this trade-off is worth making, I invite you to imagine that Copilot had come to R before the Tidyverse had.
Instead of getting an interface that allows data scientists to work faster by operating at a more comfortable level of abstraction, we’d be writing base R at faster rates using its suggestions. Both approaches result in programming tasks being finished more quickly. However, the programmer using the Tidyverse knows exactly why and how their code works (at least at one level of abstraction) and enjoyed the time they spent on it. The programmer using Copilot would only have a sketchy sense that their code seems to work, and they probably didn’t have much fun getting there. This is why I fundamentally oppose LLMCAs: the challenges that drive programmers to use them would be better solved with their own “Tidyverses”.

From a business perspective, it might seem less risky to rent access to LLMCAs than to invest in the development of new tooling, but this is a mistake. The tools may be _relatively_ inexpensive to use now, but that’s bound to change eventually. The cost of building and deploying a new LLMCA ensures that only a few Big Tech players can compete, and these companies have a track record of collusion[^collusion]. I also find that many hiring managers underestimate how much more productive their workers can be when given a challenging but fun task than when asked to do something “easier” that’s boring.

I’m no business guy, so my call to action is directed primarily at my fellow “senior” developers. Don’t evangelize for LLMCAs—instead, push harder for the resources to develop better tooling. If you currently use LLMCAs yourself, identify the types of tasks that benefit the most from them, and note these as spaces in need of creative solutions. Encourage junior programmers to develop a deeper understanding of the tools they use currently, and insist that your colleagues at all levels imagine something better.
[^science]: For its part, science can be a great tool for exposing the limitations of these mental models. But at the end of the day, it’s still only producing different, hopefully better models, operating at specific levels of abstraction.

[^comprehend]: I invite any extremely hardcore coder who scoffs at my claim that computers are difficult to comprehend to reflect on the last time they were required to think about the physics of transistors or the capacitance of circuit traces when they were programming their web app or whatever.

[^ai]: Like many LLM-based technologies, these are currently being marketed as “AI”. There’s no reason to believe that these machine learning technologies will bring us closer to “general” artificial intelligence.

[^ethics]: Somebody may argue that this sort of use was permitted by GitHub’s Terms of Service, but there are two flaws in this argument. First, the people posting code to GitHub are not necessarily the code’s authors; plenty of projects have been “mirrored” there by people who were only hoping to make them more accessible. The more glaring error in this argument is that it commits the cardinal sin of mistaking “not illegal” for “ethical”. Plenty of atrocious behaviors have been perfectly legal. Stealing people’s computer code is certainly not in the same class of behavior as slavery and genocide, but I still learned not to assume that those who are quick to point to a Terms of Service are taking ethical issues seriously.

[^climate]: [ChatGPT alone is already consuming the energy of 33,000 homes](https://www.nature.com/articles/d41586-024-00478-x).

[^illegal]: I say “probably” because the courts have yet to rule on the matter.
At least [one lawsuit](https://githubcopilotlitigation.com/) has already been filed, but I can’t say that I’m particularly optimistic that the courts would rule in favor of individual hobbyists against the interests of some of the wealthiest individuals and corporations in the world.

[^kids]: Just to be clear, this isn’t a “kids these days” rant about the skills of junior programmers; if anybody deserves the blame here, it’s the managers and senior programmers who allowed this rotten situation to fester.

[^menge]: Menge, E. J. v. K. (1930). Biological problems and opinions. _The Quarterly Review of Biology, 5_(3), 348–359.

[^unlucky]: And also the doubly-unlucky coders who are neither allowed to use LLMCAs nor given the resources and opportunity to be clever and creative.

[^hadley]: There was a brief period where this was colloquially called “the Hadleyverse” in homage to Wickham, but he insisted on the new name.

[^data-frame]: A “data frame” is another abstraction, used by many programming languages and libraries to represent observations about distinguishable units; it uses the same metaphor of rows and columns that spreadsheet programs like Microsoft Excel use.

[^pure-function]: A “pure function” is one that “doesn’t have side effects”. In slightly plainer English, a pure function doesn’t do anything other than (potentially) return a new thing. Object methods that update object attributes aren’t pure functions, nor are any functions that modify their arguments or global variables.

[^collusion]: For instance, several Big Tech companies [recently reached a settlement](https://www.npr.org/sections/thetwo-way/2014/04/24/306592297/tech-giants-settle-wage-fixing-lawsuit) with workers alleging that they had engaged in illegal wage-fixing.
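The “pure function” distinction in the footnotes can be sketched in a few lines of base R; the function names here are illustrative, not from any package.

```r
# Pure: computes a new value and touches nothing else
double_pure <- function(x) {
  x * 2
}

# Impure: reaches outside its own scope with `<<-` to mutate a global
call_count <- 0
double_impure <- function(x) {
  call_count <<- call_count + 1  # side effect: updates a global variable
  x * 2
}

v <- c(1, 2, 3)
w <- double_pure(v)  # v is unchanged; only the return value is new
```

Calling `double_pure` any number of times leaves the rest of the session untouched, while every call to `double_impure` silently changes `call_count`—exactly the behavior dplyr’s verbs avoid.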