---
title: "Programmers should reject LLM-based coding assistants"
description: "Even without the ethical issues present in the coding assistants
that exist, these tools are fundamentally unfit for the job."
date: 2024-02-22T20:36:52-08:00
lastmod: 2025-03-15T20:38:22-07:00
draft: false
categories:
- Programming
- Data Science
tags:
- R
- LLMs
---

The complexity of our world is beyond the limits of human comprehension. In
spite of this, we generally feel like we understand what’s going on around
us. Each of us achieves this feat of self-deception by cobbling together an
assortment of abstractions, mental models, schemas, and metaphors[^science].
When life confronts us with yet another task that demands attention, we
select the most appropriate of these and ignore the universe of details that
are (hopefully) irrelevant. This approach generally works surprisingly well!

Computers are less complex than the whole world, but they still resist human
comprehension. Computer practitioners once again rely on abstractions, etc.,
in order to muddle their way through things—it’s the only hope we’ve
got[^comprehend]. Programming languages are among the best tools in our
arsenal, allowing us to transform human-written source code (which is a sort
of mashup of human language and mathematical notation—another good tool for
approximating the world) into the list of numbers comprising a CPU’s
instructions. Sometimes our programs even work.

Some people truly love programming for its own sake, but there aren’t enough
of them to fill all the jobs that require doing so. Further complicating
matters, even these folks only _really_ like writing certain kinds of code,
which generally represents a minority of the code employers need. When taken
together, these observations imply that most code is written begrudgingly—it
is not exactly [contributing to self-discovery or spiritual
growth](https://codeberg.org/oneirophage/practice-guide-for-computer/src/branch/main/guide.pdf).

This is probably one reason that large language model-based coding
assistants (LLMCAs) are becoming popular with some programmers. The most
well-known of these tools is GitHub Copilot, developed by Microsoft and
OpenAI[^ai]. LLMs work by learning representations of language (including,
in the case of LLM-based coding assistants, programming languages) that
result in good performance at predicting the next token in a sequence.
A programmer using an LLMCA to help with their work experiences
“auto-complete for code”. In short, these tools speed up the process of
writing programs, and “writing programs” is the thing that programmers are
paid to do.

There are ethical issues with the use of the LLMCAs that currently exist.
Copilot specifically was trained on code that was posted to GitHub, and the
authors of this code were not asked for their informed consent to have their
work used this way[^ethics]. LLM-based models are also particularly
energy intensive, which is something that should concern anybody who cares
about climate change[^climate]. LLMCAs are also probably illegal[^illegal],
as Copilot is known to have violated the licenses of most of the open source
code posted to GitHub. Especially damning is the use of [“copyleft” code]({{< ref
"use-the-gpl" >}}) in its training corpus. Such code was licensed in
a manner that allows for its adaptation and reuse (which is what Copilot is
ultimately doing—adapting and reusing code at scale), but _only_ when the
resulting code is also shared with the same license. Whether or not you’d
like to see the proliferation of Copilot-generated code result in _all_
programs becoming copyleft, I don’t think that’s what its users (or their
employers) intend to have happen.

But the above issues with LLMCAs are at least solvable in theory. Viz:
a company as well-resourced as Microsoft _could_ train its model using code
that was collected with the authors’ explicit consent, and advances in
energy infrastructure and algorithmic efficiency _might_ bring the climate
impact of coding assistants down to acceptable levels. However, there is
an existential issue with LLMCAs that should inspire programmers to reject
them out of hand: even though they address a real problem, they are the
wrong tool for the job.

The real problem that LLMCAs attempt to address is that many programmers are
ill-served by the rest of their tooling. I don’t have the personal
experience with web programming to opine on the state of the JavaScript
ecosystem, but there is an emerging recognition that the current status quo
(which starts by reaching for the JavaScript framework du jour, and solves
the problems that arise from using it by bolting on additional dependencies)
is unpleasant and untenable. This approach to developing applications may
generate a lot of code, but it isn’t really _programming_[^kids]; while
bolting together disparate parts is sometimes an appropriate way to build
something, it can’t be the only way we build things. As the early 20th
century biologist Edward J. v. K. Menge noted[^menge]:

> Breeding homing pigeons that could cover a given space with ever
> increasing rapidity did not give us the laws of telegraphy, nor did
> breeding faster horses bring us the steam locomotive.

Sometimes people get the opportunity to apply cleverness and creativity to
find new solutions to problems. This usually starts by taking a step back
and understanding the problem space in a holistic manner, and then finding
a different way to think about it. Coders working with LLMCAs[^unlucky]
won’t be able to do this very often.

So what does a good solution to this tooling problem look like? Here I’ll
share an example from the R world, since it’s the primary language I’ve
programmed in for the past ten years. I’ve been doing statistics and “data
science” for longer than that, and programming longer still, but two
important things happened ten years ago that turned me into an “R
programmer”: I started a new job that was going to require more statistical
programming than I’d done in academia, and Hadley Wickham was hard at work
on a new R package called dplyr (which was to become the centerpiece of
a family of packages collectively called the Tidyverse[^hadley]).

I used R before 2014, but I went to tremendous lengths to avoid actually
programming in it. Instead, I would do all of my data wrangling in Python
(in version 2, which was the style at the time) and then load “tidy data”
into R to perform t-tests and ANOVA. In my experiments with R as a programming
language, I found its native interface for manipulating data
frames[^data-frame] (now frequently called “base R” to distinguish it from
Tidyverse-dependent approaches) to be clunky and unintuitive. The Tidyverse
changed all that; dplyr introduced a suite of “pure”
functions[^pure-function] for data transformation. They had easy-to-remember
names (all verbs, since they performed actions on data frames), consistent
argument ordering, and were designed to work well with [the forward pipe
operator from the magrittr package]({{< ref "r-pipe-equality" >}}).
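
To make that concrete, here is a minimal sketch of my own (a toy example,
not an excerpt from dplyr’s documentation) showing a few of those verbs
chained with the pipe, using R’s built-in mtcars data set. Every step takes
a data frame as its first argument and returns a new one, which is what
makes the pipe read so naturally.

```r
library(dplyr)

mtcars %>%
  filter(cyl == 4) %>%                # keep only the four-cylinder cars
  mutate(kpl = 0.425 * mpg) %>%       # add a metric fuel-economy column
  group_by(gear) %>%                  # split the rows by number of gears
  summarize(mean_kpl = mean(kpl)) %>% # collapse each group to one row
  arrange(desc(mean_kpl))             # sort the summary, best first
```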

Data wrangling in the Tidyverse just _feels_ different (and better) than
working with its predecessors. While doing a live coding exercise as
I interviewed for a previous job, one of my then-future-colleagues—a
die-hard Python user—commented on how “fluid” programming in the Tidyverse
seemed. Compared to the syntax of Pandas, a Python data frame module that
provides an interface not too different from base R’s, it’s a fundamentally
different beast.

That stuff about metaphors and abstractions is relevant here, because it
explains why the Tidyverse feels different. It operates on a different level
of abstraction than base R’s data frame operations; i.e., it depends on
a different mental model of the underlying data structures. Just to be
clear: its advantages do come at some cost, and not everybody agrees that
these trade-offs are justified. But based on the popularity of the
Tidyverse, I am not alone in thinking they are. Almost everything we do on
computers follows this pattern. Writing our data analyses in R and Python is
much easier than using a “low-level” language like C, but this additional
layer of abstraction can make our programs slower and less memory-efficient.
For that matter, carefully optimized assembly code can outperform C, and
I haven’t met anybody who analyzes data using assembly. Programming
languages (and paradigms, libraries, frameworks, etc.) proliferate because
they solve different problems, generally by working at different levels of
abstraction.

LLMCAs also introduce trade-offs: for example, programmers can generate code
more quickly, but they don’t understand it as deeply as if they had written
it themselves. Rather than simply argue about when (if ever) this trade-off
is worth making, I invite you to imagine that Copilot had come to R before
the Tidyverse had. Instead of getting an interface that allows data
scientists to work faster by operating at a more comfortable level of
abstraction, we’d be writing base R at faster rates using its suggestions.
Both approaches result in programming tasks being finished more quickly.
However, the programmer using the Tidyverse knows exactly why and how their
code works (at least at one level of abstraction) and enjoys the time they
spend on it. The programmer using Copilot would only have a sketchy sense
that their code seems to work, and they probably wouldn’t have much fun
getting there. This is why I fundamentally oppose LLMCAs: the challenges
that drive programmers to use them would be better solved with their own
“Tidyverses”.
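
To make that contrast concrete, here is a toy comparison of my own (again
using the built-in mtcars data; it is illustrative, not output from any
actual assistant): the same summary written first in base R, the style an
autocompleter trained on pre-Tidyverse code would be quickest to churn out,
and then as a Tidyverse pipeline.

```r
# Base R: correct, but the reader has to reconstruct the intent step by step.
four_cyl <- mtcars[mtcars$cyl == 4, ]
four_cyl$kpl <- 0.425 * four_cyl$mpg
result <- aggregate(kpl ~ gear, data = four_cyl, FUN = mean)
result <- result[order(-result$kpl), ]

# Tidyverse: the same intent expressed as a single pipeline of verbs.
library(dplyr)
result <- mtcars %>%
  filter(cyl == 4) %>%
  mutate(kpl = 0.425 * mpg) %>%
  group_by(gear) %>%
  summarize(kpl = mean(kpl)) %>%
  arrange(desc(kpl))
```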

From a business perspective, it might seem less risky to rent access to
LLMCAs than to invest in the development of new tooling, but this is
a mistake. The tools may be _relatively_ inexpensive to use now, but that’s
bound to change eventually. The cost of building and deploying a new LLMCA
ensures that only a few Big Tech players can compete, and these companies
have a track record of collusion[^collusion]. I also find that many hiring
managers underestimate how much more productive their workers can be when
given a challenging but fun task than when asked to do something “easier”
that’s boring.

I’m no business guy, so my call to action is directed primarily to my fellow
“senior” developers. Don’t evangelize for LLMCAs—instead push harder for the
resources to develop better tooling. If you currently use LLMCAs yourself,
identify the types of tasks that benefit the most from them, and note these
as spaces in need of creative solutions. Encourage junior programmers to
develop a deeper understanding of the tools they use currently, and insist
that your colleagues at all levels imagine something better.

## Update 2025-03-15

In the year since I wrote the above, I’ve spent time with the LLMCA features
in the Databricks platform, and I indeed find them to be a nuisance that I
have to continually disable. Rather than suggesting the autocompletions I’d
expect from (e.g.) a language server---variables and functions in the current
scope, stuff like that---they suggest non-existent tokens and would (if I let
them) produce broken code. By replacing functionality I expect with something
I don’t want, they’re literally worse than useless.

[This blog post from Rob
Bowley](https://blog.robbowley.net/2025/02/01/the-evidence-suggests-ai-coding-assistants-offer-tiny-gains-real-productivity-lies-elsewhere/)
arrives at a similar conclusion, now backed by data that weren’t available
when I started thinking about this stuff.

[^science]: For its part, science can be a great tool for exposing the
    limitations of these mental models. But at the end of the day, it’s
    still only producing different, hopefully better models, operating at
    specific levels of abstraction.

[^comprehend]: I invite any extremely hardcore coder who scoffs at my claim
    that computers are difficult to comprehend to reflect on the last time
    they were required to think about the physics of transistors or the
    capacitance of circuit traces when they were programming their web app
    or whatever.

[^ai]: Like many LLM-based technologies, these are currently being marketed
    as “AI”. There’s no reason to believe that these machine learning
    technologies will bring us closer to “general” artificial intelligence.

[^ethics]: Somebody may argue that this sort of use was permitted by
    GitHub’s Terms of Service, but there are two flaws in this argument.
    First, the people posting code to GitHub are not necessarily the code’s
    authors; plenty of projects have been “mirrored” there by people who
    were only hoping to make them more accessible. The more glaring error in
    this argument is that it commits the cardinal sin of mistaking “not
    illegal” for “ethical”. Plenty of atrocious behaviors have been
    perfectly legal. Stealing people’s computer code is certainly not in the
    same class of behavior as slavery and genocide, but I’ve learned not
    to assume that those who are quick to point to a Terms of Service are
    taking ethical issues seriously.

[^climate]: [ChatGPT alone is already consuming the energy of 33,000
    homes](https://www.nature.com/articles/d41586-024-00478-x).

[^illegal]: I say “probably” because the courts have yet to rule on the
    matter. At least [one lawsuit](https://githubcopilotlitigation.com/) has
    already been filed, but I can’t say that I’m particularly optimistic
    that the courts would rule in favor of individual hobbyists against the
    interests of some of the wealthiest individuals and corporations in the
    world.

[^kids]: Just to be clear, this isn’t a “kids these days” rant about the
    skills of junior programmers; if anybody deserves the blame here it’s
    the managers and senior programmers who allowed this rotten situation to
    fester.

[^menge]: Menge, E. J. v. K. (1930). Biological problems and opinions. _The
    Quarterly Review of Biology, 5_(3), 348-359.

[^unlucky]: And also the doubly-unlucky coders who are neither allowed to
    use LLMCAs nor given the resources and opportunity to be clever and
    creative.

[^hadley]: There was a brief period where this was colloquially called “the
    Hadleyverse” in homage to Wickham, but he insisted on the new name.

[^data-frame]: A “data frame” is another abstraction that many programming
    languages and libraries use to represent observations about
    distinguishable units; it uses the same metaphor of rows and columns
    that spreadsheet programs like Microsoft Excel use.

[^pure-function]: A “pure function” is one that “doesn’t have side effects”.
    In slightly plainer English, a pure function doesn’t do anything other
    than (potentially) return a new thing. Object methods that update object
    attributes aren’t pure functions, nor are any functions that modify
    their arguments or global variables.
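
    As a tiny illustration of my own (in R, since that’s the language under
    discussion): the first function below is pure, the second is not.

    ```r
    # Pure: reads only its argument and returns a new data frame;
    # R's copy-on-modify semantics leave the caller's data frame untouched.
    add_kpl <- function(df) {
      df$kpl <- 0.425 * df$mpg
      df
    }

    # Impure: updates a variable outside its own scope as a side effect.
    counter <- 0
    count_call <- function() {
      counter <<- counter + 1
      counter
    }
    ```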

[^collusion]: For instance, several Big Tech companies [recently reached
    a settlement](https://www.npr.org/sections/thetwo-way/2014/04/24/306592297/tech-giants-settle-wage-fixing-lawsuit)
    with workers alleging that they had engaged in illegal wage-fixing.