---
title: "Reproducible research fail"
description: One of the mistakes I made in data collection.
date: 2015-10-01
categories:
- Data Science
- Science
tags:
- Psychology
---

In most of the psychology subdisciplines under the umbrella of “cognitive
psychology” (e.g., language, memory, perception), researchers use programs to
collect data from participants (cf. social psychology, which often uses
surveys instead). These are usually simple programs that display words or
pictures and record responses; if you’ve ever taken an introductory psychology
course, you were surely made to sit through a few of these. Although there are
a few tools that allow psychologists to create experiments without writing
their own code, most of us (at least in the departments with which I’ve been
affiliated) program our own studies.

The majority of psych grad students start their Ph.D.s with little programming
experience, so it’s not surprising that coding errors sometimes affect
research. As a first-year grad student who’d previously worked as a
programmer, I was determined to do better. Naturally, I made a ton of
mistakes, and I want to talk about one of them: a five-year-old mistake I’m
dealing with *today*.

Like many mistakes, I made this one while trying to avoid another. I noticed
that it was common for experiments to fail to record details about trials that
later turned out to be important. For instance, a researcher could run a
[visual search](http://www.scholarpedia.org/article/Visual_search) experiment
and not save the locations and identities of the randomly selected
“distractors”, but later be unable to see whether there was an effect of
[crowding](http://www.scholarpedia.org/article/Visual_search#Spatial_layout.2C_density.2C_crowding).
It was fairly common to fail to record response times while looking for an
effect on task accuracy, but then be unable to show that the observed effect
wasn’t due to a speed-accuracy tradeoff.

I decided that I’d definitely record everything. This wasn’t itself a mistake.

Since I program my experiments in Python using an object-oriented design --
all of the data necessary to display a trial was encapsulated in instances of
the Trial class -- I decided that the best way to save *everything* was to
serialize these objects using Python’s pickle module. This way, if I added
additional members to Trial, I didn’t have to remember to explicitly include
them in the experiment’s output. I also smugly knew that I didn’t have to
worry about rounding errors, since everything was stored at machine precision
(because *that* matters).
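
A minimal sketch of the idea (this `Trial` class is a stand-in with made-up
fields; the real one had many more):

```python
import pickle

class Trial(object):
    """Stand-in for an experiment's Trial class (hypothetical fields)."""
    def __init__(self, target, distractors, response_time=None):
        self.target = target
        self.distractors = distractors
        self.response_time = response_time

# After a session, serialize every Trial object wholesale -- attributes
# added to the class later are captured without any extra bookkeeping.
trials = [Trial("T", ["L", "L", "L"]), Trial("T", ["L", "+"])]
with open("participant_01.pkl", "wb") as f:
    pickle.dump(trials, f)

# Reading the data back requires a compatible Trial class definition (and
# compatible versions of any modules the objects reference) -- which is
# exactly where this scheme turns fragile over the years.
with open("participant_01.pkl", "rb") as f:
    recovered = pickle.load(f)
```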

That’s not quite where I went wrong.

The big mistake was using this approach while failing to follow best practices
for reproducible research. It’s now incredibly difficult to unpickle the data
from my studies because the half dozen modules necessary to run my code have
all been updated since I wrote these programs. I didn’t even record the
version numbers of anything. I’ve had to write a bunch of hacks and manually
install a few old modules just to get my data back.
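
Even just snapshotting the environment at collection time would have helped.
One way to do it in modern Python (this uses `importlib.metadata`, available
since Python 3.8; the filename is arbitrary):

```python
import json
import sys
from importlib import metadata

def snapshot_environment(outfile="environment_snapshot.json"):
    """Record the interpreter version and every installed package version
    alongside the data, so the files can be read back years later."""
    versions = {dist.metadata["Name"]: dist.version
                for dist in metadata.distributions()}
    snapshot = {"python": sys.version, "packages": versions}
    with open(outfile, "w") as f:
        json.dump(snapshot, f, indent=2)
    return snapshot

snap = snapshot_environment()
```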

Today it’s a lot easier to do things the right way. If you’re programming in
Python, you can use the [Anaconda
distribution](https://www.continuum.io/why-anaconda) to create environments
that keep their own copies of your code’s dependencies. These won’t get
updated with the rest of the system, so you should be able to go back and run
things later. A language-agnostic approach could use [Docker
images](https://www.docker.com/), or go a step further and run each experiment
in its own virtual machine (although care should be taken to ensure adequate
system performance).

I do feel like I took things too far by pickling my Python objects. Even if I
had used Anaconda, I’d have been committing myself to either performing all my
analyses in Python, or performing the intermediate step of writing a script to
export my output (giving myself another chance to introduce a coding error).
Using a generic output file format (e.g., a simple CSV file) affords more
flexibility in choosing analysis tools, and also better supports data sharing.
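
For instance, writing each trial’s fields with `csv.DictWriter` keeps the
data readable from R, Python, or anything else (the column names here are
illustrative, not from my actual studies):

```python
import csv

# Hypothetical per-trial records; a real experiment would have more columns.
trials = [
    {"trial": 1, "target": "T", "response": "present", "rt_ms": 612},
    {"trial": 2, "target": "L", "response": "absent", "rt_ms": 548},
]

# One row per trial, with a header naming every recorded field.
with open("participant_01.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["trial", "target",
                                           "response", "rt_ms"])
    writer.writeheader()
    writer.writerows(trials)
```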

I still think it’s important to record “everything”, but there are better ways
to do it. An approach I began to use later was to write separate programs for
generating trials and displaying them. The first program handles
counterbalancing and all the logic supporting randomness; it then creates a
CSV for each participant. The second program simply reads these CSVs and
dutifully displays trials based *only* on the information they contain,
ensuring that no aspect of a trial (e.g., the color of a distractor item)
could be forgotten.
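
A sketch of the first, trial-generating program (the factors, file naming,
and seeding are all invented for illustration):

```python
import csv
import itertools
import random

def generate_trials(participant_id, n_repeats=2, seed=None):
    """Cross the design factors, repeat, shuffle, and write one CSV per
    participant. The display program later shows *only* what's in the file."""
    rng = random.Random(seed)
    conditions = list(itertools.product(["left", "right"],  # target side
                                        [4, 8, 12]))        # set size
    trial_list = conditions * n_repeats
    rng.shuffle(trial_list)
    filename = "trials_p%02d.csv" % participant_id
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["trial", "target_side", "set_size"])
        for i, (side, set_size) in enumerate(trial_list, start=1):
            writer.writerow([i, side, set_size])
    return filename

generate_trials(1, seed=42)
```

Because the randomness lives entirely in this program (and can be seeded),
each participant’s trial sequence is on disk before the session even starts.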

The display program records responses from participants and combines them with
trial info to create simple output files for analysis. To further protect
against data loss, it also records, with timestamps, a simple log of every
event that occurs during the experiment. The log file includes the experiment
start time, keypresses and input events, changes to the display, and anything
else that could happen. Between the input CSVs and this log file, it’s
possible to recreate exactly what happened during the course of the study --
even if the information wasn’t in the “simple” output files. I make sure that
the output is written to disk frequently so that nothing is lost in case of a
system crash. This approach also makes it easy to restart at a specific point,
which is useful for long studies and projects using fMRI (the scanner makes it
easy to have false starts).
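
The logging piece can be as simple as this sketch (the event names and
fields are hypothetical; the key detail is flushing after every write):

```python
import os
import time

class EventLog(object):
    """Append-only, timestamped event log, flushed to disk on every write
    so a crash can cost at most the line currently being written."""
    def __init__(self, path):
        self.f = open(path, "a")
        self.write("experiment_start")

    def write(self, event, **details):
        extras = " ".join("%s=%s" % kv for kv in sorted(details.items()))
        self.f.write("%.4f %s %s\n" % (time.time(), event, extras))
        self.f.flush()                 # push past Python's buffer...
        os.fsync(self.f.fileno())      # ...and past the OS cache as well

    def close(self):
        self.write("experiment_end")
        self.f.close()

log = EventLog("session.log")
log.write("keypress", key="space", trial=3)
log.write("display_change", screen="fixation")
log.close()
```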

My position at the FAA doesn’t involve programming a lot of studies. We do
most of our work on fairly complicated simulator configurations (I have yet to
do a study that didn’t include a diagram describing the servers involved), and
there are a lot of good programmers around who are here specifically to keep
them running. I hope this lesson is useful for anybody else who might be
collecting data from people, whether it’s in the context of a psychology study
or user testing.