---
title: "Reproducible research fail"
description: One of the mistakes I made in data collection.
date: 2015-10-01
categories:
- Data Science
- Science
tags:
- Psychology
---

In most of the psychology subdisciplines under the umbrella of “cognitive psychology” (e.g., language, memory, perception), researchers use programs to collect data from participants (cf. social psychology, which often relies on surveys instead). These are usually simple programs that display words or pictures and record responses; if you’ve ever taken an introductory psychology course, you were surely made to sit through a few of them. Although there are a few tools that allow psychologists to create experiments without writing their own code, most of us (at least in the departments with which I’ve been affiliated) program our own studies.

The majority of psych grad students start their Ph.D.s with little programming experience, so it’s not surprising that coding errors sometimes affect research. As a first-year grad student who’d previously worked as a programmer, I was determined to do better. Naturally, I made a ton of mistakes, and I want to talk about one of them: a five-year-old mistake I’m dealing with *today*.

Like many mistakes, I made this one while trying to avoid another. I had noticed that experiments commonly failed to record details about trials that later turned out to be important. For instance, a researcher could run a [visual search](http://www.scholarpedia.org/article/Visual_search) experiment and not save the locations and identities of the randomly selected “distractors”, but later be unable to check for an effect of [crowding](http://www.scholarpedia.org/article/Visual_search#Spatial_layout.2C_density.2C_crowding). It was fairly common to fail to record response times while looking for an effect on task accuracy, and then be unable to show that the observed effect wasn’t due to a speed-accuracy tradeoff.

I decided that I’d definitely record everything. This wasn’t itself a mistake.

Since I programmed my experiments in Python using an object-oriented design -- all of the data necessary to display a trial was encapsulated in instances of a Trial class -- I decided that the best way to save *everything* was to serialize these objects using Python’s pickle module. This way, if I added members to Trial, I didn’t have to remember to explicitly include them in the experiment’s output. I also smugly knew that I didn’t have to worry about rounding errors, since everything was stored at machine precision (because *that* matters).

That’s not quite where I went wrong.

The big mistake was using this approach while failing to follow best practices for reproducible research. It’s now incredibly difficult to unpickle the data from my studies because the half-dozen modules necessary to run my code have all been updated since I wrote these programs. I didn’t even record the version numbers of anything. I’ve had to write a bunch of hacks and manually install a few old modules just to get my data back.

Today it’s a lot easier to do things the right way. If you’re programming in Python, you can use the [Anaconda distribution](https://www.continuum.io/why-anaconda) to create environments that keep their own copies of your code’s dependencies. These won’t get updated with the rest of the system, so you should be able to go back and run things later. A language-agnostic approach could use [Docker images](https://www.docker.com/), or go a step further and run each experiment in its own virtual machine (although care should be taken to ensure adequate system performance).
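Pinning an environment helps, but even the cheapest insurance -- writing down which versions you used -- would have saved me here. Below is a minimal sketch (not my original code; the dependency list and output file name are made up) of recording the interpreter and package versions next to the experiment’s data:

```python
import importlib
import json
import platform
import sys

# Hypothetical dependency list; use whatever the experiment imports.
DEPENDENCIES = ["numpy", "pygame"]

def snapshot_environment(path="environment.json"):
    """Record interpreter and dependency versions to a JSON file."""
    info = {
        "python": sys.version,
        "platform": platform.platform(),
    }
    for name in DEPENDENCIES:
        try:
            module = importlib.import_module(name)
            info[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            info[name] = "not installed"
    with open(path, "w") as f:
        json.dump(info, f, indent=2)

if __name__ == "__main__":
    snapshot_environment()
```

A file like this won’t rebuild the environment for you the way conda or Docker can, but it turns “manually install a few old modules” from guesswork into a lookup.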
I do feel like I took things too far by pickling my Python objects. Even if I had used Anaconda, I’d have committed myself either to performing all my analyses in Python or to an intermediate step: writing a script to export my output (giving myself another chance to introduce a coding error). Using a generic output format (e.g., a simple CSV file) affords more flexibility in choosing analysis tools, and it also better supports data sharing.

I still think it’s important to record “everything”, but there are better ways to do it. An approach I began to use later was to write separate programs for generating trials and displaying them (a minimal sketch of this setup appears at the end of this post). The first program handles counterbalancing and all the logic supporting randomness; it then creates a CSV for each participant. The second program simply reads these CSVs and dutifully displays trials based *only* on the information they contain, ensuring that no aspect of a trial (e.g., the color of a distractor item) can be forgotten.

The display program records responses from participants and combines them with the trial info to create simple output files for analysis. To further protect against data loss, it also records, with timestamps, a log of every event that occurs during the experiment: the experiment start time, keypresses and other input events, changes to the display, and anything else that happens. Between the input CSVs and this log file, it’s possible to recreate exactly what happened during the course of the study -- even if the information wasn’t in the “simple” output files. I make sure that output is written to disk frequently so that nothing is lost in case of a system crash. This approach also makes it easy to restart at a specific point, which is useful for long studies and for projects using fMRI (the scanner makes it easy to have false starts).

My position at the FAA doesn’t involve programming many studies. We do most of our work on fairly complicated simulator configurations (I have yet to run a study that didn’t include a diagram describing the servers involved), and there are plenty of good programmers around who are here specifically to keep these running. I hope this lesson is useful for anybody else who collects data from people, whether in the context of a psychology study or user testing.
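As promised above, here’s a minimal sketch of the generate-then-display split. Everything in it is hypothetical -- the field names, the stimuli, the file layout -- and a real experiment would do proper counterbalancing and use an actual display library, but the shape is the same: one function writes fully specified trials to a per-participant CSV, and another reads them back and logs every event with a timestamp, flushing to disk immediately.

```python
import csv
import random
import time

# Hypothetical trial structure; a real study would have many more fields.
FIELDS = ["trial", "target_word", "distractor_word", "distractor_color"]

def generate_trials(participant_id, n_trials=10):
    """Write a fully specified trial list for one participant to a CSV."""
    path = "trials_%s.csv" % participant_id
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for i in range(n_trials):
            writer.writerow({
                "trial": i,
                "target_word": random.choice(["cat", "dog", "bird"]),
                "distractor_word": random.choice(["pen", "cup", "box"]),
                "distractor_color": random.choice(["red", "green", "blue"]),
            })
    return path

def run_trials(path, log_path="events.log"):
    """Display trials using *only* what the CSV contains, logging every
    event with a timestamp and flushing to disk immediately."""
    with open(path, newline="") as f, open(log_path, "a") as log:
        def log_event(message):
            log.write("%.3f\t%s\n" % (time.time(), message))
            log.flush()  # survive a crash mid-session

        log_event("experiment start")
        for row in csv.DictReader(f):
            # A real display program would draw the stimuli and record
            # responses here; this stand-in just logs each trial.
            log_event("show trial %(trial)s: %(target_word)s vs. "
                      "%(distractor_word)s (%(distractor_color)s)" % row)

if __name__ == "__main__":
    run_trials(generate_trials("p001"))
```

Because the display step consumes only the CSV, auditing a session or re-running an analysis never depends on re-executing the randomization logic.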