Lyrics, Melodies, and Strong Emotions
Last week a paper that Mitty Ma, Emily Elliott, and some of my former LSU colleagues worked on was published in Musicae Scientiae.
The short version of the story is that we tried to replicate the first experiment from Ali and Peynircioglu's 2006 paper. We also found significant results, but basically everything ran in the opposite direction of the original findings.
So if you need a paper to cite in music psychology demonstrating that an experiment or claim can be well cited and still not really hold up, you can reference us.
This paper is the quintessential case study for the type of replication work people were talking about maybe ten years ago, when everyone was very fired up about making music science research better.
When this project started in 2019 (?!), we had all the ingredients to get us going…
- a semi-well cited paper within a niche field (as of today, 274 citations)
- an ambitious grad student who wanted to learn how to do better science
- more resources, in terms of sample size
- a richer literature to help interpret our results
- the ability to use something like the Gold-MSI for measuring musical sophistication
Yet it was still a lot of work, and we ran into a few very unexpected barriers.
In 2022 and 2023:
- We had to remove our submission from a pre-print server (fair enough, this is the journal’s policy)
- We had to convince a reviewer that the word "replication" was warranted in our context, even though we were using a different set of stimuli and a different population. If you think replication means "exact replication", check out this paper for an interesting discussion.
Which is fine; there is no standard for what your reviewers are expected to know. Maybe it just felt more exhausting than usual because we were trying to finish this up during a global pandemic, with all the co-authors at different institutions, while trying to publish a paper whose conclusion is that something didn't hold up.
With this replication, there was no feeling of novelty or discovery, and the absence of that feeling is quite noticeable in terms of keeping you inspired. I also think it felt particularly bad given some of our sleuthing: several of the numbers reported in the original paper turned out not to be numerically possible according to the GRIM test. (And the numbers that were not possible were the ones associated with some of the reported main effects.)
The GRIM inconsistencies could have been due to human error; we will never know, since you cannot work backwards from effects to causes in this setting. That said, a lot of resources went into this project, and emerging from that effort with these "findings" feels very different from publishing something novel.
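In case you haven't run into it, the GRIM test is just arithmetic: a mean of n integer responses (like Likert ratings) has to be some whole-number sum divided by n, so you can check whether a reported mean is even possible for the reported sample size. Here is a minimal sketch of that check in Python; the function name and the example numbers are mine for illustration, not values from either paper.

```python
import math

def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """GRIM test: can the mean of n integer responses round to reported_mean?

    A mean of n integers must equal (integer sum) / n, so we check whether
    either of the two nearest integer sums reproduces the reported value
    at the reported rounding precision.
    """
    target = round(reported_mean, decimals)
    candidate_sums = (math.floor(reported_mean * n), math.ceil(reported_mean * n))
    return any(round(total / n, decimals) == target for total in candidate_sums)

# Hypothetical examples (not values from the paper):
print(grim_consistent(3.47, 19))  # True:  66/19 = 3.4737, which rounds to 3.47
print(grim_consistent(3.45, 19))  # False: no integer sum divided by 19 rounds to 3.45
```

Real GRIM checks are fussier about rounding conventions (half-up versus banker's rounding) and about means built from multiple items per participant, but the core arithmetic is just this.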
Lessons
So what lessons have I learned from this? (And having now published three replication studies in the past few years, maybe too many…)
- The "replication as pedagogical tool" approach teaches a valuable lesson about doing empirical research, but in some ways it can take even more effort than designing a similar experiment that would provide empirical support for a theoretical claim. Don't get caught up thinking you need to re-run the exact same stimuli; if you do, you might even be contributing to more fixed-effects problems.
- There needs to be much better grounding in theoretical predictions that explain why the empirical data are the way they are. This means thinking harder about what constitutes evidence, given the literature that has come before.
- I think there needs to be even more discussion in our teaching of what the replication/generalizability crisis actually is.
- Lastly, a massive shout-out to Mitty for keeping this project alive. Without her continued push to get this published, this little research report would have been just another file-drawer study.
The paper is only 12 pages, so it's a quick read. Please email me if you're having trouble finding a copy.