Thomas Stoeger of Northwestern University has previously studied scientists’ limited focus on certain genes. In a new study, he shows how these same behaviors extend into the science of COVID-19.

Scientists have identified more than 2,000 human genes linked to COVID-19, yet the bulk of the published literature is dominated by only a small subset of them, a fact that may be limiting progress in the fight against the pandemic.

A team at Northwestern University, led by data scientist Thomas Stoeger, had previously shown that scientists tend to focus on a handful of genes—specifically, less than 20 percent of all genes in the human genome accounted for more than 90 percent of the publications they analyzed. Prior to the Human Genome Project, scientists had an incomplete view of the full suite of human genes and relied more heavily on those that had analogs in model organisms or were easier to study using knockout experiments. The advent of modern sequencing technology—including complementary tools such as CRISPR, mass spectrometry, and RNA-based approaches—has broadened what researchers know, but it seems that scientists are still holding to old patterns.

In a study published today (November 24) in eLife, Stoeger and his Northwestern colleague Luís Amaral looked to COVID-19 research to see if scientists were similarly prioritizing certain genes during the pandemic in case reports and research on mechanisms of infection and transmission, diagnostic tools, and treatments. The pair analyzed 10,395 published papers and preprints and compared the genes studied in those publications against a list of genes linked to the virus through genome-wide association studies (GWAS).

As the pandemic has progressed, they found, scientists have become focused on a small subset of genes to the exclusion of others that may also be important. Of the roughly 2,000 genes identified by the GWAS reports, only 611 were included in the literature they scanned. In particular, three genes, which code for angiotensin-converting enzyme 2 (ACE2), a receptor the virus uses to enter cells; C-reactive protein, an inflammation marker; and interleukin 6, a signaling molecule involved in inflammatory responses, accounted for 25 percent of the total research. When they compared these COVID-19 papers against a set of roughly 466,000 non–COVID-19 papers from before 2016, Stoeger and Amaral discovered that whether or not the research related to the pandemic, the same types of genes, those advantageous to experimentation, still command the most attention.

The Scientist spoke with Stoeger about how researchers choose which genes to study, what information they could be missing, and ways that research can open up to previously overlooked genes.

The Scientist: In the paper, you mention this historical bias around the genes that scientists choose to focus on, and you talk about how these choices predate the Human Genome Project. Can you describe how people were selecting genes to study prior to the Human Genome Project?

Thomas Stoeger: Science is difficult, and so scientists start with the least difficult research problems. The research questions scientists worked on before the Human Genome Project tended to be on genes that are interesting and very useful to study, but that also happened to be easier to study in a few different ways.

One way is that the human genes [they chose] also had related genes in model organisms, such as fruit flies or worms, and the related gene had already been studied. The shortest genes have also been studied much more than others . . . because they’re easier to work with. And there are also some other chemical properties, for instance, those of the proteins encoded by the genes. When proteins sit on the outside of the cell, for example, it’s easier to access them. All of these things together made experimentation less difficult....
