Physics meets literature in unique UNL collaboration
In what can be viewed as an example of extreme cross-disciplinary collaboration, Matthew Jockers, an associate professor of English at the University of Nebraska-Lincoln, has been consulting with UNL physicist Aaron Dominguez for the past several years in his quest to quantify narrative structure.
Jockers is a renowned digital humanist and a fellow with UNL’s Center for Digital Research in the Humanities. He specializes in text mining, using high-powered computing to identify thematic and stylistic patterns among thousands of literary works. Dominguez is part of a global team of physicists who helped identify the Higgs boson. His work typically involves interpreting large amounts of data generated by the Large Hadron Collider at the CERN particle-physics laboratory in Switzerland.
As part of “sentiment analysis” work begun several years ago, Jockers realized that a novel’s emotional content mirrors its plot development. A sentence-by-sentence analysis of a work’s emotional overtones could give researchers a groundbreaking tool to understand how narrative is shaped.
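To make the idea of a sentence-by-sentence emotional reading concrete, here is a minimal Python sketch of lexicon-based scoring. It is an illustration only, not the code Jockers used: the tiny word list `TOY_LEXICON` and the function names are hypothetical, and real sentiment lexicons score thousands of words.

```python
import re

# Hypothetical toy lexicon mapping words to an emotional valence.
# Real sentiment lexicons contain thousands of scored entries.
TOY_LEXICON = {
    "happy": 1.0, "love": 1.0, "joy": 1.0, "hope": 0.5,
    "sad": -1.0, "death": -1.0, "fear": -0.5, "hate": -1.0,
}


def split_sentences(text):
    """Naively split text wherever ., ! or ? is followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_valence(sentence):
    """Sum the lexicon values of the words in one sentence (unknown words count 0)."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(TOY_LEXICON.get(w, 0.0) for w in words)


def emotional_trajectory(text):
    """Return one valence score per sentence, in narrative order."""
    return [sentence_valence(s) for s in split_sentences(text)]


if __name__ == "__main__":
    story = "They met with joy and hope. Then came fear and death. At last, love made them happy."
    print(emotional_trajectory(story))  # prints [1.5, -1.5, 2.0]
```

Plotting such a list against sentence number gives the kind of emotional "high and low" curve the article describes.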
Savvy writers have long recognized that a story’s structure can heighten or lessen its impact and meaning. Soviet folklorist Vladimir Propp described the distinction between a story’s “syuzhet,” or organization, and its “fabula,” the raw elements of the tale. Author Kurt Vonnegut hypothesized that story structures could be drawn on graph paper or fed into a computer.
“The shape of a given society’s stories is at least as interesting as the shape of its pots or its spearheads,” said Vonnegut, who studied anthropology at the University of Chicago after World War II.
By charting the emotional valences of books such as “A Portrait of the Artist as a Young Man” and “The Picture of Dorian Gray,” Jockers noticed that the works’ emotional highs and lows matched what he considered to be the plot structure.
“By accident, I discovered that sentiment could be used as a highly accurate proxy for plot movement,” he said. “Discovering the accuracy of these graphs was quite thrilling.”
Jockers had “emotional valence” data for more than 42,000 novels. But he was stymied when it came to discerning patterns within the data: How does one compare the shape of a sprawling novel like “Moby-Dick,” which runs to several hundred pages, with one of fewer than 200 pages, like “The Picture of Dorian Gray”?
That’s where Dominguez came in.
He and Jockers became friends after Jockers came to UNL in 2012. Dominguez, who was working in Switzerland at the time, temporarily rented his Lincoln house to Jockers.
After Dominguez returned to Lincoln, the two began meeting for coffee to discuss their work. Jockers confided to Dominguez that he was struggling to find a way to compare different stories mathematically and computationally, and Dominguez suggested that he try the Fourier transform.
It was a near-magical moment, Jockers said.
“Over coffee one afternoon, this physicist, Aaron Dominguez, helped me figure out how to travel through narrative time,” he marveled.
The Fourier transform is a mathematical technique used to detect patterns in large amounts of data. It decomposes a time-based signal into the set of frequencies that make it up. A physicist like Dominguez may first work with it as an undergraduate, but an English major might never be exposed to it.
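The article does not spell out exactly how Jockers applied the transform. One common approach, sketched below in Python under that assumption, is to keep only the few lowest frequencies of a book’s sentence-by-sentence sentiment scores and rebuild a smooth curve on a fixed number of points of “narrative time.” A long novel and a short one then yield curves of the same length that can be compared directly. The names `low_pass_shape`, `keep` and `points` are hypothetical and are not the Syuzhet package’s API.

```python
import cmath
import math


def dft_coefficient(values, k):
    """Return the k-th discrete Fourier coefficient of a real-valued sequence."""
    n = len(values)
    return sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(values))


def low_pass_shape(values, keep=3, points=100):
    """Smooth a sentiment trajectory and resample it onto `points` positions.

    Keeps the mean term plus the `keep` lowest nonzero frequencies, then
    evaluates that smooth curve at evenly spaced points of normalized
    narrative time t in [0, 1), so inputs of any length give equal-length output.
    """
    n = len(values)
    if keep < 0 or keep >= n / 2:
        raise ValueError("keep must be non-negative and below half the sequence length")
    coeffs = [dft_coefficient(values, k) for k in range(keep + 1)]
    shape = []
    for m in range(points):
        t = m / points
        # Mean (zero-frequency) term, then each kept frequency paired with its
        # mirror-image negative frequency, which for real input adds 2 * Re(...).
        total = coeffs[0].real
        for k in range(1, keep + 1):
            total += 2 * (coeffs[k] * cmath.exp(2j * math.pi * k * t)).real
        shape.append(total / n)
    return shape
```

For example, calling `low_pass_shape(scores, keep=3, points=100)` on a list of 5,000 sentence scores and on a list of 500 would produce two 100-point curves. Keeping fewer frequencies gives a smoother, more general shape; keeping more preserves finer plot detail.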
Jockers spent about 40 hours learning the math. Dominguez helped by running a small proof-of-principle test on about 50 books.
After creating a software package for the R programming language, which he named “Syuzhet” in honor of Propp, Jockers began clustering books according to the similarity of their shapes. In one experiment, the books appeared to fall into a small set of six or seven common shapes.
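The article does not say which clustering method Jockers used. As a stand-in, the Python sketch below applies plain k-means to equal-length shape vectors, such as sentiment curves resampled onto a common timeline. The function names and the deterministic choice of starting centroids are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def kmeans(shapes, k, max_iterations=50):
    """Group equal-length shape vectors into k clusters with plain k-means.

    Centroids start as the first k shapes so the sketch is deterministic;
    practical use would try several random starts and keep the best result.
    Returns (labels, centroids), where labels[i] is the cluster of shapes[i].
    """
    if not 0 < k <= len(shapes):
        raise ValueError("k must be between 1 and the number of shapes")
    centroids = [list(s) for s in shapes[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each shape joins the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(shape, centroids[c]))
            for shape in shapes
        ]
        if new_labels == labels:
            break  # No shape changed cluster, so the algorithm has converged.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [s for s, label in zip(shapes, labels) if label == c]
            if members:  # Leave an empty cluster's centroid where it was.
                centroids[c] = mean_vector(members)
    return labels, centroids
```

With `k` set to six or seven, the returned labels would group books whose curves rise and fall in similar ways. Because k-means depends on its starting centroids, a real experiment would typically compare several runs before trusting any particular set of shapes.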
Jockers said there is still more validation to be done, but already the work has received worldwide interest. He will be discussing his findings in an appearance at Harvard University next month and is preparing a longer essay describing his work for an upcoming book-length collection of articles about digital literary analysis. In the meantime, Jockers offers details about his method on his blog.
On a recent afternoon in Lincoln, he and Dominguez continued to brainstorm on the project.
“English and physics is a pretty big gulf – but new things happen on the boundaries like that,” Dominguez said.