Research at the University of Nebraska–Lincoln | 2014-2015 Report
Elizabeth Lorang and Leen-Kiat Soh

Today, exposure to poetry often ends with graduation. But in the 19th century, poetry was broadly popular, appearing prominently in newspapers and influencing American society.

The difficulty of culling millions of poems from historic newspapers has left a gap in historical research on this part of American life.

To recover the poetry, a collaboration between UNL Libraries and computer science is developing a unique indexing and retrieval method based on visual cues rather than text. The technique may open new possibilities in searching for interesting patterns in other large datasets.

“With the current methods of searching these historic newspapers, there’s no good way to get at the poetic content,” said Elizabeth Lorang, research assistant professor with the Center for Digital Research in the Humanities. “By narrowing down the millions of poems that were published to those by just a handful of authors, we skew our understanding of how poetry functioned for everyday people.”

In browsing historic newspapers, Lorang realized she was finding poems by cueing on visual distinctions on the page, such as white space and formatting. She teamed with computer scientist Leen-Kiat Soh to see if a computer could be trained to find poems within the millions of newspaper pages digitized for Chronicling America, a Library of Congress and National Endowment for the Humanities project.

The pair, working with undergraduate students, developed a program that does just that, proving that image-based analysis works well. The method avoids problems that text-based approaches run into with historic documents, such as illegible text.

“We’re modeling how human vision finds things,” Soh said. “Even if you don’t understand the language or can’t read the text, you can still find the poem visually, which should help us archive and search texts more efficiently.”

The technique could be used for other newspaper items, such as sports scores and advertising, as well as on modern digital archives.

Lorang and Soh plan to create a database of poems linked to Chronicling America.

The NEH funds their research.


Associated Media

Examples of newspapers scanned for poetry using UNL-developed computer program.

Examples of visual patterns in digitized pages.
