Nebraska tool boosts gene analysis

Bioinformatics

Tiffany Lee, March 16, 2017

An apparently random jumble of 21 letters might seem more like the byproduct of falling asleep at a keyboard than a life-balancing component of humans and other complex organisms.

But each letter of the sequence represents one of four molecular links, or nucleotides, in a chain that can slice apart viruses and keep the protein factories of a cell from churning out too many of a particular model.

Its critical functions have made micro-ribonucleic acid, or microRNA, a focus of research since its discovery in the 1990s. More recently, researchers including those at the University of Nebraska-Lincoln have found evidence that microRNAs can migrate between organisms and possibly affect genes in their new homes, particularly when hitching a ride in cow-produced milk.

One of those researchers, Juan Cui (joo-AN’ T-zoo-AY’), has developed the first automated tool for distinguishing between native and migratory microRNA in humans and other species. Dubbed MicroRNA Discovery, the web-based platform gives researchers worldwide access to a 24-node computing cluster that efficiently analyzes the vast numbers of nucleotide sequences found in a sample of blood or urine.

“Since we know this biological problem very well – we are also, from a user perspective, doing the same research – we know exactly what sort of data we should pay attention to,” said Cui, assistant professor of computer science and engineering. “We know what we should include in the pipeline and what types of problems people have.”

Much like existing tools, MicroRNA Discovery can identify the presence of native microRNAs by comparing short, randomly fragmented sequences of nucleotides against the known catalogue of human-produced microRNAs. Yet it can also compare the sequences of a human, chimpanzee, dog, rat or mouse against the complete genomes of eight dietary sources – including cows, pigs, chickens, corn and rice – to help assess whether microRNAs from the latter have worked their way into the former’s bloodstream.

If the tool detects a sequence that matches a known microRNA from the dietary source but not the host, Cui said, it flags that sequence as a likely foreign microRNA. In cases where the host and dietary source naturally produce some of the same microRNAs – as humans and cows do – the researchers can look for spikes in those sequences or examine adjacent nucleotides for subtle but telltale differences.
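To make the flagging logic concrete, here is a minimal Python sketch of the native-versus-foreign decision described above. It assumes exact matching against two tiny, made-up sets of microRNA sequences; the function name classify_read, the Origin labels and the example sequences are all hypothetical and not taken from MicroRNA Discovery, which works against full catalogues and complete genomes and would also need to tolerate sequencing errors.

```python
from enum import Enum

class Origin(Enum):
    NATIVE = "native"            # matches a known host microRNA only
    FOREIGN = "likely foreign"   # matches a known dietary microRNA only
    SHARED = "shared"            # matches both; needs abundance or flank checks
    UNKNOWN = "unknown"          # matches neither; candidate for novel discovery

# Made-up example catalogues (hypothetical sequences, not real microRNAs).
HOST = {"ACGUACGUACGUACGUACGUAC"}
DIET = {"ACGUACGUACGUACGUACGUAC",   # shared by host and dietary source
        "UUGGCCAAUUGGCCAAUUGGCC"}   # found only in the dietary source

def classify_read(read: str, host: set[str], dietary: set[str]) -> Origin:
    """Label a sequencing read by exact lookup in host and dietary catalogues."""
    # Sequencers report DNA letters, so map thymine (T) to uracil (U) first.
    seq = read.strip().upper().replace("T", "U")
    in_host, in_diet = seq in host, seq in dietary
    if in_host and in_diet:
        return Origin.SHARED
    if in_host:
        return Origin.NATIVE
    if in_diet:
        return Origin.FOREIGN
    return Origin.UNKNOWN
```

Reads labeled SHARED are the ones that would need the abundance and neighboring-nucleotide checks Cui describes, while UNKNOWN reads would feed the search for new microRNAs.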

As for the microRNAs that no one has yet identified? MicroRNA Discovery hunts for those, too. Mature microRNAs contain no more than 33 nucleotides, which helps the tool narrow its search. Before being processed into that mature form, though, every microRNA begins as part of a hairpin-shaped precursor. And owing to the laws that govern microRNA formation, including the energy levels that determine how stable a given fold is, many nucleotide sequences could never have folded into that hairpin structure.

So Cui’s team incorporated those laws into the tool’s analytical framework, enabling it to rule out many candidates and pinpoint others worthy of further investigation. It also evaluates whether the potentially new microRNAs more closely resemble those of the host or the dietary source, essentially laying breadcrumb trails that researchers can follow to determine their origins. That could prove especially important when determining whether irregular levels of microRNAs in the bloodstream are signaling the onset of cancers and other diseases for which they have become promising biomarkers.
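The two steps described above, pruning candidates that could not form a hairpin and then asking whether the survivors look more like host or dietary microRNAs, can be sketched in Python under heavy simplification. Real pipelines judge hairpins by the minimum free energy of the folded sequence, typically computed with RNA-folding software such as RNAfold from the ViennaRNA Package; this sketch swaps that for a much cruder test that only counts uninterrupted base pairs when a candidate precursor folds back on itself. Likewise, the article does not say how resemblance is scored, so the sketch assumes edit distance (the number of single-letter insertions, deletions or substitutions separating two sequences). The length cap of 33 comes from the article; the minimum stem of 18 pairs, the minimum loop of 3 bases and all function names are hypothetical.

```python
# Watson-Crick pairs plus the G-U "wobble" pair that RNA also tolerates.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def passes_length_filter(mature: str, max_len: int = 33) -> bool:
    """Keep only non-empty candidate mature sequences of at most max_len bases."""
    return 0 < len(mature) <= max_len

def stem_length(precursor: str, min_loop: int = 3) -> int:
    """Count consecutive base pairs formed when the sequence folds back on itself.

    Position i pairs with position -1-i, working inward from both ends, and
    counting stops at the first non-pairing position. The stem is capped so
    that at least min_loop unpaired bases remain for the loop at the tip.
    """
    seq = precursor.upper().replace("T", "U")
    max_stem = (len(seq) - min_loop) // 2
    n = 0
    while n < max_stem and (seq[n], seq[-1 - n]) in PAIRS:
        n += 1
    return n

def could_be_hairpin(precursor: str, min_stem: int = 18, min_loop: int = 3) -> bool:
    """Crude stand-in for an energy-based fold check (hypothetical thresholds)."""
    return stem_length(precursor, min_loop) >= min_stem

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def closer_origin(candidate: str, host: set[str], dietary: set[str]) -> str:
    """Return 'host', 'dietary' or 'tie' by nearest known sequence.

    Both reference sets must be non-empty.
    """
    d_host = min(edit_distance(candidate, s) for s in host)
    d_diet = min(edit_distance(candidate, s) for s in dietary)
    if d_host < d_diet:
        return "host"
    if d_diet < d_host:
        return "dietary"
    return "tie"
```

Because real precursors contain bulges and mismatches, the strict stem test would reject many genuine hairpins; it only shows how structural rules can prune candidates cheaply before more expensive analysis. A candidate one letter away from a known host sequence but many letters away from every dietary one would come back as "host".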

Time to crunch

MicroRNA Discovery’s parallel-computing approach allows it to analyze and compare more than 1.5 billion nucleotides in a 12-hour span, according to a recent study authored by Cui and her colleagues. The tool also collapses identical sequences into a single entry, substantially reducing its overall data load.

Those features widen what Cui considers the most restrictive bottleneck of RNA sequencing analysis, while setting the tool apart from downloadable analysis packages.
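Collapsing duplicates is simple to illustrate: count each distinct sequence once and keep its tally, so later steps handle one entry per sequence rather than one per read, while abundance information survives. Below is a minimal Python sketch using the standard library's collections.Counter; it assumes reads have already been extracted from their sequencing files as plain strings, and collapse_reads is a hypothetical name.

```python
from collections import Counter

def collapse_reads(reads: list[str]) -> list[tuple[str, int]]:
    """Collapse identical reads into (sequence, count) pairs, most abundant first."""
    # Normalize whitespace and letter case so trivially different copies merge.
    counts = Counter(read.strip().upper() for read in reads)
    return counts.most_common()
```

For a toy list such as ["ACGU", "acgu", "GGCC", "ACGU ", "GGCC", "UUAA"], this returns [("ACGU", 3), ("GGCC", 2), ("UUAA", 1)], halving the number of sequences that downstream steps would need to examine.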

“Sometimes the computational time is really unbearable,” said Cui, who sought advice from the university’s Holland Computing Center when building the 24-node computing cluster. “Most (laboratory researchers) cannot easily set up a parallel-computing environment, so they’re basically just using a single node and running the sequences one by one.

“That’s why we think that the parallel computing has to be there. You don’t wait until one file finishes to do the next. You can actually enable the user to upload 20 (or) 40 files at the same time and then get the results within a few hours.”
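Cui's point about not waiting for one file to finish before starting the next can be sketched on a single machine with Python's concurrent.futures module. This is only an analogy for the 24-node cluster: summarize_sample (which here just collapses duplicate reads) and process_samples are hypothetical stand-ins, and samples are passed as in-memory lists rather than uploaded files.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def summarize_sample(reads: list[str]) -> dict[str, int]:
    """Per-sample work (hypothetical stand-in for a full pipeline run)."""
    return dict(Counter(reads))

def process_samples(samples: dict[str, list[str]], max_workers: int = 4,
                    use_processes: bool = True) -> dict[str, dict[str, int]]:
    """Analyze all samples concurrently instead of one after another.

    Processes sidestep Python's global interpreter lock for CPU-bound work;
    threads are offered as a lightweight option for quick tests.
    """
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with pool_cls(max_workers=max_workers) as pool:
        # Submit every sample up front; each starts as soon as a worker is free.
        futures = {name: pool.submit(summarize_sample, reads)
                   for name, reads in samples.items()}
        # Gather results; f.result() blocks until that sample is finished.
        return {name: f.result() for name, f in futures.items()}
```

When using the default process pool, call process_samples from under an if __name__ == "__main__": guard, which Python's multiprocessing requires on platforms that start workers by launching a fresh interpreter.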

Its strengths have already drawn users, Cui said, including researchers at the European Bioinformatics Institute.

Cui and her co-authors, graduate students Hanyuan Zhang and Bruno Vieira Resende e Silva, detailed MicroRNA Discovery in the journal Briefings in Bioinformatics. The team received funding from the Centers of Biomedical Research Excellence at the National Institutes of Health under grant number 1P20GM104320.

