ARCHIVED - Data processing drives genetic research


September 16, 2009 - Ottawa, Ontario

Canola breeding and cancer research may seem to be miles apart, but scientists at the NRC Institute for Information Technology (NRC-IIT) in Ottawa use the same mathematical methods to uncover valuable knowledge about both. 

The institute is a partner in two multidisciplinary projects from NRC's Genomics and Health Initiative that intertwine mathematics and biology like the two strands of a DNA double helix.

In both projects, NRC researchers seek out complex patterns in huge amounts of data from many different sources, then use the patterns to develop mathematical models that accurately describe or predict how proteins and genes in living cells will behave. Other researchers - in biology, medicine and genetics - take that information to drive their own research forward. 

Cancer and canola 

The cancer project, using NRC-IIT's information technology to seek ways to slow or stop cancerous cell growth in breasts and brains, is led by the NRC Biotechnology Research Institute in Montreal. The project has the potential to reduce cancer deaths and improve quality of life for patients with the disease. 

The canola project, which seeks to improve Canadian varieties for hardiness and better yields, is a partnership between NRC-IIT and the NRC Plant Biotechnology Institute in Saskatoon. Canadian farmers value canola as a cash crop, and consumers choose it as a healthy edible oil source that is low in saturated and trans fats. 

"The domains are definitely different, but in terms of our expertise, we are using very similar methods," says Dr. Fazel Famili, who leads NRC-IIT's part in both projects. "The commonalities are in the techniques that you apply. In layman's terms, you have some tools, and whether you have a car that is imported or made locally, it doesn't matter. The same tools should work, providing that you know how to use them."

One of NRC-IIT's areas of expertise is data mining, a mathematical process for discovering previously unknown knowledge hidden in huge amounts of information. Another is machine learning - shorthand for computers programmed to efficiently crunch through huge masses of historical and online data, sometimes without human guidance, searching for patterns. NRC software reports the patterns to handlers, but also uses them to reprogram itself to search even more efficiently. It can rifle through information from many sources - for example, text records that describe a tumour sample's source, type, and how it has developed - then link all of these to proteins that researchers have found in the sample.

"Machine learning doesn't care where the data come from, whether it's the biology of cancer or the biology of plants," says Dr. Famili. "A cell is a cell, whether it belongs to a plant, a human being or another species of organism." 

Dr. Fazel Famili leads NRC-IIT’s contribution to canola breeding and cancer research projects.


Reducing trial and error 

Breeding canola and other plants has long involved trial and error and the growing of successive generations of test crops. Seeds from the best plants were replanted, or two different varieties were cross-pollinated, to gain better-yielding or more disease- and weather-resistant plants.

NRC's mathematical approach helps breeders examine huge numbers of existing historical samples to quickly find and take advantage of desirable genetic differences. This approach also reduces some of the guesswork involved in breeding canola to grow faster and yield more oil. NRC-IIT will comb through and organize the historical and online data to help canola breeders to do this effectively. The initial sorting and organizing takes considerable computing power. But once it's done, the resulting plant-breeding database will likely be stored and studied on ordinary desktop PCs. 

The cancer project is focusing first on breast and brain cancers, then may extend to other types. NRC-IIT is scrutinizing massive amounts of information for particular genes that control cancer in different ways. Examining active genes and proteins for important patterns can tell medical researchers how cancers develop, and suggest new ways to treat them.  

Dr. Charles-Antoine Gauthier, acting director of research programs at NRC-IIT, says that a new generation of cancer treatments could focus on managing and living with the disease. To make such treatments practical, clinicians need to identify cancers early, and to better understand how cancer cells grow and trick human bodies into supporting tumours. 

"You can't sift through this amount of data by hand," he says. "There's just too much of it. We're talking thousands upon thousands of samples from historical data."

Enquiries: Media relations
National Research Council of Canada
