The Data Age
Ken Tapping, 21st September, 2016
A few years ago I was in Ottawa, discussing with an NRC colleague the problems of finding information in large piles of data. Raw data is something like iron ore: not really much use in itself, but the iron we extract from it is very useful. What we try to extract from data is useful information.
The search we were discussing in Ottawa was something like looking in a haystack for things that are not hay, but may look a lot like it. That is, searching for things that do not really fit in with the rest, without knowing beforehand what they are or how they differ. This involves computing techniques such as “data mining”, “neural networks” and “machine learning”, which together we can call computer-aided data analysis (CADA).
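As a rough illustration of finding "things that are not hay", here is a minimal sketch that flags values sitting unusually far from the bulk of a data set, without being told what an odd value looks like. It assumes a simple robust-statistics approach (the median and the median absolute deviation); the function name `flag_outliers`, the threshold of 3.5 and the sample readings are all invented for this example, and real CADA work uses far more capable methods.

```python
import statistics

def flag_outliers(values, threshold=3.5):
    """Return the positions of values that sit far from the bulk of the data.

    Assumption for this sketch: "far" means a modified z-score above
    `threshold`, computed from the median and the median absolute deviation.
    """
    med = statistics.median(values)
    deviations = [abs(v - med) for v in values]
    mad = statistics.median(deviations)
    if mad == 0:
        # The data are too uniform to define a spread, so flag nothing.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [i for i, d in enumerate(deviations) if 0.6745 * d / mad > threshold]

# Invented readings: mostly "hay", with one value that does not fit.
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 14.7, 10.0, 9.9]
odd_ones = flag_outliers(readings)
print(odd_ones)
```

The median is used rather than the mean so that the odd values being sought do not distort the very yardstick used to find them.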
In one example, my colleague had taken observations of about 22,000 galaxies and simply asked the computer to divide them into classes, without putting in any information about galaxies or what was being sought. It took a while but the computer did it, and produced a wealth of new astronomical information about those galaxies.
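The method used for the galaxies is not described here, so the following is only a sketch of the general idea, assuming the well-known k-means clustering technique. The program is given measurements and a number of classes, but no information about what the classes mean. The two measured quantities, the number of classes and all the data values are made up for illustration.

```python
import random

def kmeans(points, k, iterations=50, seed=0):
    """Divide points into k classes using plain k-means (no labels supplied)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centre (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])))
            for p in points
        ]
        # Move each centre to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centres

# Made-up measurements for 100 objects: two quantities per object,
# drawn from two loose groups that the program is not told about.
rng = random.Random(1)
group_a = [(rng.gauss(0.3, 0.05), rng.gauss(2.0, 0.1)) for _ in range(50)]
group_b = [(rng.gauss(0.9, 0.05), rng.gauss(4.0, 0.1)) for _ in range(50)]
points = group_a + group_b

labels, centres = kmeans(points, k=2)
print(centres)
```

One limitation of this sketch is that the number of classes has to be chosen in advance; other clustering methods estimate it from the data.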
In our project the objective was to find techniques for looking at ice cores, tree rings and sunspot data to put together a very long-term record of the Sun’s behaviour without telling the machine about any particular things we were looking for. In science, as in lots of other human activities, history is filled with examples of missed discoveries because “we knew what we were looking for”.
This is becoming a big issue in astronomy. The amount of data collected in an astronomical observation used to be some notes, a few images, and maybe some spectra. This was usually stored on magnetic tape for us to analyze when we got back home. Modern optical and radio telescopes are not like that. For example, the CHIME (Canadian Hydrogen Intensity Mapping Experiment) radio telescope, now well into construction here at DRAO, is capable of imaging a good fraction of the visible sky in a single operation, observing thousands of objects. The result is that when it starts observing, a torrent of data will be flowing out. The Square Kilometre Array, an international project (including Canada) to build the largest and most sensitive radio telescope ever developed, will produce a tsunami. There will be so much data we won’t be able to store all of it. Somehow we will first need to process the data to reduce its volume without losing scientifically valuable information, in many cases without knowing beforehand what that information might be. That’s exactly what CADA does for us.
This problem is not confined to astronomy or even just science. The explosive growth in our ability to acquire all sorts of data, in storage capacity and computing power, and in our ability to move huge amounts of data around the world easily and quickly has led to an accumulation so big the word “huge” is inadequate. Almost anything we want to know is probably “out there somewhere”. The “web browsers” and “search engines” most of us use are tools for finding information that someone has already extracted and processed. Searching the raw data is far more difficult, especially when we have only a hazy idea of what we are looking for. That is where CADA comes in, where the computer becomes more of an assistant than just a number cruncher. Thousands of years ago stone was the technological basis for our ancestors’ lives, so we call that time the Stone Age. Today, living amidst enormous piles of data about everything, including us, we live in the Data Age.