Astronomy and Huge Data

Ken Tapping, October 2, 2018

In the sky this week…

  • Mars is still conspicuous in the southern sky.
  • Saturn lies low in the south and Jupiter very low in the southwest after sunset.
  • The Moon will reach Last Quarter on the 2nd and be New on the 8th.

We now live in an age of Big Data. Once we developed the technologies for handling and storing huge amounts of information, we went on to collect more and more of it. In the same way, astronomy is now in the age of "Huge Data".

Not very long ago, making astronomical observations meant setting up the telescope and the instruments attached to it, pointing it at the object of interest and then recording the data by hand. When computers first moved into astronomy, they were used to automate the operation of the telescope and to record data. We took the results away and used computers to analyze them. Then, as computers got smaller, faster and cheaper, the game changed. With computer help, our telescopes could record more data about more things, faster. We can now carry out and process large-scale surveys of the sky, and keep an eye open for transient events. We can build networks of many radio telescopes distributed over thousands of kilometres, processing their outputs digitally to emulate one huge radio telescope. Multitudes of small, high-speed computers now form part of our instruments themselves, rather than just controlling them. The result is a tsunami of data we have to store, make accessible, and somehow analyze.

Another issue we needed to address was the enormous amount of astronomical information that had accumulated from past observations. Some came from large-scale surveys made at various observatories and stored there. In addition, sitting in astronomers' offices around the world was data from observations they had made in the past. This led to two serious problems. Firstly, astronomers would propose new observations not knowing that someone else had already made them. Secondly, with the rapid evolution of data storage technology, stored data could become unreadable because nobody had the devices to read it any more. For example, who these days has the means to read a floppy disc? The solution is to put all the data in special-purpose data centres, where it is archived, backed up and provided in a form that astronomers and other researchers can access as and when they need it. Our national system is called the Canadian Astronomy Data Centre – the CADC.

We have all heard of something out there in our digital world called "the cloud". This rather mystical name refers to a number of huge "server farms": data storage places that hold, archive and generally look after your data and software, and provide additional tools you might need for accessing and working with it. The CADC and other astronomical data centres form a "cloud" for the scientific community. The huge amounts of data coming out of the latest astronomical instruments, and our desire to make that data as broadly accessible as possible, force us in that direction. However, having all this data available poses another serious problem. How can we search an enormous number of files and databases for the information we need?

We've all used "search engines" to find information on the Internet. These tools use forms of artificial intelligence: computer programs that emulate certain aspects of the way we search for and assimilate information. In a similar way, we use software assistance to search out what we need from the rapidly growing pile of data we are accumulating about the universe we live in. However, it will be a while before we completely eliminate the need to dig around in the data ourselves, because it is very difficult to program in all the questions we might possibly ask, and research essentially involves asking questions that have never been asked before.

Ken Tapping is an astronomer with the National Research Council's Dominion Radio Astrophysical Observatory.

Telephone: 250-497-2300
Fax: 250-497-2355
