CADC The Canadian Astronomy Data Centre
Herzberg Institute of Astrophysics

The Hubble Space Telescope Archive at CADC


The HST Calibration at CADC

This page describes the processes that run at CADC to retrieve, calibrate and associate the HST data held at CADC.

The CADC currently serves HST data from a cache. Each observation dataset is recalibrated in the background, and dedicated mechanisms detect when a dataset has to be recalibrated. This allows you to get the data right away, without waiting for processing. Also, since the CADC now offers a simple programmatic interface to the data, you can use the CADC collection directly in your processing scripts. Please read below for more information.

Step 1: Starting with the POD files

The POD files are in a form that is not directly usable by any astronomical software; they are closer to telemetry files. They come directly from STScI and are stored at CADC. Raw files have to be extracted from the POD files before any calibration process can start. The HST project chose to start processing from this step because it was the only way to easily rebuild faulty raw dataset headers with a database-based system. Users should never need to be concerned with the POD files.

Step 2: Translating POD files into RAW files: OTFR

To start the calibration sequence, the raw files have to be built from the basic POD files, together with database queries that build a conforming header containing the latest information and pointers to calibration files. The system used for this operation is distributed by STScI and runs only on Solaris. Users can get the raw files, if they wish, after marking the observations for retrieval. All raw files are stored in the cache and are immediately available to users.

Step 3: Calibrating the data set: OTFC

After obtaining the raw files from the OTFR, the CADC returns to Linux machines to perform the scientific calibration. More than 30 CPUs are dedicated to this task, which makes it possible to refresh the cache at regular intervals. To calibrate the data, CADC uses the latest release (often patched) of the STSDAS and TABLES packages as distributed by STScI. Users can get the calibrated files, if they wish, after marking the observations for retrieval.
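
The chain described in Steps 1 to 3 (POD file, then raw file, then calibrated file, spread over several CPUs) can be pictured with the short sketch below. It is only an illustration under stated assumptions: the function names, the file-name suffixes and the use of a thread pool are hypothetical, and the real OTFR and calibration steps are performed by the STScI-distributed software, not by these placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_raw(pod_name):
    """Placeholder for Step 2: derive a raw file name from a POD file name.

    The suffixes are hypothetical; the real step rebuilds headers from a database.
    """
    return pod_name.replace(".pod", "_raw.fits")


def calibrate(raw_name):
    """Placeholder for Step 3: derive a calibrated file name from a raw file name."""
    return raw_name.replace("_raw.fits", "_cal.fits")


def process(pod_name):
    """Run both placeholder steps for one dataset."""
    return calibrate(extract_raw(pod_name))


def process_batch(pod_names, workers=4):
    """Process many datasets concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, pod_names))


if __name__ == "__main__":
    print(process_batch(["a.pod", "b.pod", "c.pod"]))
```

Because each dataset is processed independently of the others, adding workers shortens a full refresh of the cache without changing the result for any single dataset.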

HLA: The Hubble Legacy Archive

The HLA, or Hubble Legacy Archive, is a consortium of people from the Space Telescope Science Institute, ST-ECF and CADC. Their goal is to optimize the science produced from the Hubble Space Telescope by:

  1. Putting the data online for immediate access
  2. Adding a footprint (sky coverage) service to make it easier to browse and download images
  3. Providing more extensive "composite images" (e.g., stacked, color, mosaics)
  4. Improving absolute astrometry from ~1-2" to ~0.3" (see the astrometry FAQ)
  5. Creating source lists
  6. Extracting spectra from ACS and NICMOS grism data (produced at ST-ECF)

The FAQ gives a detailed description of the types of data that are available in the HLA. Currently we have enhanced image products for ACS, WFPC2, and NICMOS, extracted spectra from NICMOS and ACS grism observations, and source lists from ACS and WFPC2 images. The NICMOS imaging, ACS grism spectra, and WFPC2 source lists are all new in the DR3 release. Community-contributed high-level science products such as GOODS, COSMOS and the UDF are also available.

The enhanced data products were generated from the standard HST pipeline products. The ACS and WFPC2 images have been combined using multidrizzle, are aligned north up, and have been astrometrically corrected when possible (for approximately 80% of the cases).

Please note that, at this moment, the CADC has only the HLA files for WFPC2. This will change very soon.

Step 4: The associations

In addition to delivering "regular" calibrated HST datasets, the CADC offers higher-level data products to the astronomical community. The first data release was version 1 of the WFPC2 associations. CADC has now released version 2 of the WFPC2 associations.

The WFPC2 associations

The Canadian Astronomy Data Centre (CADC), the Space Telescope European Coordination Facility (ST-ECF) and the Multimission Archive at STScI (MAST) are pleased to make available combined images from the Wide Field Planetary Camera 2 of the Hubble Space Telescope. These combined images are the products of the basic registration and averaging of related sets of WFPC2 images (referred to as associations), a step usually performed by archival researchers after retrieving the individual images.

Users can find the WFPC2 associations through the main query page for HST observations at CADC.

At this moment the available documentation is still quite limited, but we are working hard on an improved version. Please check this web site often.

The HST cache

What is the cache? The cache is an envelope around HST archive file production. It is a set of database tables and software agents that ensure that all pipeline products are locally available for rapid data retrieval.

Since 2002, all data from active HST instruments (ACS, STIS, NICMOS, WFPC2) have been produced from scratch, triggered by user requests. The reasoning behind the On The Fly Reprocessing (OTFR) and On The Fly Calibration (OTFC) pipelines was that they would guarantee that archive users always get their data equipped with the newest set of metadata and calibrated according to the best methods available. This was a clear advantage over the previous system, in which the raw data were produced centrally at STScI and delivered to the partner sites, essentially freezing the data in time. Another advantage was that the system conserved storage space, since only the Hubble Space Telescope telemetry files and a few small extra files needed to be stored, an important consideration when data is stored on optical disks in jukeboxes.

With the advent of cheap mass storage in the form of hard-disk arrays, this aspect became less important, and a number of drawbacks of the on-the-fly paradigm became apparent over time. Live processing of data requires that support be available at all times to resolve errors and bugs in the pipeline, an inevitable task when a system is this complex and its input data this heterogeneous. Another drawback is processing speed: producing a dataset can take from several minutes to hours, which might not be an issue for the patient astronomer but makes it impossible to expose the data through synchronous VO protocols. Next-level efforts such as data mining, metadata harvesting and the production of high-level data products are also enormously difficult in the on-the-fly world.

Initial tests were made at the CADC to process and store the entire HST data collection, as a consequence of preview production and the generation of higher-order science products (WFPC2 associations, ACS visit associations). The experience gained was very promising, and the approach is highly desirable in a Virtual Observatory context. With the uncertain future of the European HST archive, it became prudent to repackage the archive operations in a form that is ultimately easy to maintain and also easier to move or freeze, should that become necessary. Hence the ST-ECF and CADC started a common effort to design, develop and produce an HST archive cache.

The basic concept of the Cache is that all data is pre-processed and readily available from storage at all times. This includes mechanisms to discover newly observed datasets to insert and automatic re-processing of datasets which benefit from updates to reference files, available meta-data and general processing software upgrades.
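
The two mechanisms mentioned above, discovering new datasets and deciding when a cached dataset should be re-processed, can be sketched as follows. This is a minimal illustration and not the cache's actual code: the `CacheEntry` record, its fields, and the version strings are assumptions introduced only for the example.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    """Hypothetical record of how a cached dataset was last processed."""
    dataset: str
    reference_files: dict  # reference file name -> version used at calibration
    software_version: str


def needs_reprocessing(entry, current_refs, current_software):
    """Return True if the software or any reference file used has changed."""
    if entry.software_version != current_software:
        return True
    return any(
        current_refs.get(name) != version
        for name, version in entry.reference_files.items()
    )


def find_new_datasets(observed, cached):
    """Return datasets that were observed but are not yet in the cache."""
    return sorted(set(observed) - set(cached))


if __name__ == "__main__":
    entry = CacheEntry("dataset_a", {"flat": "v1", "bias": "v3"}, "sw-1")
    print(needs_reprocessing(entry, {"flat": "v2", "bias": "v3"}, "sw-1"))
    print(find_new_datasets(["dataset_a", "dataset_b"], ["dataset_a"]))
```

In this sketch a dataset is considered stale as soon as any input to its calibration differs from the current one. A real system might add finer rules, for example ignoring reference-file updates that do not affect a given observing mode.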

Although the initial investment in manpower and computing power may seem high, and additional hardware is required, the cache has a number of clear advantages:

  1. Speed (near instantaneous compared to waiting for hours)
  2. Shields users from errors
  3. Direct programmatic & VO access
  4. Security
  5. Allows interoperability between sites
  6. Less maintenance in the long run
  7. Transferable (e.g. to ESO or ESAC)
  8. Allows harvesting of metadata and data mining
  9. Makes the collection easily freezable and transferable

Initial design of the common HST cache started in the second half of 2007, and by spring 2008 it has reached a level of maturity at which the initial population of the data holdings can begin. During 2008 the cache will gradually go into operation and serve data to the community.

How is it implemented?

  1. Written entirely in object-oriented Python
  2. Common software library and command line interface at CADC and ECF
  3. All site dependencies have been encapsulated or emulated (e.g. NGAS vs. AD file storage system)
  4. GRID distributed processing via the Sun Grid Engine (http://gridengine.sunsource.net/)
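
One way to picture the encapsulation of site dependencies listed above is a small storage interface that the shared library codes against, with one implementation per site. The sketch below is an assumption-laden illustration: the class names, the `SITE_STORES` table, and the pairing of "ad" with CADC and "ngas" with ECF are hypothetical, and the in-memory dictionary merely stands in for the real AD and NGAS storage systems.

```python
from abc import ABC, abstractmethod


class FileStore(ABC):
    """Hypothetical site-independent interface used by the common library."""

    @abstractmethod
    def put(self, name, data):
        """Store the bytes of one file under the given name."""

    @abstractmethod
    def get(self, name):
        """Return the bytes previously stored under the given name."""


class DictStore(FileStore):
    """In-memory stand-in for a site storage system (not a real AD or NGAS client)."""

    def __init__(self, prefix):
        self.prefix = prefix
        self._files = {}

    def _key(self, name):
        return f"{self.prefix}:{name}"

    def put(self, name, data):
        self._files[self._key(name)] = data

    def get(self, name):
        return self._files[self._key(name)]


# Hypothetical site table: the only place where the site choice is visible.
SITE_STORES = {
    "CADC": lambda: DictStore("ad"),
    "ECF": lambda: DictStore("ngas"),
}


def store_for_site(site):
    """Create the storage backend configured for a site."""
    return SITE_STORES[site]()


def archive_product(store, name, data):
    """Site-independent code: store a product and verify it can be read back."""
    store.put(name, data)
    return store.get(name) == data


if __name__ == "__main__":
    for site in ("CADC", "ECF"):
        print(site, archive_product(store_for_site(site), "product.fits", b"data"))
```

With this arrangement, `archive_product` never needs to know which storage system is behind the interface, which is what allows the same library and command line tools to run at both sites.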

The VO

SIA interface

Proxy access


Proprietary Rights

In accordance with NASA policy, all science data from the Hubble Space Telescope is archived with a one-year proprietary period by default. This period may be extended or shortened at the request of the principal investigator (PI) and on approval by the STScI Director's Office. Calibration data (i.e., data obtained under calibration proposals) carries no proprietary period by default; neither do engineering data, calibration files (derived from calibration observations), or observatory monitoring data.

In addition to the regular proprietary period, observations by General Observers (GOs) which are found to be duplicates of concurrent observations by a Guaranteed Time Observer (GTO) may be placed under restriction. Data under restriction (or "embargoed") cannot be distributed to the GO until the restriction expires (usually, when the GTO data goes public).
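
The default rule can be expressed as a small date calculation. The sketch below is illustrative only: the function names are hypothetical, the start date of the period is assumed to be supplied by the caller, the handling of 29 February is an assumption, and neither PI-requested changes to the period nor GTO restrictions are modelled.

```python
from datetime import date


def release_date(start, period_years=1):
    """Return the date on which the proprietary period ends.

    A start date of 29 February is moved to 1 March when the target
    year is not a leap year (an assumption made for this sketch).
    """
    target_year = start.year + period_years
    try:
        return start.replace(year=target_year)
    except ValueError:
        return date(target_year, 3, 1)


def is_public(start, today, is_calibration=False, period_years=1):
    """Return True if the data may be distributed to any user on `today`."""
    if is_calibration:
        # Calibration data carries no proprietary period by default.
        return True
    return today >= release_date(start, period_years)


if __name__ == "__main__":
    print(release_date(date(2007, 6, 1)))
    print(is_public(date(2007, 6, 1), date(2008, 5, 31)))
    print(is_public(date(2007, 6, 1), date(2008, 6, 1)))
```

Passing a different `period_years` value is one simple way to represent a period that was shortened (for example 0) or extended on request.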
