Time Series

2017-05-03 Annotation DaCHS Time series Markus Demleitner

The IOVA's committee on science priorities (CSP) has declared the “time domain” as one of its focus topics quite a while ago, an action boiling down to a call to the IVOA member projects to think about support for time series and their analysis in services, standards, and clients.

While for several years, response has been lackluster, work on time series has gathered quite a bit of steam recently. For instance, the spectral client SPLAT (co-maintained by GAVO) has grown some preliminary support to properly display time series (very rudimentary in what's currently released), and lively discussions on proper metadata for time series have been going on on the Data Models mailing list of the IVOA – if you're interested in the time domain, this would be a good time to subscribe for a while and comment as appropriate.

Meanwhile, in our Heidelberg data center, we've joined the fray by publishing our first time series service (science background: searching for exoplanets in the Milky Way bulge using gravitational lensing), which is available through SSA (look for k2c9vst) and through ObsCore (at http://dc.g-vo.org/tap, collection name k2c9vst), too. For details see also the service info.

Since right now future standards are being worked out, this is a perfect time to publish your time series; this way you get to influence what people will be able to tell machines about their time series in the next couple of years. Ask our staff (contact below) if you want us to publish for you. But you can also self-publish using the DaCHS publication package. Refer to the resource descriptor of the k2c9vst service to get started.

At its heart is the table definition of the time series, which is basically:

<table id="instance">
  <column name="hjd" type="double precision"
      unit="d" ucd="time.epoch"
      tablehead="Time"
      description="Time this photometry corresponds to."
      verbLevel="1"/>
  <column name="df" type="double precision"
      unit="adu" ucd="phot.flux"
      tablehead="Diff. Flux"
      description="Difference as defined by 2008MNRAS.386L..77B"
      verbLevel="1"/>
  <column name="e_df"
      unit="adu" ucd="stat.error;phot.flux"
      tablehead="Err. DF"
      description="Error in difference flux."
      verbLevel="15"/>
</table>

– in the actual service, there are a few more columns, but time, value, and error actually make up a full time series.

Except that a machine can't really tell what this is yet (well, perhaps it could using UCDs, but that's a different matter). What it needs to work out is what's the independent axis, what the frames are, etc. And to do that, the machine needs annotation, i.e., machine-readable, structured declarations alongside the data and the “classic” metadata like units and descriptions.

In actual VOTables, this will be happening through VO-DML annotation, which is also still seriously being discussed; whatever we currently spit out you can inspect in the XML source of this example document.

DaCHS, however, isolates you from the concrete details of writing VOTables. Instead, you write annotations in a JSON-inspired little language we've christened SIL (“Simple Instance Language”; reference). The complicated part is to know what types and attributes you have to declare, which is exactly what the data models is a bout. As said initially, the details are still in flux here, but this is what things look like right now:

<dm>
  (ivoa:Measurement) {
    value: @df
    statError: @e_df
  }
</dm>

<dm>
  (stc2:Coords) {
    time: (stc2:Coord) {
      frame:
        (stc2:TimeFrame) {
          timescale: UTC
          refPosition: BARYCENTER
          kind: JD }
      loc: @hjd
    }
    space:
      (stc2:Coord) {
        frame:
          (stc2:SpaceFrame) {
            orientation: ICRS
            epoch: "J2000.0"
          }
        loc: [@raj2000 @dej2000]
    }
  }
</dm>

<dm>
  (ndcube:Cube) {
    independent_axes: [@hjd]
    dependent_axes: [@df @mag]
  }
</dm>

If you consider this for a moment, you'll see that each dm element corresponds to something like an object template of a certain “type”. The first, for instance, defines a measurement with a value and a statistical error. Both happen to be given as references to columns in the table defined above (as indicated by the @ signs).

The last annotation defines a data cube; a time series in this definition is simply a data cube with just a single non-degenerate independently varying axis (the independent_axis attribute; in the value the square brackets indicate a sequence) that happens to be time-like. And that hjd is time-like, VO-DML enabled clients will work out when interpreting the STC (“Space-Time-Coordinates”) annotation. In there, you will see that hjd is referenced from the time attribute and with a time-like frame that also defines that this particular flavor of HJD is what a hypothetical clock at the solar system's barycenter would measure if it stood in the gravitational potential in Greenwhich, and had leap seconds thrown in now and then. And that long story is communicated through “literals”, constant strings like “BARYCENTER” or ”TT”, which are also legal within DaCHS data model annotations.

This may seem a bit complicated at first. I argue, though, that given what time series clients will have to do anyway, going through the cube and STC annotations is actually about the most straightforward thing you can do.

But perhaps I'm wrong, so again: None of this is cast in stone right now. Comments are even more welcome than usual, either below or at gavo@ari.uni-heidelberg.de.