Horror vacui begone

Mikhail’s qrdcreator in a browser and an editor with a dachs start-produced template.

One of the major usability issues our publishing suite DaCHS has for operators (i.e., people who want to publish data) is the “horror vacui”: How do I start a Resource Descriptor (RD – the file DaCHS interprets to build services)?

I used to recommend starting with a look at the RDs of our existing services and picking whatever best matches your publication project. But finding a matching service and figuring out what is generic, what is a special property of the concrete data collection, and what is a hack that should not be reproduced isn’t straightforward at all. On top of that, some of those RDs have been in maintenance mode for almost 10 years and hence may show deprecated practices.

Then came the VESPA implementation workshop last year, during which Mikhail Minin showed me a piece of JavaScript and HTML (source on GitHub) he had written to overcome the empty editor window. Essentially, Mikhail has built a fairly comprehensive form interface in a web browser that asks people the right questions to eventually write an RD for EPN-TAP (i.e., solar system) resources.

I had planned to generalise Mikhail’s approach to several types of resources supported by DaCHS, ideally inferring the questions to ask from the built-in documentation of mixins and applys. But during the last year, whenever I felt it would be a good time to tackle that generalisation, I quickly gave up again. It was mostly rather trivial stuff such as how to tell apart repeatable metadata (waveband, say) and non-repeatable metadata (instrument, say). But it was bad enough that I quickly found something else to do each time I got started.

Eventually, I gave up on a menu interface altogether – making it flexible and generatable at the same time seemed a fairly complex problem. But that doesn’t mean I forgot about overcoming the horror vacui thing. So, when forms aren’t flexible enough for data entry, where do you turn? Right! A text editor.

Enter dachs start. That’s a new DaCHS subcommand that gets you started with your RD. For one, you can list the templates available:

$ dachs start list
siap -- Image collections via SIAP1 and TAP
ssap+datalink -- Spectra via SSAP and TAP, going through datalink
epntap -- Solar system data via EPN-TAP 2.0
scs -- Catalogs via SCS and TAP

More templates are planned; siap+datalink, for instance, would cover some frequent use cases. Feel free to mail in requests.

Once you find a suitable template, create your future resource directory, enter it and run dachs start again, this time passing the name of the template you want:

$ mkdir ex_data
$ cd ex_data
$ dachs start scs
$ head -16 q.rd | tail -9
<resource schema="ex_data">
  <meta name="creationDate">2018-04-13T12:34:31Z</meta>

  <meta name="title">%title -- not more than a line%</meta>
  <meta name="description">
    %this should be a paragraph or two (take care to mention salient terms)%
  </meta>
  <!-- Take keywords from 
    http://astrothesaurus.org/thesaurus/hierarchical-browse/

dachs start uses the directory name as the new schema name and then writes a file q.rd (which is the canonical name for the “main” RD in a resource). Within this file, you’ll see things to fill out between pairs of percent signs, each with a short explanation. Where longer explanations are necessary, embedded comments should help.

To give you an idea of the intended use: As a vim user, I’ve put

augroup rd
  au!
  au BufRead,BufNewFile *.rd imap <F8> <ESC>/%[^%]*%<CR>a
  au BufRead,BufNewFile *.rd imap <F9> <ESC>cf%
augroup END

into my ~/.vimrc. That way, while editing the template into an actual RD, hitting F8 takes me to the next thing to be edited; I can then read the instructions, and when I have made up my mind, I can either delete the template element or hit F9 and replace the explanation text with whatever belongs there.
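If you don't use vim, or just want to check that nothing was left unfilled, the same pattern can be applied from a script. What follows is a minimal sketch, not part of DaCHS: the function name find_placeholders is made up for illustration, and it assumes that a placeholder never spans more than one line and that the RD contains no other percent signs.

```python
import re

# Same pattern as in the vim mapping: a percent sign, anything that is
# not a percent sign, and a closing percent sign.
PLACEHOLDER = re.compile(r"%[^%]*%")


def find_placeholders(rd_text):
    """Return (line number, placeholder) pairs for everything still unfilled.

    Hypothetical helper; assumes placeholders do not span lines.
    """
    found = []
    for line_no, line in enumerate(rd_text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            found.append((line_no, match.group(0)))
    return found


# A shortened stand-in for a freshly generated q.rd.
sample = """<resource schema="ex_data">
  <meta name="title">%title -- not more than a line%</meta>
  <meta name="creationDate">2018-04-13T12:34:31Z</meta>
</resource>"""

for line_no, placeholder in find_placeholders(sample):
    print(line_no, placeholder)
```

To run this against a real template, read the file first, as in find_placeholders(open("q.rd").read()).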

The command is available starting with the 1.1.3 beta (available now by switching to the beta repo) and will be part of the 1.2 release, planned for early June after the Victoria interop.

If you have a publication project: just try it out and give feedback. Note that the templates haven’t actually been tested yet, and the comments were written by a DaCHS and VO nerd, so they might not always be great either. Thus, when you get stuck: complain early, complain often!

DaCHS 1.1 released

Today, I have released DaCHS 1.1, with the main selling point that DaCHS should now speak TAP 1.1 (as defined in the current draft).

First off, if you’re not yet on DaCHS 1.0, please read the corresponding release article before upgrading.

As usual, the general upgrading instructions are available in the operator’s guide (in short: do a dachs val ALL before the Debian upgrade). This time, I’d recommend using the opportunity to upgrade your underlying server to stretch if you haven’t done so already. If you do that, please have a look at the hints on postgres upgrades. Stretch comes with postgres 9.6 (jessie: 9.4). Postgres upgrades are generally safe, but please take a dump before migrating anyway.

So, with this out of the way, here’s a short list of the major changes from DaCHS 1.0 to DaCHS 1.1:

    • DaCHS now officially requires python 2.7. If this really is a problem for you, please shout – it wouldn’t be hard to maintain 2.6 compatibility, but by now we feel there’s no reason to bother any more.
    • Now supporting TAP 1.1; in particular, TOP n doesn’t trump MAXREC any more, and it doesn’t affect OVERFLOW indication, which may break things that used TOP to override DaCHS’ default TAP match limit of 2000. Also, TAP_SCHEMA is updated (this happens as a side effect of dachs upgrade).
    • Now serialising spoint, scircle, and friends to DALI 1.1 xtypes (timestamp, point, polygon, circle). Fields explicitly marked with adql:POINT or adql:REGION will still be serialised to STC-S. Do this only if you have no choice (DaCHS has this for obscore and epntap s_region right now).
    • The output column selection is sanitised. This may make for slight changes in service responses, in particular in VOTable formats. See Output Tables in the reference documentation for details if you think this might hit you.
    • DaCHS no longer comes with an outdated version of pyparsing and instead uses what’s installed on the system. The Debian package further re-uses additional system resources if available (rjsmin, jquery).
    • DaCHS now tries a bit harder to come up with sensible names for SODA result files.
    • map/@source is no longer limited to identifier-like strings; any key that’s in your source is fair game.
    • For incremental imports with data that’s updated now and then, there’s now ignoreSources/@fromdbUpdating.
    • Relative imports from custom code (“import foo” in a custom core, for instance, getting res/foo.py) no longer work. See Importing Modules in the reference documentation for details.
    • This release fixes a severe bug in the creation of obscore metadata from SSAP tables. If you use //obscore#publishSSAPHCD or //obscore#publishSSAPMIXC mixins, update the obscore definitions by running dachs imp -m <rdid>, followed by dachs imp //obscore (the latter is only necessary once at the end).
    • You can now define a footer.html template that’s added at the foot of the main page content – with a bit of CSS magic, this lets you overwrite almost anything on DaCHS HTML pages.

    As always, please complain early if something breaks for you; our regression tests can only cover so much. In particular, our support list is there for you.

    Update (2017-12-06): In particular on jessie, you may see that all DaCHS packages are being held back. To resolve this situation, manually say apt-get install python-gavoutils python-gavostc.

Time Series

The IVOA’s committee on science priorities (CSP) declared the “time domain” one of its focus topics quite a while ago, an action boiling down to a call to the IVOA member projects to think about support for time series and their analysis in services, standards, and clients.

While response was lackluster for several years, work on time series has gathered quite a bit of steam recently. For instance, the spectral client SPLAT (co-maintained by GAVO) has grown some preliminary support to properly display time series (very rudimentary in what’s currently released), and lively discussions on proper metadata for time series have been taking place on the Data Models mailing list of the IVOA – if you’re interested in the time domain, this would be a good time to subscribe for a while and comment as appropriate.

Meanwhile, in our Heidelberg data center, we’ve joined the fray by publishing our first time series service (science background: searching for exoplanets in the Milky Way bulge using gravitational lensing), which is available through SSA (look for k2c9vst) and through ObsCore (at http://dc.g-vo.org/tap, collection name k2c9vst), too. For details see also the service info.
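For the ObsCore route, a synchronous TAP query can be put together with nothing but the standard library. This is a sketch under assumptions: it uses the standard ObsCore table ivoa.obscore and its obs_collection column to select the collection, the helper name build_sync_query_url is made up for illustration, and the code only builds the URL without sending anything.

```python
from urllib.parse import urlencode

TAP_ENDPOINT = "http://dc.g-vo.org/tap"  # the access URL given in the text


def build_sync_query_url(adql, maxrec=None):
    """Return the URL of a synchronous TAP query; no request is made.

    Hypothetical helper using the standard TAP parameters
    REQUEST, LANG, QUERY and (optionally) MAXREC.
    """
    params = {
        "REQUEST": "doQuery",
        "LANG": "ADQL",
        "QUERY": adql,
    }
    if maxrec is not None:
        params["MAXREC"] = str(maxrec)
    return TAP_ENDPOINT + "/sync?" + urlencode(params)


# Assumption: the collection name from the text goes into obs_collection.
query = "SELECT TOP 5 * FROM ivoa.obscore WHERE obs_collection='k2c9vst'"
url = build_sync_query_url(query, maxrec=5)
print(url)
```

The resulting URL can be pasted into a browser or handed to any HTTP client; TAP-aware clients such as TOPCAT take the bare endpoint and the ADQL text instead.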

Since right now future standards are being worked out, this is a perfect time to publish your time series; this way you get to influence what people will be able to tell machines about their time series in the next couple of years. Ask our staff (contact below) if you want us to publish for you. But you can also self-publish using the DaCHS publication package. Refer to the resource descriptor of the k2c9vst service to get started.

At its heart is the table definition of the time series, which is basically


<table id="instance">
  <column name="hjd" type="double precision"
      unit="d" ucd="time.epoch"
      tablehead="Time"
      description="Time this photometry corresponds to."
      verbLevel="1"/>
  <column name="df" type="double precision"
      unit="adu" ucd="phot.flux"
      tablehead="Diff. Flux"
      description="Difference as defined by 2008MNRAS.386L..77B"
      verbLevel="1"/>
  <column name="e_df"
      unit="adu" ucd="stat.error;phot.flux"
      tablehead="Err. DF"
      description="Error in difference flux."
      verbLevel="15"/>
</table>

– in the actual service, there are a few more columns, but time, value, and error actually make up a full time series.

Except that a machine can’t really tell what this is yet (well, perhaps it could using UCDs, but that’s a different matter). What it needs to work out is what’s the independent axis, what the frames are, etc. And to do that, the machine needs annotation, i.e., machine-readable, structured declarations alongside the data and the “classic” metadata like units and descriptions.
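To illustrate what "perhaps it could using UCDs" might mean in practice, here is a rough sketch that guesses column roles from the first word of each UCD. It is a heuristic made up for this illustration, not something DaCHS or any client is known to do, and the table definition is a shortened copy of the one above. Note what it cannot recover: time scale, reference position, or which axis is the independent one in less obvious cases.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the table definition, keeping only what the guess needs.
TABLE_DEF = """
<table id="instance">
  <column name="hjd" type="double precision" unit="d" ucd="time.epoch"/>
  <column name="df" type="double precision" unit="adu" ucd="phot.flux"/>
  <column name="e_df" unit="adu" ucd="stat.error;phot.flux"/>
</table>
"""


def guess_roles(table_xml):
    """Sort column names into time, error and value by their primary UCD word.

    Hypothetical heuristic: anything that is neither time nor error
    is treated as a value.
    """
    roles = {"time": [], "error": [], "value": []}
    for column in ET.fromstring(table_xml).iter("column"):
        primary = column.get("ucd", "").split(";")[0]
        if primary.startswith("time."):
            roles["time"].append(column.get("name"))
        elif primary.startswith("stat.error"):
            roles["error"].append(column.get("name"))
        else:
            roles["value"].append(column.get("name"))
    return roles


print(guess_roles(TABLE_DEF))
```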

In actual VOTables, this will be happening through VO-DML annotation, which is also still seriously being discussed; whatever we currently spit out you can inspect in the XML source of this example document.

DaCHS, however, isolates you from the concrete details of writing VOTables. Instead, you write annotations in a JSON-inspired little language we’ve christened SIL (“Simple Instance Language”; reference). The complicated part is to know what types and attributes you have to declare, which is exactly what the data models are about. As said initially, the details are still in flux here, but this is what things look like right now:


<dm>
  (ivoa:Measurement) {
    value: @df
    statError: @e_df
  }
</dm>

<dm>
  (stc2:Coords) {
    time: (stc2:Coord) {
      frame:
        (stc2:TimeFrame) {
          timescale: UTC
          refPosition: BARYCENTER 
          kind: JD }
      loc: @hjd
    }
    space: 
      (stc2:Coord) {
        frame:
          (stc2:SpaceFrame) {
            orientation: ICRS
            epoch: "J2000.0"
          }
        loc: [@raj2000 @dej2000]
    }
  }
</dm>

<dm>
  (ndcube:Cube) {
    independent_axes: [@hjd]
    dependent_axes: [@df @mag]
  }
</dm>

If you consider this for a moment, you’ll see that each dm element corresponds to something like an object template of a certain “type”. The first, for instance, defines a measurement with a value and a statistical error. Both happen to be given as references to columns in the table defined above (as indicated by the @ signs).

The last annotation defines a data cube; a time series in this definition is simply a data cube with just a single non-degenerate independently varying axis (the independent_axes attribute; in the value the square brackets indicate a sequence) that happens to be time-like. That hjd is time-like is something VO-DML enabled clients will work out when interpreting the STC (“Space-Time-Coordinates”) annotation. In there, you will see that hjd is referenced from the time attribute with a time-like frame, which also declares that this particular flavor of HJD is what a hypothetical clock at the solar system’s barycenter would measure if it stood in the gravitational potential in Greenwich and had leap seconds thrown in now and then. That long story is communicated through “literals”, constant strings like “BARYCENTER” or “UTC”, which are also legal within DaCHS data model annotations.
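The reasoning a client would go through can be sketched in a few lines. The dictionaries below are a hand-made stand-in for the cube and STC annotations shown above; they are not a DaCHS or VO-DML API, and the check ignores the question of degenerate axes. Treat it as an illustration of the logic only.

```python
# Hypothetical in-memory form of the annotations; column references
# (the @names in SIL) are represented as plain strings.
cube = {
    "independent_axes": ["hjd"],
    "dependent_axes": ["df", "mag"],
}
coords = {
    "time": {
        "frame": {"timescale": "UTC", "refPosition": "BARYCENTER", "kind": "JD"},
        "loc": "hjd",
    },
    "space": {
        "frame": {"orientation": "ICRS", "epoch": "J2000.0"},
        "loc": ["raj2000", "dej2000"],
    },
}


def is_time_series(cube, coords):
    """True if the cube has exactly one independent axis and that axis is
    the column referenced from the time coordinate."""
    axes = cube["independent_axes"]
    return len(axes) == 1 and axes[0] == coords["time"]["loc"]


print(is_time_series(cube, coords))
```

A cube with two independent axes, or one whose single independent axis is not the time location, would not count as a time series under this reading.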

This may seem a bit complicated at first. I argue, though, that given what time series clients will have to do anyway, going through the cube and STC annotations is actually about the most straightforward thing you can do.

But perhaps I’m wrong, so again: None of this is cast in stone right now. Comments are even more welcome than usual, either below or at gavo@ari.uni-heidelberg.de.