Time Series

The IOVA’s committee on science priorities (CSP) has declared the “time domain” as one of its focus topics quite a while ago, an action boiling down to a call to the IVOA member projects to think about support for time series and their analysis in services, standards, and clients.

While for several years, response has been lackluster, work on time series has gathered quite a bit of steam recently. For instance, the spectral client SPLAT (co-maintained by GAVO) has grown some preliminary support to properly display time series (very rudimentary in what’s currently released), and lively discussions on proper metadata for time series have been going on on the Data Models mailing list of the IVOA – if you’re interested in the time domain, this would be a good time to subscribe for a while and comment as appropriate.

Meanwhile, in our Heidelberg data center, we’ve joined the fray by publishing our first time series service (science background: searching for exoplanets in the Milky Way bulge using gravitational lensing), which is available through SSA (look for k2c9vst) and through ObsCore (at http://dc.g-vo.org/tap, collection name k2c9vst), too. For details see also the service info.

Since right now future standards are being worked out, this is a perfect time to publish your time series; this way you get to influence what people will be able to tell machines about their time series in the next couple of years. Ask our staff (contact below) if you want us to publish for you. But you can also self-publish using the DaCHS publication package. Refer to the resource descriptor of the k2c9vst service to get started.

At its heart is the table definition of the time series, which is basically

<table id="instance">
  <column name="hjd" type="double precision"
      unit="d" ucd="time.epoch"
      description="Time this photometry corresponds to."
  <column name="df" type="double precision"
      unit="adu" ucd="phot.flux"
      tablehead="Diff. Flux"
      description="Difference as defined by 2008MNRAS.386L..77B"
  <column name="e_df"
      unit="adu" ucd="stat.error;phot.flux"
      tablehead="Err. DF"
      description="Error in difference flux."

– in the actual service, there are a few more columns, but time, value, and error actually make up a full time series.

Except that a machine can’t really tell what this is yet (well, perhaps it could using UCDs, but that’s a different matter). What it needs to work out is what’s the independent axis, what the frames are, etc. And to do that, the machine needs annotation, i.e., machine-readable, structured declarations alongside the data and the “classic” metadata like units and descriptions.

In actual VOTables, this will be happening through VO-DML annotation, which is also still seriously being discussed; whatever we currently spit out you can inspect in the XML source of this example document.

DaCHS, however, isolates you from the concrete details of writing VOTables. Instead, you write annotations in a JSON-inspired little language we’ve christened SIL (“Simple Instance Language”; reference). The complicated part is to know what types and attributes you have to declare, which is exactly what the data models is a bout. As said initially, the details are still in flux here, but this is what things look like right now:

  (ivoa:Measurement) {
    value: @df
    statError: @e_df

  (stc2:Coords) {
    time: (stc2:Coord) {
        (stc2:TimeFrame) {
          timescale: UTC
          refPosition: BARYCENTER 
          kind: JD }
      loc: @hjd
      (stc2:Coord) {
          (stc2:SpaceFrame) {
            orientation: ICRS
            epoch: "J2000.0"
        loc: [@raj2000 @dej2000]

  (ndcube:Cube) {
    independent_axes: [@hjd]
    dependent_axes: [@df @mag]

If you consider this for a moment, you’ll see that each dm element corresponds to something like an object template of a certain “type”. The first, for instance, defines a measurement with a value and a statistical error. Both happen to be given as references to columns in the table defined above (as indicated by the @ signs).

The last annotation defines a data cube; a time series in this definition is simply a data cube with just a single non-degenerate independently varying axis (the independent_axis attribute; in the value the square brackets indicate a sequence) that happens to be time-like. And that hjd is time-like, VO-DML enabled clients will work out when interpreting the STC (“Space-Time-Coordinates”) annotation. In there, you will see that hjd is referenced from the time attribute and with a time-like frame that also defines that this particular flavor of HJD is what a hypothetical clock at the solar system’s barycenter would measure if it stood in the gravitational potential in Greenwhich, and had leap seconds thrown in now and then. And that long story is communicated through “literals”, constant strings like “BARYCENTER” or ”TT”, which are also legal within DaCHS data model annotations.

This may seem a bit complicated at first. I argue, though, that given what time series clients will have to do anyway, going through the cube and STC annotations is actually about the most straightforward thing you can do.

But perhaps I’m wrong, so again: None of this is cast in stone right now. Comments are even more welcome than usual, either below or at gavo@ari.uni-heidelberg.de.

Asterics Tech Forum

The 3. Asterics DADI Tech Forum took place last week in Strasbourg – and many GAVO members made contributions as well.
This time, there were 3 slots for hackathon sessions, which were also used for discussions. We’ll mention two highlights of our contributions here.

We took the opportunity to push our Provenance Data Model efforts and used the hackathon slots for provenance discussions.
One topic was the links between the simulation data model and ProvenanceDM, and how to map from SimDM to ProvenanceDM classes. This mapping works quite well and will be included in the working draft for the data model. We also had an interesting talk by José Enrique Ruiz on his view on Provenance, workflows, and – very important – the “deployer” and “system” provenance for storing all the environment variables that may be needed to rerun the processing of some observational data. Michèle Sanguillon also presented for the first time her extension to the prov Python library (W3C) with extensions from our IVOA Provenance Data Model. We also had interested people from outside the usual provenance-interested people joining in, e.g. from the Astron project. More about our Provenance modelling efforts can be found at IVOA Provenance wiki page.

A world premiere (of sorts) was the first discussion of RegTAP 1.1. RegTAP is a search interface to the VO Registry; it is what TOPCAT or other VO clients uses when you type in keywords to locate services. A fairly direct web-basd interface is our WIRR registry interface. RegTAP will need a bit of a makeover since VOResource, the underlying metadata scheme is currently receiving one, allowing, in particular, for including DOIs and ORCIDs (John Does of this world, rejoice: People can finally uniquely find your data and not that of all the other J. Does) in Registry records and figuring out licenses on data. Licensing may not matter when you use data in a paper but it does matter if you want to redistribute data, e.g. for planetarium programs with catalog data or pretty pictures, or when re-mixing data.

But of course the GAVOistas happily joined the fray on the many other topics discussed, from a standard format for a time series to interoperable authentication, from datalink applications to figuring out if data coming into a program should be treated as a collection of spectra or rather an object catalog – the latter in the context of the upcoming version 10 of the VO’s premier image tool Aladin, which we saw (probably another premiere) demoed. We can already promise you an exciting update!

And the Solar System, too

Virtual Observatory technologies are increasingly being adopted outside of “core” astronomy in the vicinity of the optical band (to which they have had, I’ll have to admit, a certain slant) . An excellent example for that trend is the Europlanet community. Their goal is to make solar system data accessible without fiddling, and they are employing a wide range of VO standards for that. At the heart of their efforts are TAP and the VO Registry.

While the usual VO client software will of course work fine with their services, they are offering a nice web-based discovery tool executing queries against an increasing number of services. Such uniform quering over many services is possible is because all of them implement TAP and host EPNcore tables. The resulting interface, also known as EPN-TAP, allows for very flexible discovery and retrieval of solar system data products, much like ObsTAP does for astronomical observations outside of the solar system.

Since quite a few EPN-TAP services are built using GAVO’s DaCHS publication suite, I was invited to this week’s VESPA implementation workshop 2017 in Graz to help the data providers set up their services.

I can’t deny that I’m somewhat excited when I see how our software is used to publish spectra of the ice blocks in Saturn’s ring taken by the lonely Cassini spacecraft still orbiting the gas giant, or data transmitted by Rosetta, now (and for who knows how long) sitting on comet 67P/Churyumov-Gerasimenko. There’s even an upcoming archive of solar system alerts that may, according to its builders, include events like meteor showers on Mars. I can almost hear my code whisper “I’ve archived signals of C-beams glitter in the dark near the Tannhäuser Gate”.

Even documentation can become otherworldly in this business: Already in February, DaCHS has learnt to procude GeoJSON, a format common in the GIS community and also adopted by Europlanet – planetology has lots of common ground with geoinformatics. And in the reference documentation on annotating tables to enable that, when I wrote “standards-compliant GeoJSON clients will interpret your coordinates as WGS84 on Earth if you leave [frame annotation] out”, I was severely tempted to add “which is probably not what you want” and feel like Spaceman Spiff.

Romantic space adventures aside, after this intense week, not only are there several additional or improved EPN-TAP services from places ranging from Pasadena to Villafranca to Warsaw in the pipeline, the close interaction with the data providers has also led to very significant improvements to DaCHS’ EPN-TAP support. The tutorial chapter on EPN-TAP and the reference documentation linked from there already reflect the results of this workshop. You’ll need a current DaCHS beta package for that to work, though; we expect this stuff to go into our release packages around July.

If any of the workshop participants read this: Thanks a lot for your patience with DaCHS’ sometimes somewhat cryptic diagnostics. If, on the other hand, you missed the Graz workshop and have solar system data: Please talk to us or the kind and friendly Europlanet folks – either us will be delighted to support your publication project. And perhaps we’ll meet you at the next such workshop, planned for 2018 in the Czech Republic.

Updated Proper Motion Tutorial

At the risk of turning this into a blog on nice TAP tricks (which it’s not supposed to be): Our classic short tutorial on adding proper motions to almost arbitrary object lists has just gotten a facelift today.

And there’s new content, too – I now show what to do when you don’t even have positions but just object names. In order to keep this sufficiently geeky, here’s the query as a spoiler:

SELECT col1, ra, dec
ON (id=normId(col1))
ON (oidref=oid)

But to close on a non-TAP topic: Registry! There’s an experimental facility to have this kind of thing in the Registry; the PM tutorial is in, for instance, with the ivoid ivo://edu.gavo.org/hd/gavo_addpms). One thing you can do with this is generate a list of registred documents that essentially updates itself from the registry.

Another is figure out where the source code of the document is (if the authors choose to share it, which is of course a very smart thing to do); in our example it’s in Volute, the IVOA’s semi-official version control system. So, if you find a bug (defined as “superset of typo”) in the linked document, you’re most welcome to supply patches as diffs or just directly fix things if you have commit privileges in Volute.

See Who’s Kinking the Sky

A new arrival in the GAVO Data Center is UCAC5, another example of a slew of new catalogs combining pre-existing astrometry with Gaia DR1, just like the HSOY catalog we’ve featured here a couple of weeks back.

That’s a nice opportunity to show how to use ADQL’s JOIN operator for something else than the well-known CONTAINS-type crossmatch. Since both UCAC5 and HSOY reference Gaia DR1, both have, for each object, a notion which element of the Gaia source catalog they correspond to. For HSOY, that’s the gaia_id column, in UCAC5, it’s just source_id. Hence, to compare results from both efforts, all you have to do is to join on source_id=gaia_id (you can save yourself the explicit table references here because the column names are unique to each table.

So, if you want to compare proper motions, all you need to do is to point your favourite TAP client’s interface to http://dc.g-vo.org/tap and run

    in_unit(avg(uc.pmra-hsoy.pmra), 'mas/yr') AS pmradiff, 
    in_unit(avg(uc.pmde-hsoy.pmde), 'mas/yr') AS pmdediff, 
    count(*) as n, 
    ivo_healpix_index (6, raj2000, dej2000) AS hpx 
    FROM hsoy.main AS hsoy 
    JOIN ucac5.main as uc 
    ON (uc.source_id=hsoy.gaia_id) 
    WHERE comp IS NULL    -- hsoy junk filter
    AND clone IS NULL     -- again, hsoy junk filter
    GROUP BY hpx

(see Taylor et al’s All of the Sky if you’re unsure what do make of the healpix/GROUP BY magic).

Of course, the fact that both tables are in the same service helps, but with a bit of upload magic you could do about the same analysis across TAP services.

Just so there’s a colourful image in this post, too, here’s what this query shows for the differences in proper motion in RA:

(equatorial coordinates, and the aux axis is a bit cropped here; try for yourself to see how things look for PM in declination or when plotted in galactic coordinates).

What does this image mean? Well, it means that probably both UCAC5 and HSOY would still putt kinks into the sky if you wait long enough.

In the brightest and darkest points, if you waited 250 years, the coordinate system induced by each catalog on the sky would be off by 1 arcsec with respect to the other (on a sphere, that means there’s kinks somewhere). It may seem amazing that there’s agreement to at least this level between the two catalogs – mind you, 1 arcsec is still more than 100 times smaller than you could see by eye; you’d have to go back to the Mesolithic age to have the slightest chance of spotting the disagreement without serious optical aids. But when Gaia DR2 will come around (hopefully around April 2018), our sky will be more stable even than that.

Of course, both UCAC5 and HSOY are, indirectly, standing on the shoulders of the same giant, namely Hipparcos and Tycho, so the agreement may be less surprising, and we strongly suspect that a similar image will look a whole lot less pleasant when Gaia has straightened out the sky, in particular towards weaker stars.

But still: do you want to bet if UCAC5 or HSOY will turn out to be closer to a non-kinking sky? Let us know. Qualifications („For bright stars…”) are allowed.

Automating TAP queries

TOPCAT is a great tool – in particular, for prototyping and ad-hoc analyses, it’s hard to beat. But say you’ve come up with this great TAP query, and you want to preserve it, perhaps generate plots, then maybe fix them when you (or referees) have spotted problems.

Then it’s time to get acquainted with TOPCAT’s command line sister STILTS. This tool lets you do most of what TOPCAT can, but without user intervention. Since the command lines usually take a bit of twiddling I usually wrap stilts calls with GNU make, so I just need to run something like make fig3.pdf or make fig3.png and make figures out what to do, potentially starting with a query. Call it workflow preservation if you want.

How does it work? Well, of course with a makefile. In that, you’ll first want to define some variables that allow more concise rules later. Here’s how I usually start one:


# VOTables are the results of remote queries, so don't wantonly throw
# them away
.PRECIOUS: %.vot

# in this particular application, it helps to have this global

# A macro that contains common stuff for stilts TAP query -- essentially,
# just add adql=
TAPQUERY=$(STILTS) tapquery \
  tapurl='http://dc.g-vo.org/tap' \
  maxrec=200000000 \
  omode=out \
  out=$@ \
  ofmt=vot \

# A sample plot macro.  Here, we do a healpix plot of some order. Also
# add value_1=<column to plot>
  auxmap=inferno \
  auxvisible=true \
  legend=false \
  omode=out \
  out=$@ \
  projection=aitoff \
  sex=false \
  layer_1=healpix \
    datalevel_1=$(HEALPIX_ORDER) \
    datasys_1=equatorial \
    viewsys_1=equatorial \
    degrade_1=0 \
    combiner_1=sum \
    transparency_1=0 \
    healpix_1=hpx \
    in_1=$< \
    ifmt_1=votable \
    istream_1=true \

For the somewhat scary STILS command lines, see the STILTS documentation or just use your intution (which mostly should give you the right idea what something is for).

The next step is to define some pattern rules; these are a (in the syntax here, GNU) make feature that let you say „to make a file matching the destination pattern when you have one with the source pattern, use the following commands”. You can use a number of variables in the rules, in particular $@ (the current target) and $< (the first prerequisite). These are already used in the definitions of TAPQUERY and HEALPIXPLOT above, so they don’t necessarily turn up here:

# healpix plots from VOTables; these will plot obs
%.png: %.vot
  	value_1=obs \
  	ofmt=png \
  	xpix=600 ypix=380 

%.pdf: %.vot
  	value_1=obs \

# turn SQL statements into VOTables using TAP
%.vot: %.sql
  	adql="`cat $<`"

Careful with cut and paste: The leading whitespace here must be a Tab in rules, not just some blanks (this is probably the single most annoying feature of make. You’ll get used to it.)

What can you do with it? Well, for instance you can write an ADQL query into a file density.sql; say:

  count(*) as obs,
  -- "6" in the next line must match HEALPIX_ORDER in the Makefile
  ivo_healpix_index (6, alphaFloat, deltaFloat) AS hpx
FROM ppmx.data

And with this, you can say

make density.pdf

and get a nice PDF with the plot resulting from that query. Had you just said make density.vot, make would just have executed the query and left the VOTable, e.g., for investigation with TOPCAT, and if you were to type make density.png, you’d get a nice PNG without querying the service again. Like this:

<img src="https://blog.g-vo.org/wp-content/uploads/2017/02/density.png" alt="" width="600" height="380" class="alignnone size-full wp-image-81" />

Unless of course you changed the SQL in the meantime, in which case make would figure out it had to go back to the service.

In particular for the plots you’ll often have to override the defaults. Make is smart enough to figure this out. For instance, you could have two files like this:

$ cat pm_histogram.sql
  round(pmtot/10)*10 as bin, count(*) as n 
  SELECT sqrt(pmra*pmra+pmde*pmde)*3.6e6 as pmtot
  FROM hsoy.main) AS q
group by bin
$ cat pm_histogram_cleaned.vot
  round(pmtot/10)*10 as bin, 
  count(*) as n 
  FROM ( 
    SELECT sqrt(pmra*pmra+pmde*pmde)*3.6e6 as pmtot 
    FROM hsoy.main
    WHERE no_sc IS NULL) AS q 
  group by bin

(these were used to analyse the overall proper motions distributions in HSOY properties; note that each of these will run about 30 minutes or so, so better adapt them to what’s actually interesting to you before trying this).

No special handling in terms of queries is necessary for these, but the plot needs to be hand-crafted:

pm_histograms.png: pm_histogram.vot pm_histogram_cleaned.vot
	$(STILTS) plot2plane legend=false \
	  omode=out ofmt=png out=$@ \
  	title="All-sky" \
  	xpix=800 ypix=600 \
  	ylog=True xlog=True\
  	xlabel="PM bin [mas/yr, bin size=10]" \
  	xmax=4000 \
	layer1=mark \
  	color1=blue \
  	in1=pm_histogram.vot \
  	x1=bin \
  	y1=n \
	layer2=mark \
  	in2=pm_histogram_cleaned.vot \
  	x2=bin \

– that way, even if you go back to the stuff six months later, you can still figure out what you queried (the files are still there) and what you did then.

A makefile to play with (and safe from cut-and-paste problems) is available from Makefile_tapsample (rename to Makefile to reproduce the examples).

PPMXL+Gaia DR1=HSOY in the Heidelberg Data Center

The stunning precision of Gaia’s astrometry is already apparent in the first release of the data obtained by the satellite, available since last September. However, apart from the small TGAS subset (objects already observed by the 90ies HIPPARCOS astrometry satellite) there is no information on the objects’ proper motions in DR1.

Until Gaia-quality proper motions will become available in DR2, the HSOY catalog – described in Altmann et al’s paper Hot Stuff for One Year (HSOY) freshly up in arXiv and online at http://dc.g-vo.org/hsoy – can help if you can live with somewhat lesser-quality kinematics.

It derives proper motions for roughly half a billion stars from PPMXL and Gaia DR1, which already gives an unprecedented source for 4D astrometry around J2015. And you can start working with it right now. The catalog is available in GAVO’s Heidelberg data center (TAP access URL: http://dc.g-vo.org/tap; there’s also an SCS service). Fire up your favourite TAP or SCS client (our preference: TOPCAT) and search for HSOY.

Image: Errors in proper motion in declination in HSOY on the sky

HSOY average errors in proper motion in declination over the sky, in mas/yr. The higher errors south of -30 degrees are because the great sky surveys of the 50ies could not be extended to the southern sky, and thus the first epoch there typically is in the 1980ies.

Oh, and in case you’re new to the whole TAP/ADQL game: There’s our ADQL introduction, and if you’re at a German astronomical institution, we’d be happy to hold one of our VO Days at your institute – just drop us a mail.

DaCHS, SODA, and Datalink

DaCHS, the Data Center Helper Suite, is a comprehensive suite for publishing astronomical data to the Virtual Observatory, supporting most major protocols out there. On Dec 12, GAVO released a new version, 0.9.8. The most notable change is that now SODA is supported as specified in the last IVOA Proposed Recommendation.

This is fairly big news, as SODA is the VO’s answer to providing cutout services and the like, which obviously is important part with datasets in the Multi-Gigabyte range and the VO’s wider programme of trying to enable users to only download what they need. But even for spectra, which aren’t typically terribly large, we have been using SODA; for instance, when you just want to see the development of a single line over time, say,, it’s nice to not have to bother with the the full spectrum. The spectral client SPLAT has been offering such functionality for a couple of year now — watch out for the scissors icon in discovery results. These indicate SODA support on the respective services.

Another client that will support SODA and its basis Datalink is Aladin – we’ve seen a promising demo of that during the last Interop in Trieste. Until the clients are there, DaCHS contains a (largely re-usable) stylesheet that generates simple UIs for Datalink documents and SODA services. Some examples:

Note again that all of these are not actually web pages, they’re machine-readable metadata collections; if you don’t believe it, pull the URLs with curl. To learn more about the combo of Datalink and SODA, check out this ADASS 2015 poster (preferably before even looking at the not terribly readable standards texts).

If you’re running DaCHS yourself and can’t wait to run Datalink and SODA — here’s how to do that.

ProvenanceDM Working Draft released

We’ve released the first version of working draft for the IVOA Provenance Data Model at the IVOA documents page:

ProvenanceDM Working draft.

Updated versions will be put at the same URL (check the date! The first version is from 21st November 2016).

Want to get your hands on the very latest version?
Check out the volute svn repository! Since it’s not so easy to find what you want there, here’s the path to the Provenance Data Model at volute, and here’s a direct link to the latest development draft [pdf].

We’re happy to receive some feedback on the document via IVOA’s data modelling mailing list dm@ivoa.net.

UWS 1.1 approved!

UWS stands for Universal Worker Service and is an IVOA standard provides a protocol which can be used for accessing databases and other web services from the command line, e.g. using the python uws-client.
This allows to create (asynchronous) jobs for a web service (e.g. an SQL query), check their status, retrieve their results, abort or delete them.

The updated version 1.1 was approved at the InterOperability Meeting last week and brings some nice new features:

  • Job list filtering: When retrieving the job list, one can now retrieve only jobs created after a certain date, the latest n jobs or jobs with a certain phase (e.g. EXECUTING or COMPLETED)
  • WAIT: When asking for job details, it is now possible to append a WAIT parameter and provide an integer as wait-time in seconds. This means that the job details will only be returned when the wait-time is over or the job’s phase has changed, whichever comes first.

For all the details, have a look at the standard itself:
UWS 1.1 Recommendation.

A few examples using the CosmoSim database are given here:
UWS tutorial for CosmoSim (pdf), using 1.0 and
UWS 1.1 update at CosmoSim.

And if you want to implement UWS 1.1 for your own service, here is a test-tool that may be useful for validating for you for validating the new features: