Register your stuff with purx!

TOPCAT screenshot
If you open the TAP dialog of TOPCAT, what you see is Registry content.

The VO Registry lets people find astronomical resources (which is jargon for “dataset, service, or stuff”). Currently, most of its users don’t even notice they’re using the Registry, as when TOPCAT just magically lists what TAP services are available (image above) – but there are also interfaces that let you interact with the Registry directly, for instance GAVO’s WIRR service or ESAVO’s Registry Search.

Arguably, the usefulness of the Registry scales with its completeness. Once it is complete enough, its domain-specific, structured metadata will also make it interesting for generic discovery of astronomical data. As a quip: looking for UCDs in Google will never work quite well, and without that it’s hard to find things with queries like “radio fluxes of early-type stars”.

Either way: if you have a data set or a service dealing with astronomy, it’d be great if you could register it. So far, you had two ways to do this. You could set up a publishing registry, which is nontrivial even if your software natively speaks a protocol called OAI-PMH (DaCHS does, but most other publishing suites don’t). Or you could use one of two web interfaces to define your resource (notes for a talk on this I gave in 2016).

Neither of these options is really attractive if you publish only a few resources (so the overhead of running a publishing registry looks excessive) that change now and then (so using a web browser to update the resource records again and again is tedious). Therefore, GAVO has developed purx, the publishing registry proxy. We’ve officially announced it during the recent Southern Spring Interop in Santiago de Chile (Program), and the lecture notes for that talk are probably a good introduction to what this is about.

If you’re running VO services and have not registered them so far, you probably want to read both these notes and the service documentation. If, on the other hand, you just have a web-published directory of files or a browser-based service, you probably can skip even that. Just grab a sample record (use the one for a simple browser service in both cases) and adapt it to what’s fitting for your website. Then put the resulting file online somewhere and paste the URL of that location on purx’ enrollment service. In case you’re uncertain about some of the terms in the record, perhaps our crib sheet for metadata we ask our data providers for will be helpful.

There’s really no excuse any more for not being in the Registry!

See Who’s Kinking the Sky

A new arrival in the GAVO Data Center is UCAC5, another example of a slew of new catalogs combining pre-existing astrometry with Gaia DR1, just like the HSOY catalog we’ve featured here a couple of weeks back.

That’s a nice opportunity to show how to use ADQL’s JOIN operator for something other than the well-known CONTAINS-type crossmatch. Since both UCAC5 and HSOY reference Gaia DR1, both have, for each object, a notion of which element of the Gaia source catalog they correspond to. For HSOY, that’s the gaia_id column; in UCAC5, it’s just source_id. Hence, to compare results from both efforts, all you have to do is to join on source_id=gaia_id (you can save yourself the explicit table references here because the column names are unique to each table).
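To make the idea of an identifier join followed by a per-pixel average concrete, here is a minimal pure-Python sketch on invented toy rows. The column names follow the text; the values, the precomputed hpx field, and the function name mean_pm_difference are made up for illustration and have nothing to do with how the database actually executes the query.

```python
from collections import defaultdict

# Toy stand-ins for hsoy.main and ucac5.main; all values are invented.
hsoy_rows = [
    {"gaia_id": 1, "pmra": 10.0, "hpx": 7},
    {"gaia_id": 2, "pmra": -4.0, "hpx": 7},
    {"gaia_id": 3, "pmra": 1.0, "hpx": 9},
]
ucac5_rows = [
    {"source_id": 1, "pmra": 12.0},
    {"source_id": 2, "pmra": -3.0},
    {"source_id": 4, "pmra": 0.5},
]


def mean_pm_difference(hsoy, ucac5):
    """Inner-join on the Gaia identifier, then average per healpix."""
    # Index one table by its join key so each lookup is cheap.
    by_id = {row["source_id"]: row for row in ucac5}
    diffs = defaultdict(list)
    for row in hsoy:
        match = by_id.get(row["gaia_id"])
        if match is None:
            # Inner join: objects without a partner are dropped.
            continue
        diffs[row["hpx"]].append(match["pmra"] - row["pmra"])
    # Equivalent of avg(...) and count(*) with GROUP BY hpx.
    return {hpx: (sum(v) / len(v), len(v)) for hpx, v in diffs.items()}


result = mean_pm_difference(hsoy_rows, ucac5_rows)
```

Only objects present in both toy tables contribute, which is exactly what the plain JOIN in ADQL does as well.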

So, if you want to compare proper motions, all you need to do is to point your favourite TAP client’s interface to http://dc.g-vo.org/tap and run

SELECT 
    in_unit(avg(uc.pmra-hsoy.pmra), 'mas/yr') AS pmradiff, 
    in_unit(avg(uc.pmde-hsoy.pmde), 'mas/yr') AS pmdediff, 
    count(*) as n, 
    ivo_healpix_index (6, raj2000, dej2000) AS hpx 
    FROM hsoy.main AS hsoy 
    JOIN ucac5.main as uc 
    ON (uc.source_id=hsoy.gaia_id) 
    WHERE comp IS NULL    -- hsoy junk filter
    AND clone IS NULL     -- again, hsoy junk filter
    GROUP BY hpx

(see Taylor et al.’s All of the Sky if you’re unsure what to make of the healpix/GROUP BY magic).
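If you wonder how coarse the grouping at healpix order 6 is, the standard HEALPix relations (nside = 2^order, 12·nside² pixels of equal area) give the answer. The following is a small sketch of that arithmetic; the function name healpix_stats is made up for this example.

```python
import math


def healpix_stats(order):
    """Return (nside, number of pixels, pixel area in square degrees)."""
    nside = 2 ** order
    npix = 12 * nside ** 2
    # Full sky: 4*pi steradians, converted to square degrees.
    full_sky_sq_deg = 4 * math.pi * (180 / math.pi) ** 2
    return nside, npix, full_sky_sq_deg / npix


nside, npix, area = healpix_stats(6)
```

At order 6, this yields 49152 pixels of a bit less than one square degree each, so the query above returns at most that many rows.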

Of course, the fact that both tables are in the same service helps, but with a bit of upload magic you could do about the same analysis across TAP services.

Just so there’s a colourful image in this post, too, here’s what this query shows for the differences in proper motion in RA:

(equatorial coordinates, and the aux axis is a bit cropped here; try for yourself to see how things look for PM in declination or when plotted in galactic coordinates).

What does this image mean? Well, it means that probably both UCAC5 and HSOY would still put kinks into the sky if you wait long enough.

In the brightest and darkest points, if you waited 250 years, the coordinate system induced by each catalog on the sky would be off by 1 arcsec with respect to the other (on a sphere, that means there are kinks somewhere). It may seem amazing that there’s agreement to at least this level between the two catalogs – mind you, 1 arcsec is still more than 100 times smaller than you could see by eye; you’d have to go back to the Mesolithic age to have the slightest chance of spotting the disagreement without serious optical aids. But when Gaia DR2 comes around (hopefully around April 2018), our sky will be more stable even than that.
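The numbers in the previous paragraph are simple unit arithmetic: 1 arcsec accumulated over 250 years corresponds to a proper motion difference of 4 mas/yr. Here is a small sketch of that calculation; the 4 mas/yr is implied by the text rather than read off the plot, and the function names are invented for this example.

```python
MAS_PER_ARCSEC = 1000.0


def implied_rate_mas_per_yr(offset_arcsec, years):
    """Proper motion difference that builds up a given offset in a given time."""
    return offset_arcsec * MAS_PER_ARCSEC / years


def years_to_drift(offset_arcsec, pm_diff_mas_per_yr):
    """Time needed for two frames to drift apart by offset_arcsec."""
    return offset_arcsec * MAS_PER_ARCSEC / pm_diff_mas_per_yr


rate = implied_rate_mas_per_yr(1.0, 250.0)
# Taking the text's factor of 100 at face value for naked-eye visibility:
naked_eye_years = years_to_drift(100.0, rate)
```

With these assumptions, a disagreement visible to the naked eye would need tens of thousands of years to build up.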

Of course, both UCAC5 and HSOY are, indirectly, standing on the shoulders of the same giant, namely Hipparcos and Tycho, so the agreement may be less surprising, and we strongly suspect that a similar image will look a whole lot less pleasant when Gaia has straightened out the sky, in particular towards fainter stars.

But still: do you want to bet whether UCAC5 or HSOY will turn out to be closer to a non-kinking sky? Let us know. Qualifications (“For bright stars…”) are allowed.

Automating TAP queries

TOPCAT is a great tool – in particular, for prototyping and ad-hoc analyses, it’s hard to beat. But say you’ve come up with this great TAP query, and you want to preserve it, perhaps generate plots, then maybe fix them when you (or referees) have spotted problems.

Then it’s time to get acquainted with TOPCAT’s command line sister STILTS. This tool lets you do most of what TOPCAT can, but without user intervention. Since the command lines usually take a bit of twiddling, I usually wrap stilts calls with GNU make, so I just need to run something like make fig3.pdf or make fig3.png, and make works out what to do, potentially starting with a query. Call it workflow preservation if you want.

How does it work? Well, of course with a makefile. In that, you’ll first want to define some variables that allow more concise rules later. Here’s how I usually start one:

STILTS?=stilts

# VOTables are the results of remote queries, so don't wantonly throw
# them away
.PRECIOUS: %.vot

# in this particular application, it helps to have this global
HEALPIX_ORDER=6

# A macro that contains common stuff for stilts TAP query -- essentially,
# just add adql=
TAPQUERY=$(STILTS) tapquery \
  tapurl='http://dc.g-vo.org/tap' \
  maxrec=200000000 \
  omode=out \
  out=$@ \
  ofmt=vot \
  executionduration=14400 

# A sample plot macro.  Here, we do a healpix plot of some order. Also
# add value_1=<column to plot>
HEALPIXPLOT=$(STILTS) plot2sky \
  auxmap=inferno \
  auxlabel='$*'\
  auxvisible=true \
  legend=false \
  omode=out \
  out=$@ \
  projection=aitoff \
  sex=false \
  layer_1=healpix \
    datalevel_1=$(HEALPIX_ORDER) \
    datasys_1=equatorial \
    viewsys_1=equatorial \
    degrade_1=0 \
    combiner_1=sum \
    transparency_1=0 \
    healpix_1=hpx \
    in_1=$< \
    ifmt_1=votable \
    istream_1=true \

For the somewhat scary STILTS command lines, see the STILTS documentation or just use your intuition (which mostly should give you the right idea of what something is for).

The next step is to define some pattern rules; these are a make feature (in the syntax used here, a GNU make feature) that lets you say “to make a file matching the destination pattern when you have one with the source pattern, use the following commands”. You can use a number of variables in the rules, in particular $@ (the current target) and $< (the first prerequisite). These are already used in the definitions of TAPQUERY and HEALPIXPLOT above, so they don’t necessarily turn up here:

# healpix plots from VOTables; these will plot obs
%.png: %.vot
	$(HEALPIXPLOT) \
  	value_1=obs \
  	ofmt=png \
  	xpix=600 ypix=380 

%.pdf: %.vot
	$(HEALPIXPLOT) \
  	value_1=obs \
  	ofmt=pdf

# turn SQL statements into VOTables using TAP
%.vot: %.sql
	$(TAPQUERY) \
  	adql="`cat $<`"

Careful with cut and paste: the leading whitespace of the command lines in rules must be a Tab, not just some blanks. This is probably the single most annoying feature of make, but you’ll get used to it.
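If you are curious what a TAP query looks like on the wire, here is a rough sketch that builds a synchronous TAP request with nothing but Python’s standard library. The parameter names (REQUEST, LANG, QUERY, MAXREC) and the /sync endpoint come from the TAP standard. The function name build_sync_request is made up, and this is not a description of what stilts tapquery does internally; as far as I know, it uses TAP’s asynchronous mode by default, which involves a few more requests.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sync_request(tap_url, adql, maxrec=None):
    """Build (but do not send) a synchronous TAP query request."""
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    if maxrec is not None:
        params["MAXREC"] = str(maxrec)
    return Request(
        tap_url.rstrip("/") + "/sync",
        data=urlencode(params).encode("utf-8"),
        method="POST",
    )


# A deliberately tiny query, so trying it out is cheap for the service.
req = build_sync_request(
    "http://dc.g-vo.org/tap", "SELECT TOP 1 * FROM ppmx.data", maxrec=10)
```

Passing req to urllib.request.urlopen would actually send the query and hand back the service’s response. For long-running queries like the ones in this post, a synchronous request will probably time out, which is one reason to let a proper TAP client do the job.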

What can you do with it? Well, for instance you can write an ADQL query into a file density.sql; say:

SELECT
  count(*) as obs,
  -- "6" in the next line must match HEALPIX_ORDER in the Makefile
  ivo_healpix_index (6, alphaFloat, deltaFloat) AS hpx
FROM ppmx.data
GROUP BY hpx

And with this, you can say

make density.pdf

and get a nice PDF with the plot resulting from that query. Had you just said make density.vot, make would just have executed the query and left the VOTable, e.g., for investigation with TOPCAT, and if you were to type make density.png, you’d get a nice PNG without querying the service again. Like this:

[Image: the resulting healpix density plot, https://blog.g-vo.org/wp-content/uploads/2017/02/density.png]

Unless of course you changed the SQL in the meantime, in which case make would figure out it had to go back to the service.
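The way make decides whether it has to go back to the service is a plain timestamp comparison: a target is rebuilt if it does not exist or if any prerequisite is newer than it. Here is a minimal sketch of that rule; the function name needs_rebuild is invented, and real make of course also walks the whole chain of rules (.sql to .vot to .png).

```python
import os


def needs_rebuild(target, prerequisites):
    """Mimic make's basic out-of-date check for a single target."""
    if not os.path.exists(target):
        # A missing target always has to be built.
        return True
    target_mtime = os.path.getmtime(target)
    # Any prerequisite modified after the target makes it stale.
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

So, needs_rebuild("density.vot", ["density.sql"]) is true right after you have edited density.sql, and false again once the query has been re-run.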

In particular for the plots, you’ll often have to override the defaults. Make handles this gracefully: an explicit rule for a target takes precedence over a pattern rule that would also match it. For instance, you could have two files like this:

$ cat pm_histogram.sql
SELECT
  round(pmtot/10)*10 as bin, count(*) as n 
FROM ( 
  SELECT sqrt(pmra*pmra+pmde*pmde)*3.6e6 as pmtot
  FROM hsoy.main) AS q
group by bin
$ cat pm_histogram_cleaned.sql
SELECT 
  round(pmtot/10)*10 as bin, 
  count(*) as n 
  FROM ( 
    SELECT sqrt(pmra*pmra+pmde*pmde)*3.6e6 as pmtot 
    FROM hsoy.main
    WHERE no_sc IS NULL) AS q 
  group by bin

(these were used to analyse the overall proper motion distributions in HSOY; note that each of these will run for about 30 minutes, so better adapt them to what’s actually interesting to you before trying this).
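In case the expressions in these two queries look cryptic: the factor 3.6e6 converts degrees per year to milliarcseconds per year (3600 arcsec per degree times 1000 mas per arcsec), and round(pmtot/10)*10 snaps the total proper motion to bins of 10 mas/yr. Here is the same computation as a Python sketch for a single star; the function name pm_bin and the input values are made up.

```python
import math

MAS_PER_DEG = 3.6e6  # 3600 arcsec/deg * 1000 mas/arcsec


def pm_bin(pmra_deg_yr, pmde_deg_yr, bin_size=10):
    """Total proper motion in mas/yr, rounded to the nearest bin."""
    pmtot = math.hypot(pmra_deg_yr, pmde_deg_yr) * MAS_PER_DEG
    # Note: Python rounds exact ties to even; what ADQL's round does
    # with ties depends on the database behind the service.
    return round(pmtot / bin_size) * bin_size


# 1e-5 deg/yr in RA only is 36 mas/yr, which ends up in the 40 bin.
example_bin = pm_bin(1e-5, 0.0)
```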

No special handling in terms of queries is necessary for these, but the plot needs to be hand-crafted:

pm_histograms.png: pm_histogram.vot pm_histogram_cleaned.vot
	$(STILTS) plot2plane legend=false \
	  omode=out ofmt=png out=$@ \
  	title="All-sky" \
  	xpix=800 ypix=600 \
  	ylog=True xlog=True\
  	xlabel="PM bin [mas/yr, bin size=10]" \
  	xmax=4000 \
	layer1=mark \
  	color1=blue \
  	in1=pm_histogram.vot \
  	x1=bin \
  	y1=n \
	layer2=mark \
  	in2=pm_histogram_cleaned.vot \
  	x2=bin \
  	y2=n

– that way, even if you go back to the stuff six months later, you can still figure out what you queried (the files are still there) and what you did then.

A makefile to play with (and safe from cut-and-paste problems) is available from Makefile_tapsample (rename to Makefile to reproduce the examples).