Tangible Astronomy and Movies with TOPCAT

This March, I’ve put up two new VO resources (that’s jargon for “table or service or whatever”) that, I think, fit quite well what I like to call tangible astronomy: things you can readily relate to what you see when you step out at night. And, since I’m a professing astronomy nerd, that’s always nicely gratifying.

The two resources are the Constellations as Polygons and the Gaia eDR3 catalogue of nearby stars (GCNS).


On the constellations, you might rightfully say that’s really far from science. But then they do help getting an idea where something is, and when and from where you might see something. I’ve hence wanted for a long time to re-publish the Davenhall Constellation Boundary Data as proper, ADQL-queriable polygons, and figuring out where the loneliest star in the sky (and Voyager 1) were finally made me do it.

GCNS density around taurus.
Taurus in the GCNS density plot: with constellations!

So, since early March there’s the cstl.geo table on the TAP service at https://dc.g-vo.org/tap with the constallation polygons in its p column. Which, for starters, means it’s trivial to overplot constallation boundaries in your favourite VO clients now, as in the plot above. To make it, I’ve just done a boring SELECT * FROM cstl.geo, did the background (a plain HEALPix density plot of GCNS) and, clicked Layers → Add Area Control and selected the cstl.geo table.

If you want to identify constellations by clicking, while in the area control, choose “add central” from the Forms menu in the Form tab; that’s what I did in the figure above to ensure that what we’re looking at here is the Hyades and hence Taurus. Admittedly: these “centres“ are – as in the catalogue – just the means of the vertices rather than the centres of mass of the polygon (which are hard to compute). Oh, and: there is also the AreaLabel in the Forms menu, for when you need the identification more than the table highlighting (be sure to use a center anchor here).

Note that TOPCAT’s polygon plot at this point is not really geared towards large polygons (which the constellations are) right now. At the time of writing, the documentation has: “Areas specified in this way are generally intended for displaying relatively small shapes such as instrument footprints. Larger areas may also be specified, but there may be issues with use.” That you’ll see at the edges of the sky plots – but keeping that in mind I’d say this is a fun and potentially very useful feature.

What’s a bit worse: You cannot turn the constellation polygons into MOCs yet, because the MOC library currently running within our database will not touch non-convex polygons. We’re working on getting that fixed.

Nearby Stars

Similarly tangible in my book is the GCNS: nearby stars I always find romantic.

Let’s look at the 100 nearest stars, and let’s add spectral types from Henry Draper (cf. my post on Annie Cannon’s catalogue) as well as the constellation name:

WITH nearest AS (
  a.ra, a.dec,
FROM gcns.main AS a
LEFT OUTER JOIN hdgaia.main AS b 
  ON (b.source_id_dr3=a.source_id)
ORDER BY dist_50 ASC)
SELECT nearest.*, name
FROM nearest
JOIN cstl.geo AS g
    POINT(nearest.ra, nearest.dec), 

Note how I’m using CONTAINS with the polygon in the constellations table here; that’s the usage I’ve had in mind for this table (and it’s particularly handy with table uploads).

That I have a Common Table Expression (“WITH”) here is due to SQL planner confusion (I’ll post something about that real soon now): With the WITH, the machine first selects the nearest 100 rows and then does the (relatively costly) spatial match, without it, the machine (somewhat surprisingly) did the geometric match first. This particular confusion looks fixable, but for now I’d ask you for forgiveness for the hack – and the technique is often useful anyway.

If you inspect the result, you will notice that Proxima Cen is right there, but α Cen is missing; without having properly investigated matters, I’d say it’s just too bright for the current Gaia data reduction (and quite possibly even for future Gaia analysis).

Most of the objects on that list that have made it into the HD (i.e., have a spectral type here) are K dwarfs – which is an interesting conspiracy between the limits of the HD (the late red and old white dwarfs are too weak for it) and the limits of Gaia (the few earlier stars within 6 parsec – which includes such luminaries as Sirius at a bit more than 2.5 pc – are just too bright for where Gaia data reduction is now).


Another fairly tangible thing in the GCNS is the space velcity, given in km/s in the three dimensions U, V, and W. That is, of course, an invitation to look for stellar streams, as, within the relatively small portion of the Milky Way the GCNS looks at, stars on similar orbits will exhibit similar space motions.

Considering the velocity dispersion within a stellar stream will be a few km/s, let’s have the database bin the data. Even though this data is small enough to conveniently handle locally, this kind of remote analysis is half of what TAP is really great at (the other half being the ability to just jump right into a new dataset). You can group by multiple things at the same time:

  COUNT(*) AS n,
  ROUND(uvel_50/5)*5 AS ubin,
  ROUND(vvel_50/5)*5 AS vbin,
  ROUND(wvel_50/5)*5 AS wbin
FROM gcns.main
GROUP BY ubin, vbin, wbin

Note that this (truly) 3D histogram only represents a small minority of the GCNS objects – you need radial velocities for space motion, and these are precious even in the Gaia age.

What really surprised me is how clumpy this distribution is – are we sure we already know all stellar streams in the solar neighbourhood? Watch for yourself (if your browser can’t play webm, complain to your vendor):

[Update (2021-04-01): Mark Taylor points out that the “flashes” you sometimes see when the grid is aligned with the viewing axes (and the general appearance) could be improved by just pulling all non-NULL UVW values out of the table and using a density plot (perhaps shading=density densemap=inferno densefunc=linear). That is quite certainly true, but it would of course defeat the purpose of having on-server aggregation. Which, again, isn’t all that critical for this dataset, so doing the prettier plot actually is a valuable exercise for the reader]

How did I make this video? Well, I started with a Cube Plot in TOPCAT as usual, configuring weighted plotting with n as its weight and played around a bit with scaling out a few outliers. And then I saved the table (to zw.vot), hit “STILTS“ in the plot window and saved the text from there to a text file, zw.sh. I had to change the “in“ clause in the script to make it look like this:

  stilts plot2cube \
   xpix=887 ypix=431 \
   xlabel='ubin / km/s' ylabel='vbin / km/s' \
   zlabel='wbin / km/s' \
   xmin=-184.5 xmax=49.5 ymin=-77.6 ymax=57.6 \
   zmin=-119.1 zmax=94.1 phi=-84.27 theta=90.35 \
    psi=-62.21 \
   auxmin=1 auxmax=53.6 \
   auxvisible=true auxlabel=n \
   legend=true \
   layer=Mark \
      in=zw.vot \
      x=ubin y=vbin z=wbin weight=n \
      shading=weighted size=2 color=blue

– and presto, “sh zw.sh“ would produce the plot I just had in TOPCAT. This makes a difference because now I can animate this.

In his documentation, Mark already has a few hints on how to build animations; here are a few more ideas on how to organise this. For instance, if, as I want here, you want to animate more than one variable, stilts tloop may become a bit unwieldy. Here’s how to give the camera angles in python:

import sys
from astropy import table
import numpy

angles = numpy.array(
  [float(a) for a in range(0, 360)])
  names=("psi", "theta")).write(
    sys.stdout, format="votable")

– the only thing to watch out for is that the names match the names of the arguments in stilts that you want to animate (and yes, the creation of angles will make numpy afficionados shudder – but I wasn’t sure if I might want to have somewhat more complex logic there).

[Update (2021-04-01): Mark Taylor points out that all that Python could simply be replaced with a straightforward piece of stilts using the new loop table scheme in stilts, where you would simply put

  acmd='addcol phi $1'
  acmd='addcol theta 40+30*cosDeg($1+57)'

into the plot2cube command line – and you wouldn’t even need the shell pipeline.]

What’s left to do is basically the shell script that TOPCAT wrote for me above. In the script below I’m using a little convenience hack to let me quickly switch between screen output and file output: I’m defining a shell variable OUTPUT, and when I un-comment the second OUTPUT, stilts renders to the screen. The other changes versus what TOPCAT gave me are de-dented (and I’ve deleted the theta and psi parameters from the command line, as I’m now filling them from the little python script):

OUTPUT="omode=out out=pre-movie.png"

python3 camera.py |\
stilts plot2cube \
   xpix=500 ypix=500 \
   xlabel='ubin / km/s' ylabel='vbin / km/s' \
   zlabel='wbin / km/s' \
   xmin=-184.5 xmax=49.5 ymin=-77.6 ymax=57.6 \
   zmin=-119.1 zmax=94.1 \
   auxmin=1 auxmax=53.6 \
phi=8 \
animate=- \
afmt=votable \
   layer=Mark \
      in=zw.vot \
      x=ubin y=vbin z=wbin weight=n \
      shading=weighted size=4 color=blue

# render to movie with something like
# ffmpeg -i "pre-movie-%03d.png" -framerate 15 -pix_fmt yuv420p /stream-movie.webm
# (the yuv420p incantation is so real-world 
# web browsers properly will not go psychedelic 
# with the colours)

The comment at the end says how to make a proper movie out of the PNGs this produces, using ffmpeg (packaged with every self-respecting distribution these days) and yielding a webm. Yes, going for mpeg x264 might be a lot faster for you as it’s a lot more likely to have hardware support, but everything around mpeg is so patent-infested that for the sake of your first-born’s soul you probably should steer clear of it.

Movies are fun in webm, too.

The Loneliest Star in the Sky

sky images and a distribution plot
The loneliest star in the sky on the left, and on the right a somewhat more lonelier one (it’s explained in the text). The inset shows the distribution of the 500 loneliest stars on the whole sky in Galactic coordinates.

In early December, the object catalogue of Gaia’s data release 3 was published (“eDR3“), and I’ve been busy in various ways on this data off and on since then – see, for instance, the The Case of the disappearing bits on this blog.

One of the things I have missed when advising people on projects with previous Gaia data releases is a table that, for every object, gives the nearest neighbour. And so for this release I’ve created it and christened it, perhaps just a bit over-grandiosely, “Gaia eDR3 Autocorrelation”. Technically, it is just a long (1811709771 rows, to be precise) list of pairs of Gaia eDR3 source ids, the ids of their nearest neighbour, and a spherical distance between.

This kind of data is useful for many applications, mostly when looking for objects that are close together or (more often) things that fail for such close pairs for a wide variety of reasons. I have taken some pains to not only have close neighbours, though, because sometimes you may want specifically objects far away from others.

As in the case of this article’s featured image: The loneliest star in the sky (as seen by Gaia, that is) is eDR3 6049144983226879232, which is 4.3 arcminutes from its neighbour, 6049144021153793024, which in turn is the second-loneliest star in the sky. They are, perhaps a bit surprisingly, in Ophiuchus (and thus fairly close to the Milky Way plane), and (probably) only about 150 parsec from Earth. Doesn’t sound too lonely, hm? Turns out: these stars are lonely because dust clouds blot out all their neighbours.

Rank three is in another dust cloud, this time in Taurus, and so it continues in low Galactic latitude to rank 8 (4402975278134691456) at Galactic latitude 36.79 degrees; visualising the thing, it turns out it’s again in a dark cloud. What about rank 23 at 83.92 Galactic (3954600105683842048)? That’s probably bona-fide, or at least it doesn’t look very dusty in the either DSS or PanSTARRS. Coryn (see below) estimates it’s about 1100 parsec away. More than 1 kpc above the galactic disk: that’s more what I had expected for lonely stars.

Looking at the whole distribution of the 500 loneliest stars (inset above), things return a bit more to what I had expected: Most of them are around the galactic poles, where the stellar density is low.

So: How did I find these objects? Here’s the ADQL query I’ve used:

  ra, dec, source_id, phot_g_mean_mag, ruwe,
  partner_id, dist, 
  COORD2(gavo_transform('ICRS', 'GALACTIC', 
    point(ra, dec))) AS glat
  NATURAL JOIN gedr3auto.main

– run this on the TAP server at http://dc.g-vo.org/tap (don’t be shy, it’s a cheap query).

Most of this should be familiar to you if you’ve worked through the first pages of ADQL course. There’s two ADQL things I’d like to advertise while I have your attention:

  1. NATURAL JOIN is like a JOIN USING, except that the database auto-selects what column(s) to join on by matching the columns that have the same name. This is a convenient way to join tables designed to be joined (as they are here). And it probably won’t work at all if the tables haven’t been designed for that.
  2. The messy stuff with GALACTIC in it. Coordinate transformations had a bad start in ADQL; the original designers hoped they could hide much of this; and it’s rarely a good idea in science tools to hide complexity essentially everyone has to deal with. To get back on track in this field, DaCHS servers since about version 1.4 have been offering a user defined function gavo_transfrom that can transform (within reason) between a number of popular reference frames. You will find more on it in the server’s capabilities (in TOPCAT: the “service” tab). What is happening in the query is: I’m making a Point out of the RA and Dec given in the catalogue, tell the transform function it’s in ICRS and ask it to make Galactic coordinates from it, and then take the second element of the result: the latitude.

And what about the gedr3dist.litewithdist table? That doesn’t look a lot like the gaiaedr3.gaiasource we’re supposed to query for eDR3?

Well, as for DR2, I’m again only carrying a “lite” version of the Gaia catalogue in GAVO’s Heidelberg data center, stripped down to the columns you absolutely cannot live without even for the most gung-ho science; it’s called gaia.edr3lite.

But then my impression is that almost everyone wants distances and then hacks something to make Gaia’s parallax work for them. That’s a bad idea as the SNR goes down to levels very common in the Gaia result catalogue (see 2020arXiv201205220B if you don’t take my word for it). Hence, I’m offering a pre-joined view (a virtual table, if you will) with the carefully estimated distances from Coryn Bailer-Jones, and that’s this gedr3dist.litewithdist. Whenever you’re doing something with eDR3 and distances, this is where I’d point you first.

Oh, and I should be mentioning that, of course, I figured out what is in dust clouds and what is not with TOPCAT and Aladin as in our tutorial TOPCAT and Aladin working together (which needs a bit of an update, but you’ll figure it out).

There’s a lot more fun to be had with this (depending on what you find fun in). What about finding the 10 arcsec-pairs with the least different luminosities (which might actually be useful for testing some optics)? Try this:

  a.source_id, partner_id, dist, 
  a.phot_g_mean_mag AS source_mag,
  b.phot_g_mean_mag AS partner_mag,
  abs(a.phot_g_mean_mag-b.phot_g_mean_mag) AS magdiff
FROM gedr3auto.main
  NATURAL JOIN gaia.edr3lite AS a
  JOIN gaia.edr3lite AS b
    ON (partner_id=b.source_id)
  dist BETWEEN 9.999/3600 AND 10.001/3600
  AND a.phot_g_mean_mag IS NOT NULL
  AND b.phot_g_mean_mag IS NOT NULL
ORDER BY magdiff ASC

– this one takes a bit longer, as there’s many 10 arcsec-pairs in eDR3; the query above looks at 84690 of them. Of course, this only returns really faint pairs, and given the errors stars that weak have they’re probably not all that equal-luminosity as that. But fixing all that is left as an exercise to the reader. Given there’s the RP and BP magnitude columns, what about looking for the most colourful pair with a given separation?

Acknowledgement: I couldn’t have coolly mumbled about Ophiuchus or Taurus without the SCS service ivo://cds.vizier/vi/42 (”Identification of a Constellation From Position, Roman 1982”).

Update [2021-02-05]: I discovered an extra twist to this story: Voyager 1 is currently flying towards Ophiuchus (or so Wikipedia claims). With an industrial size package of artistic licence you could say: It’s coming to keep the loneliest star company. But of course: by the time Voyager will be 150 pc from earth, eDR3 6049144983226879232 will quite certainly have left Ophiuchus (and Voyager will be in a completely different part of our sky, that wouldn’t look familar to us at all) – so, I’m afraid apart from a nice conincidence in this very moment (galactically speaking), this whole thing won’t be Hollywood material.

PPMXL+Gaia DR1=HSOY in the Heidelberg Data Center

The stunning precision of Gaia’s astrometry is already apparent in the first release of the data obtained by the satellite, available since last September. However, apart from the small TGAS subset (objects already observed by the 90ies HIPPARCOS astrometry satellite) there is no information on the objects’ proper motions in DR1.

Until Gaia-quality proper motions will become available in DR2, the HSOY catalog – described in Altmann et al’s paper Hot Stuff for One Year (HSOY) freshly up in arXiv and online at http://dc.g-vo.org/hsoy – can help if you can live with somewhat lesser-quality kinematics.

It derives proper motions for roughly half a billion stars from PPMXL and Gaia DR1, which already gives an unprecedented source for 4D astrometry around J2015. And you can start working with it right now. The catalog is available in GAVO’s Heidelberg data center (TAP access URL: http://dc.g-vo.org/tap; there’s also an SCS service). Fire up your favourite TAP or SCS client (our preference: TOPCAT) and search for HSOY.

Image: Errors in proper motion in declination in HSOY on the sky

HSOY average errors in proper motion in declination over the sky, in mas/yr. The higher errors south of -30 degrees are because the great sky surveys of the 50ies could not be extended to the southern sky, and thus the first epoch there typically is in the 1980ies.

Oh, and in case you’re new to the whole TAP/ADQL game: There’s our ADQL introduction, and if you’re at a German astronomical institution, we’d be happy to hold one of our VO Days at your institute – just drop us a mail.