Posts with the Tag TOPCAT:

  • HEALPix Maps: In General and in Gaia

    blue and reddish pixels drawing a bar on the sky.

    A map of average Gaia colours in HEALPixes 2/83 and 2/86 (Orion south-east). This post tells you how to (relatively) quickly produce such maps.

    This year's puzzler for the AG Tagung turned out to be a valuable source of interesting ADQL queries. I have already written about finding dusty spots on the sky, and in the puzzler solution, I had promised some words on creating dust maps, or, more generally, HEALPix maps of any sort.

    Making HEALPix maps with Gaia source_ids

    The basic technique is explained in Mark Taylor's classical ADASS poster from 2016. On GAVO's TAP service (access URL http://dc.g-vo.org/tap), you will also find an example for that (in TOPCAT's TAP window, check the Service-provided section unter the Examples button for it). However, once you have Gaia source_ids, there is something a lot faster and arguably not much less convenient. Let me quote the footnote on source_id from my DR3 lite table:

    For the contents of Gaia DR3, the source ID consists of a 64-bit integer, least significant bit = 1 and most significant bit = 64, comprising:

    • a HEALPix index number (sky pixel) in bits 36 - 63; by definition the smallest HEALPix index number is zero.
    • […]

    This means that the HEALpix index level 12 of a given source is contained in the most significant bits. HEALpix index of 12 and lower levels can thus be retrieved as follows:

    • [...]
    • HEALpix [at] level n = (source_id)/(235⋅412 − n).

    That is: Once you have a Gaia source_id, you an compute HEALpix indexes on levels 12 or less by a simple integer division! I give you that the more-than-35-bit numbers you have to divide by do look a bit scary – but you can always come back here for cutting and pasting:

    HEALPix level Integer-divide source id by
    12 34359738368
    11 137438953472
    10 549755813888
    9 2199023255552
    8 8796093022208
    7 35184372088832
    6 140737488355328
    5 562949953421312
    4 2251799813685248
    3 9007199254740992
    2 36028797018963968

    If you know – and that is very valuable knowledge far beyond this particular application – that you can simply jump between HEALPix indexes of different levels by multiplying with or integer-dividing by four, the general formula in the footnote actually becomes rather memorisable. Let me illustrate that with an example in Python. HEALPix number 3145 on level 6 is:

    >>> 3145//4  # ...within this HEALPix on level 5...
    786
    >>> 3145*4, (3145+1)*4  # ..and covers these on level 7...
    (12580, 12584)
    

    Simple but ingenious.

    You can immediately exploit this to make HEALPix maps like the one in the puzzler. This piece of ADQL does the job within a few seconds on the GAVO DC TAP service[1]:

    SELECT source_id/8796093022208 AS pix,
      AVG(phot_bp_mean_mag-phot_rp_mean_mag) AS avgcol
    FROM gaia.edr3lite
    WHERE distance(ra, dec, 246.7, -24.5)<2
    GROUP by pix
    

    Using the table above, you see that the horrendous 8796093022208 is the code for HEALPix level 8. When you remember (and you should) that HEALPix level 6 corresponds to a linear dimension of about 1 degree and each level is a factor of two in linear dimension, you see that the map ought to have a resolution of about 1/8th of a degree.

    HEALPix to Screen Pixel

    How do you plot this? Well, in TOPCAT, do GraphicsSky Plot, and then in the plot window LayersAdd HEALPix control (there are icons for both of these, too). You then have to manually configure the plot for the table you just retrieved: Set the Level to 8, the index to pix and the Value to avgcol – we're working on making the annotation a bit richer so that TOPCAT has a chance to figure this out by itself.

    With a bit of extra configuration, you get the following map of average colours (really: dust concentration):

    Plot: Black and reddish pixels showing a bit of structure

    This is not totally ideal, as at the border of the cone, certain Healpixes are only partially covered, which makes statistics unnecessarily harder.

    Positional Constraints using source_ids

    Due to Gaia's brilliant numbering scheme, we can do analysis by HEALpix, too, circumventing (among other things) this problem. Say you are interested in the vicinity of the M42 and would like to investigate a patch of about 8 degrees. By our rule of thumb, 8 degrees is three levels up from the one-degree level 6. To find the corresponding HEALpix index, on DaCHS servers with their gavo_simbadpoint UDF you could say:

    SELECT TOP 1 ivo_healpix_index(3, gavo_simbadpoint('M42'))
    FROM tap_schema.tables
    

    Hu, you ask, what's tap_schema.tables to do with this? Well, nothing, really. It's just that ADQL's syntax requires selecting from a table, even if what we select is completely independent of any table, as for instance the index of M42's 3-HEALpix. The hack above picks in a table guaranteed to exist on all TAP services, and the TOP 1 makes sure we only compute the value once. In case you ever feel the need to abuse a TAP service as a calculator: Keep this trick in mind.

    The result, 334, you could also have found more graphically, as follows:

    1. Start Aladin
    2. Check OverlayHEALPix grid
    3. Enter M42 in Command
    4. Zoom out until you see HEALPix indexes of level 3 in the grid.

    An advantage you have with this method: You see that M42 happens to lie on a border of HEALPixes; perhaps you should include all of 334, 335, 356, and 357 if you were really interested in the Orion Nebula's vicinity.

    We, on the other hand, are just interested in instructive examples, and hence let's just repeat our colour mapping with all Gaia objects from HEALPix 3/334. How do you select these? Well, by source_id's construction, you know their source_ids will be between 334⋅9007199254740992 and (334 + 1)⋅9007199254740992 − 1:

    SELECT source_id/8796093022208 AS pix,
      AVG(phot_bp_mean_mag-phot_rp_mean_mag) AS avgcol
    FROM gaia.edr3lite
    WHERE source_id BETWEEN 334*9007199254740992 AND 335*9007199254740992-1
    GROUP by pix
    

    This is computationally cheap (though Postgres, not being a column store still has to do quite a bit of I/O; note how much faster this query is when you run it again and all the tuples are already in memory). Even going to HEALPix level 2 would in general still be within our sync time limit. The opening figure was produced with the constraint:

    source_id BETWEEN 83*36028797018963968 AND 84*36028797018963968-1
    OR source_id BETWEEN 86*36028797018963968 AND 87*36028797018963968-1
    

    – and with a sync query.

    Aggregating over a Non-HEALPix

    One last point: The constraints we have just been using are, in effect, positional constraints. You can also use them as quick and in some sense rather unbiased sampling tools.

    For instance, if you would like so see how the reddening in one of the “dense“ spots in the opening picture behaves with distance, you could first pick a point – α = 98, δ = 4, say –, then convert that to a level 7 healpix as above (that's/88974) and then write:

    SELECT ROUND(r_med_photogeo/200)*200 AS distbin, COUNT(*) as n,
        AVG(phot_bp_mean_mag-phot_rp_mean_mag) AS avgcol
    FROM gaia.dr3lite
    JOIN gedr3dist.main USING (source_id)
    WHERE source_id BETWEEN 88974*35184372088832 and 88975*35184372088832-1
    GROUP BY distbin
    

    This is creating 200 pc bins in distance based on the estimates in the gedr3dist.main table (note that this adds subtle correlations, because these estimates already contain Gaia colour information). Since quite a few of these bins will be very sparsely populated, I'm also fetching the number of objects contributing. And then I plot the whole thing, using the conventional (n) ⁄ n as a rough estimate for the relative error:

    Plot: A line that first slowly declines, then rises quite a bit, then flattens out and becomes crazy as errors start to dominate.

    This plot immediatly shows that colour systematics are not exclusively due to dust, as in that case things would only get redder all the time. The blueward trend up to 700 pc is reasonably well explained by the brighter, bluer upper main sequence becoming more dominant in the population sampled as red dwarfs become too faint for Gaia.

    The strong reddening setting in after that is rather certainly due to the Orion complex, though I would perhaps not have expected it to reach out to 2 kpc (the conventional distance to M42 is about 0.5 kpc); without having properly thought about it, I'll chalk it off as “the Orion arm“. And after that, it's again what I'd call Malmquist-blueing until the whole things dissolves into noise.

    In conclusion: Did you know you can group by both healpix and distbin at the same time? I am sure there are interesting structures to be found in what you will get from such a query…

    [1]You may be tempted to write source_id/(POWER(2, 35)*POWER(4, 3) here for clarity. Resist that temptation. POWER returns floating point numbers. If you have one float in a division, not even a ROUND will get you back into the integer division realm, and the whole trick implodes. No, you will need the integer literals for now.
  • Computing Residuals of an Astrometric Calibration

    Two plots, left a fairly good correlation, right a cloudy wave

    The kind of plot you can make following the recipe given here: Left, a comparison of the photometry, right, a positional residuals, not taking into account the SIP plate solution, when comparing the HDAP plate B3261a against Gaia DR3. Note that the cut-off a 4 arcsec is because of the match radius when obtaining the calibrator stars.

    I recently had to assess the quality of the astrometric calibration of a photographic plate. What I am going to show you in this post will of course work just as well for CCD frames, and if these have a sufficiently large field of view, this may be an issue for them as well. However, the sort of data that needs this assessment most typically are scans of plates, as these tend to have a “wobble”, systematic offsets in the scan direction resulting from imperfections in the mechanics.

    Prerequisites: An astronomical frame with a calibration in ICRS (or some frame not very far from it), called my-image.fits in the following, SExtractor (in Debian and derivatives: apt install source-extractor – long live Debian Astro; since it's called source-extractor in Debian, that's what I'll use here, too), and of course TOPCAT.

    Step 1: Extract Sources. Source extraction is of course a high science, and if you know better than me, by all means do it the way you think is appropriate. Meanwhile, the following might very well work for you sufficiently well.

    Create a working directory and enter it. Then, to create a file telling source-extractor what columns you would like to see, write the following to a file default.param:

    ALPHA_SKY
    DELTA_SKY
    X_IMAGE
    Y_IMAGE
    MAG_ISO
    FLUX_AUTO
    ELONGATION
    

    Next, give a few parameters to source-extractor; depending on the sort of image you have, you may want to play around with DETECT_MINAREA (how many pixels need to show a signal to register as a source) and DETECT_THRESH (how many sigmas a pixel has to be above the background to register as a candidate for belonging to a source). Meanwhile, write the following into a file default.control:

    CATALOG_TYPE     FITS_1.0
    CATALOG_NAME     img.axy
    PARAMETERS_NAME  default.param
    FILTER           N
    DETECT_MINAREA   30
    DETECT_THRESH    4
    SEEING_FWHM      1.2
    

    – but if the following call gives you a few hundred sources, that ought to work for the present purpose.

    Then run:

    source-extractor -c default.control my-image.fits
    

    This will give you a catalogue of extracted objects in the file img.axy.

    Step 2: Fix source-extractor's output. Load that img.axy into TOPCAT. Regrettably, source-extractor does not add any useful metadata to the columns of its output table. To add the absolute bare minimum, in TOPCAT go to ViewsColumn Info. In that window, check UCD in the Display menu, and then put pos.eq.ra and pos.eq.dec into the UCD fields of the ALPHA_SKY and DELTA_SKY columns, respectively; double click to change fields in TOPCAT.

    To see if you have done the annotation right, in TOPCAT's main window, click GraphicsSky Plot. If the objects show up, you have just provided enough annotation to let TOPCAT figure out the position for each row.

    Step 3: Get calibrators. We will now try to add counterparts for Gaia DR3 to the extracted sources. To do that, click VOTable Access Protocol, and in the window popping up double click the entry for the GAVO DC TAP.

    In the Find box, type dr3lite to look for this site's version of the Gaia DR3 source catalogue. Click on gaia.dr3lite to select that table, and then select the Columns pane. This should show some of the Gaia DR3 columns.

    Now ExamplesUpload Join will generate a query that will cross-match your extracted sources with the Gaia sources. You should edit it a bit, only selecting the columns you will actually need, removing the TOP 1000 (at least on large images with more than 1000 sources), and reducing the match radius a bit when the calibration is not actually completely off and your epoch is sufficiently close to J2000.

    Hint: you can control-click in the Columns pane and then use the Cols button to insert all the column names in one go[1]. For me, the resulting query would be:

    SELECT
       source_id, ra, dec, phot_bp_mean_mag,
       tc.*
       FROM gaia.dr3lite AS db
       JOIN TAP_UPLOAD.t1 AS tc
       ON 1=CONTAINS(POINT('ICRS', db.ra, db.dec),
                     CIRCLE('ICRS', tc.ALPHA_SKY, tc.DELTA_SKY, 4./3600.))
    

    This should result in about as many matches as your extraction had – a few more is ok, because you will have some spurious matches, a few less is ok, too, as there are always some outliers and artefacts, but you should clearly not pull a magnitude more or less objects here than you put in; fiddle with the match radius as necessary.

    See if there is a rough correlation between the Gaia calibrators and your extracted sources by plotting phot_bp_mean_mag against MAG_ISO. Absent more information, MAG_ISO, source-extractor's guess for the magnitude of the extracted object, will be just some crazy number, but it should have some discernable correlation with the actual magnitude. Do not expect too much here, in particular with old plates, for which good photometry is a science of their own.

    In my example, this looked like this:

    Plot: a rough correlation in red with a green tail

    The green points certainly are spurious matches; this observation did not reach beyond 14th magnitude or so, and there are many weak stars on the sky, so a few of them will show up in just about any cross match. See the opening picture for an example with a better correlation.

    Step 4: Do the correlation plot. Do GraphicsPlane Plot and then plot ra-alpha_sky or dec-delta_sky against X_IMAGE or Y_IMAGE. You could get something like this:

    Plot: A single wavy thing

    This rather certainly reflects some optical distortion; source-extractor regrettably does not take into account SIP corrections yet, so it is likely that a large part of this would be taken care of by the polynomials of the plate solution (the github issue I am linking to tells you how to be sure).

    But it can also look like this:

    Plot: Multiple wobbles

    This certainly is not the result of a lens or anything optical at all. It's the scanner's gears that you are looking at here. With an amplitude of perhaps three arcseconds this is rather excessive here; but something like this you will rather likely see even on good scanners – though it may essentially be invisible, as of the Heidelberg scanner we used for HDAP:

    Plot: A vertical cloud with no discernible structure.
    [1]I'm using the BP magnitude in the query below as most historical plates tend to be “blue sensitive“ (in some sense). Hence, BP magnitudes should be a bit closer to what source-extractor has extracted.
  • Tangible Astronomy and Movies with TOPCAT

    This March, I've put up two new VO resources (that's jargon for “table or service or whatever”) that, I think, fit quite well what I like to call tangible astronomy: things you can readily relate to what you see when you step out at night. And, since I'm a professing astronomy nerd, that's always nicely gratifying.

    The two resources are the Constellations as Polygons and the Gaia eDR3 catalogue of nearby stars (GCNS).

    Constellations

    On the constellations, you might rightfully say that's really far from science. But then they do help getting an idea where something is, and when and from where you might see something. I've hence wanted for a long time to re-publish the Davenhall Constellation Boundary Data as proper, ADQL-queriable polygons, and figuring out where the loneliest star in the sky (and Voyager 1) were finally made me do it.

    GCNS density around taurus

    Taurus in the GCNS density plot: with constellations!

    So, since early March there's the cstl.geo table on the TAP service at https://dc.g-vo.org/tap with the constallation polygons in its p column. Which, for starters, means it's trivial to overplot constallation boundaries in your favourite VO clients now, as in the plot above. To make it, I've just done a boring SELECT * FROM cstl.geo, did the background (a plain HEALPix density plot of GCNS) and, clicked Layers → Add Area Control and selected the cstl.geo table.

    If you want to identify constellations by clicking, while in the area control, choose “add central” from the Forms menu in the Form tab; that's what I did in the figure above to ensure that what we're looking at here is the Hyades and hence Taurus. Admittedly: these “centres“ are – as in the catalogue – just the means of the vertices rather than the centres of mass of the polygon (which are hard to compute). Oh, and: there is also the AreaLabel in the Forms menu, for when you need the identification more than the table highlighting (be sure to use a center anchor here).

    Note that TOPCAT's polygon plot at this point is not really geared towards large polygons (which the constellations are) right now. At the time of writing, the documentation has: “Areas specified in this way are generally intended for displaying relatively small shapes such as instrument footprints. Larger areas may also be specified, but there may be issues with use.” That you'll see at the edges of the sky plots – but keeping that in mind I'd say this is a fun and potentially very useful feature.

    What's a bit worse: You cannot turn the constellation polygons into MOCs yet, because the MOC library currently running within our database will not touch non-convex polygons. We're working on getting that fixed.

    Nearby Stars

    Similarly tangible in my book is the GCNS: nearby stars I always find romantic.

    Let's look at the 100 nearest stars, and let's add spectral types from Henry Draper (cf. my post on Annie Cannon's catalogue) as well as the constellation name:

    WITH nearest AS (
    SELECT TOP 100
      a.source_id,
      a.ra, a.dec,
      phot_g_mean_mag,
      dist_50,
      spectral
    FROM gcns.main AS a
    LEFT OUTER JOIN hdgaia.main AS b
      ON (b.source_id_dr3=a.source_id)
    ORDER BY dist_50 ASC)
    SELECT nearest.*, name
    FROM nearest
    JOIN cstl.geo AS g
      ON (1=CONTAINS(
        POINT(nearest.ra, nearest.dec),
        p))
    

    Note how I'm using CONTAINS with the polygon in the constellations table here; that's the usage I've had in mind for this table (and it's particularly handy with table uploads).

    That I have a Common Table Expression (“WITH”) here is due to SQL planner confusion (I'll post something about that real soon now): With the WITH, the machine first selects the nearest 100 rows and then does the (relatively costly) spatial match, without it, the machine (somewhat surprisingly) did the geometric match first. This particular confusion looks fixable, but for now I'd ask you for forgiveness for the hack – and the technique is often useful anyway.

    If you inspect the result, you will notice that Proxima Cen is right there, but α Cen is missing; without having properly investigated matters, I'd say it's just too bright for the current Gaia data reduction (and quite possibly even for future Gaia analysis).

    Most of the objects on that list that have made it into the HD (i.e., have a spectral type here) are K dwarfs – which is an interesting conspiracy between the limits of the HD (the late red and old white dwarfs are too weak for it) and the limits of Gaia (the few earlier stars within 6 parsec – which includes such luminaries as Sirius at a bit more than 2.5 pc – are just too bright for where Gaia data reduction is now).

    Animation

    Another fairly tangible thing in the GCNS is the space velcity, given in km/s in the three dimensions U, V, and W. That is, of course, an invitation to look for stellar streams, as, within the relatively small portion of the Milky Way the GCNS looks at, stars on similar orbits will exhibit similar space motions.

    Considering the velocity dispersion within a stellar stream will be a few km/s, let's have the database bin the data. Even though this data is small enough to conveniently handle locally, this kind of remote analysis is half of what TAP is really great at (the other half being the ability to just jump right into a new dataset). You can group by multiple things at the same time:

    SELECT
      COUNT(*) AS n,
      ROUND(uvel_50/5)*5 AS ubin,
      ROUND(vvel_50/5)*5 AS vbin,
      ROUND(wvel_50/5)*5 AS wbin
    FROM gcns.main
    GROUP BY ubin, vbin, wbin
    

    Note that this (truly) 3D histogram only represents a small minority of the GCNS objects – you need radial velocities for space motion, and these are precious even in the Gaia age.

    What really surprised me is how clumpy this distribution is – are we sure we already know all stellar streams in the solar neighbourhood? Watch for yourself (if your browser can't play webm, complain to your vendor):

    [Update (2021-04-01): Mark Taylor points out that the “flashes” you sometimes see when the grid is aligned with the viewing axes (and the general appearance) could be improved by just pulling all non-NULL UVW values out of the table and using a density plot (perhaps shading=density densemap=inferno densefunc=linear). That is quite certainly true, but it would of course defeat the purpose of having on-server aggregation. Which, again, isn't all that critical for this dataset, so doing the prettier plot actually is a valuable exercise for the reader]

    How did I make this video? Well, I started with a Cube Plot in TOPCAT as usual, configuring weighted plotting with n as its weight and played around a bit with scaling out a few outliers. And then I saved the table (to zw.vot), hit “STILTS“ in the plot window and saved the text from there to a text file, zw.sh. I had to change the ``in`` clause in the script to make it look like this:

    #!/bin/sh
    stilts plot2cube \
     xpix=887 ypix=431 \
     xlabel='ubin / km/s' ylabel='vbin / km/s' \
     zlabel='wbin / km/s' \
     xmin=-184.5 xmax=49.5 ymin=-77.6 ymax=57.6 \
     zmin=-119.1 zmax=94.1 phi=-84.27 theta=90.35 \
      psi=-62.21 \
     auxmin=1 auxmax=53.6 \
     auxvisible=true auxlabel=n \
     legend=true \
     layer=Mark \
        in=zw.vot \
        x=ubin y=vbin z=wbin weight=n \
        shading=weighted size=2 color=blue
    

    – and presto, sh zw.sh would produce the plot I just had in TOPCAT. This makes a difference because now I can animate this.

    In his documentation, Mark already has a few hints on how to build animations; here are a few more ideas on how to organise this. For instance, if, as I want here, you want to animate more than one variable, stilts tloop may become a bit unwieldy. Here's how to give the camera angles in python:

    import sys
    from astropy import table
    import numpy
    
    angles = numpy.array(
      [float(a) for a in range(0, 360)])
    table.Table([
        angles,
        40+30*numpy.cos((angles+57)*numpy.pi/180)],
      names=("psi", "theta")).write(
        sys.stdout, format="votable")
    

    – the only thing to watch out for is that the names match the names of the arguments in stilts that you want to animate (and yes, the creation of angles will make numpy afficionados shudder – but I wasn't sure if I might want to have somewhat more complex logic there).

    [Update (2021-04-01): Mark Taylor points out that all that Python could simply be replaced with a straightforward piece of stilts using the new loop table scheme in stilts, where you would simply put:

    animate=:loop:0,360,0.5
    acmd='addcol phi $1'
    acmd='addcol theta 40+30*cosDeg($1+57)'
    

    into the plot2cube command line – and you wouldn't even need the shell pipeline.]

    What's left to do is basically the shell script that TOPCAT wrote for me above. In the script below I'm using a little convenience hack to let me quickly switch between screen output and file output: I'm defining a shell variable OUTPUT, and when I un-comment the second OUTPUT, stilts renders to the screen. The other changes versus what TOPCAT gave me are de-dented (and I've deleted the theta and psi parameters from the command line, as I'm now filling them from the little python script):

    OUTPUT="omode=out out=pre-movie.png"
    #OUTPUT=omode=swing
    
    python3 camera.py |\
    stilts plot2cube \
       xpix=500 ypix=500 \
       xlabel='ubin / km/s' ylabel='vbin / km/s' \
       zlabel='wbin / km/s' \
       xmin=-184.5 xmax=49.5 ymin=-77.6 ymax=57.6 \
       zmin=-119.1 zmax=94.1 \
       auxmin=1 auxmax=53.6 \
    phi=8 \
    animate=- \
    afmt=votable \
    $OUTPUT \
       layer=Mark \
          in=zw.vot \
          x=ubin y=vbin z=wbin weight=n \
          shading=weighted size=4 color=blue
    
    # render to movie with something like
    # ffmpeg -i "pre-movie-%03d.png" -framerate 15 -pix_fmt yuv420p /stream-movie.webm
    # (the yuv420p incantation is so real-world
    # web browsers properly will not go psychedelic
    # with the colours)
    

    The comment at the end says how to make a proper movie out of the PNGs this produces, using ffmpeg (packaged with every self-respecting distribution these days) and yielding a webm. Yes, going for mpeg x264 might be a lot faster for you as it's a lot more likely to have hardware support, but everything around mpeg is so patent-infested that for the sake of your first-born's soul you probably should steer clear of it.

    Movies are fun in webm, too.

  • The Loneliest Star in the Sky

    sky images and a distribution plot

    The loneliest star in the sky on the left, and on the right a somewhat more lonelier one (it's explained in the text). The inset shows the distribution of the 500 loneliest stars on the whole sky in Galactic coordinates.

    In early December, the object catalogue of Gaia's data release 3 was published (“eDR3“), and I've been busy in various ways on this data off and on since then – see, for instance, the The Case of the disappearing bits on this blog.

    One of the things I have missed when advising people on projects with previous Gaia data releases is a table that, for every object, gives the nearest neighbour. And so for this release I've created it and christened it, perhaps just a bit over-grandiosely, “Gaia eDR3 Autocorrelation”. Technically, it is just a long (1811709771 rows, to be precise) list of pairs of Gaia eDR3 source ids, the ids of their nearest neighbour, and a spherical distance between.

    This kind of data is useful for many applications, mostly when looking for objects that are close together or (more often) things that fail for such close pairs for a wide variety of reasons. I have taken some pains to not only have close neighbours, though, because sometimes you may want specifically objects far away from others.

    As in the case of this article's featured image: The loneliest star in the sky (as seen by Gaia, that is) is eDR3 6049144983226879232, which is 4.3 arcminutes from its neighbour, 6049144021153793024, which in turn is the second-loneliest star in the sky. They are, perhaps a bit surprisingly, in Ophiuchus (and thus fairly close to the Milky Way plane), and (probably) only about 150 parsec from Earth. Doesn't sound too lonely, hm? Turns out: these stars are lonely because dust clouds blot out all their neighbours.

    Rank three is in another dust cloud, this time in Taurus, and so it continues in low Galactic latitude to rank 8 (4402975278134691456) at Galactic latitude 36.79 degrees; visualising the thing, it turns out it's again in a dark cloud. What about rank 23 at 83.92 Galactic (3954600105683842048)? That's probably bona-fide, or at least it doesn't look very dusty in the either DSS or PanSTARRS. Coryn (see below) estimates it's about 1100 parsec away. More than 1 kpc above the galactic disk: that's more what I had expected for lonely stars.

    Looking at the whole distribution of the 500 loneliest stars (inset above), things return a bit more to what I had expected: Most of them are around the galactic poles, where the stellar density is low.

    So: How did I find these objects? Here's the ADQL query I've used:

    SELECT TOP 500
      ra, dec, source_id, phot_g_mean_mag, ruwe,
      r_med_photogeo,
      partner_id, dist,
      COORD2(gavo_transform('ICRS', 'GALACTIC',
        point(ra, dec))) AS glat
    FROM
      gedr3dist.litewithdist
      NATURAL JOIN gedr3auto.main
    ORDER BY dist DESC
    

    – run this on the TAP server at http://dc.g-vo.org/tap (don't be shy, it's a cheap query).

    Most of this should be familiar to you if you've worked through the first pages of ADQL course. There's two ADQL things I'd like to advertise while I have your attention:

    1. NATURAL JOIN is like a JOIN USING, except that the database auto-selects what column(s) to join on by matching the columns that have the same name. This is a convenient way to join tables designed to be joined (as they are here). And it probably won't work at all if the tables haven't been designed for that.
    2. The messy stuff with GALACTIC in it. Coordinate transformations had a bad start in ADQL; the original designers hoped they could hide much of this; and it's rarely a good idea in science tools to hide complexity essentially everyone has to deal with. To get back on track in this field, DaCHS servers since about version 1.4 have been offering a user defined function gavo_transfrom that can transform (within reason) between a number of popular reference frames. You will find more on it in the server's capabilities (in TOPCAT: the “service” tab). What is happening in the query is: I'm making a Point out of the RA and Dec given in the catalogue, tell the transform function it's in ICRS and ask it to make Galactic coordinates from it, and then take the second element of the result: the latitude.

    And what about the gedr3dist.litewithdist table? That doesn't look a lot like the gaiaedr3.gaiasource we're supposed to query for eDR3?

    Well, as for DR2, I'm again only carrying a “lite” version of the Gaia catalogue in GAVO's Heidelberg data center, stripped down to the columns you absolutely cannot live without even for the most gung-ho science; it's called gaia.edr3lite.

    But then my impression is that almost everyone wants distances and then hacks something to make Gaia's parallax work for them. That's a bad idea as the SNR goes down to levels very common in the Gaia result catalogue (see 2020arXiv201205220B if you don't take my word for it). Hence, I'm offering a pre-joined view (a virtual table, if you will) with the carefully estimated distances from Coryn Bailer-Jones, and that's this gedr3dist.litewithdist. Whenever you're doing something with eDR3 and distances, this is where I'd point you first.

    Oh, and I should be mentioning that, of course, I figured out what is in dust clouds and what is not with TOPCAT and Aladin as in our tutorial TOPCAT and Aladin working together (which needs a bit of an update, but you'll figure it out).

    There's a lot more fun to be had with this (depending on what you find fun in). What about finding the 10 arcsec-pairs with the least different luminosities (which might actually be useful for testing some optics)? Try this:

    SELECT TOP 300
      a.source_id, partner_id, dist,
      a.phot_g_mean_mag AS source_mag,
      b.phot_g_mean_mag AS partner_mag,
      abs(a.phot_g_mean_mag-b.phot_g_mean_mag) AS magdiff
    FROM gedr3auto.main
      NATURAL JOIN gaia.edr3lite AS a
      JOIN gaia.edr3lite AS b
        ON (partner_id=b.source_id)
    WHERE
      dist BETWEEN 9.999/3600 AND 10.001/3600
      AND a.phot_g_mean_mag IS NOT NULL
      AND b.phot_g_mean_mag IS NOT NULL
    ORDER BY magdiff ASC
    

    – this one takes a bit longer, as there's many 10 arcsec-pairs in eDR3; the query above looks at 84690 of them. Of course, this only returns really faint pairs, and given the errors stars that weak have they're probably not all that equal-luminosity as that. But fixing all that is left as an exercise to the reader. Given there's the RP and BP magnitude columns, what about looking for the most colourful pair with a given separation?

    Acknowledgement: I couldn't have coolly mumbled about Ophiuchus or Taurus without the SCS service ivo://cds.vizier/vi/42 (”Identification of a Constellation From Position, Roman 1982”).

    Update [2021-02-05]: I discovered an extra twist to this story: Voyager 1 is currently flying towards Ophiuchus (or so Wikipedia claims). With an industrial size package of artistic licence you could say: It's coming to keep the loneliest star company. But of course: by the time Voyager will be 150 pc from earth, eDR3 6049144983226879232 will quite certainly have left Ophiuchus (and Voyager will be in a completely different part of our sky, that wouldn't look familar to us at all) – so, I'm afraid apart from a nice conincidence in this very moment (galactically speaking), this whole thing won't be Hollywood material.

  • Crazy Shapes in TAP

    OpenNGC shapes

    A complex shape from OpenNGC: MOCs need not be convex, or simply connected, or anything.

    So far when you did spherical geometry in ADQL, you had points, circles, and polygons as data types, and you could test for intersection and containment as operations. This feature set is a bit unsatisfying because there are no (algebraic) groups in this picture: When you join or intersect two circles, the result only is a circle if one contains the other. With non-intersecting polygons, you will again not have a (simply connected) spherical polygon in the end.

    Enter MOCs (which I've mentioned a few times before on this blog): these are essentially arbitrary shapes on the sky, in practice represented through lists of pixels, cleverly done so they can be sufficiently precise and rather compact at the same time. While MOCs are powerful and surprisingly simple in practice, ADQL doesn't know about them so far, which limits quite a bit what you can do with them. Well, DaCHS would serve them since about 1.3 if you managed to push them into the database, but there were no operations you could do on them.

    Thanks to work done by credativ (who were really nice to work with), funded with some money we had left from our previous e-inf-astro project (BMBF FKZ 05A17VH2) on the pgsphere database extension, this has now changed. At least on the GAVO data center, MOCs are now essentially first-class citizens that you can create, join, and intersect within ADQL, and you can retrieve the results. All operators of DaCHS services are just a few updates away from being able to offer the same.

    So, what can you do? To follow what's below, get a sufficiently new TOPCAT (4.7 will do) and open its TAP client on http://dc.g-vo.org/tap (a.k.a. GAVO DC TAP).

    Basic MOC Operations in TAP

    First, let's make sure you can plot MOCs; run

    SELECT name, deepest_shape
    FROM openngc.shapes
    

    Then do Graphics/Sky Plot, and in the window that pops up then, Layers/Add Area Control. Then select your new table in the Position tab, and finally choose deepest_shape as area (yeah, this could become a bit more automatic and probably will over time). You will then see the footprints of a few NGC objects (OpenNGC's author Mattia Verga hasn't done all yet; he certainly welcomes help on OpenNGC's version control repo), and you can move around in the plot, yielding perhaps something like Fig. 1.

    Now let's color these shapes by object class. If you look, openngc.data has an obj_type column – let's group on it:

    SELECT
      obj_type,
      shape,
      AREA(shape) AS ar
    FROM (
      SELECT obj_type, SUM(deepest_shape) AS shape
      FROM openngc.shapes
      NATURAL JOIN openngc.data
      GROUP BY obj_type) AS q
    

    (the extra subquery is a workaround necessary because the area function wants a geometry or a column reference, and ADQL doesn't allow aggregate functions – like sum – as either of these).

    In the result you will see that so far, contours for about 40 square degrees of star clusters with nebulae have been put in, but only 0.003 square degrees of stellar associations. And you can now plot by the areas covered by the various sorts of objects; in Fig. 2, I've used Subsets/Classify by Column in TOPCAT's Row Subsets to have colours indicate the different object types – a great workaround when one deals with categorial variables in TOPCAT.

    MOCs and JOINs

    Another table that already has MOCs in them is rr.stc_spatial, which has the coverage of VO resources (and is the deeper reason I've been pushing improved MOC support in pgsphere – background); this isn't available for all resources yet , but at least there are about 16000 in already. For instance, here's how to get the coverage of resources talking about planetary nebulae:

    SELECT ivoid, res_title, coverage
    FROM rr.subject_uat
      NATURAL JOIN rr.stc_spatial
      NATURAL JOIN rr.resource
    WHERE uat_concept='planetary-nebulae'
      AND AREA(coverage)<20
    

    (the rr.subject_uat table is a local extension to RegTAP that will be the subject of some future blog post; you could also use rr.res_subject, but because people still use wildly different keyword schemes – if any –, that wouldn't be as much fun). When plotted, that's the left side of Fig. 3. If you do that yourself, you will notice that the resolution here is about one degree, which is a special property of the sort of MOCs I am proposing for the Registry: They are of order 6. Resolution in MOC goes up with order, doubling with every step. Thus MOCs of order 7 have a resolution of about half a degree, MOCs of order 5 a resolution of about two degrees.

    One possible next step is fetch the intersection of each of these coverages with, say, the DFBS (cf. the post on Byurakan spectra). That would look like this:

    SELECT
      ivoid,
      res_title,
      gavo_mocintersect(coverage, dfbscoverage) as ovrlp
    FROM (
      SELECT ivoid, res_title, coverage
      FROM rr.subject_uat
      NATURAL JOIN rr.stc_spatial
      NATURAL JOIN rr.resource
      WHERE uat_concept='planetary-nebulae'
      AND AREA(coverage)<20) AS others
    CROSS JOIN (
      SELECT coverage AS dfbscoverage
      FROM rr.stc_spatial
      WHERE ivoid='ivo://org.gavo.dc/dfbsspec/q/spectra') AS dfbs
    

    (the DFBS' identifier I got with a quick query on WIRR). This uses the gavo_mocintersect user defined function (UDF), which takes two MOCs and returns a MOC of their common pixels. Which is another important part why MOCs are so cool: together with union and intersection, they form groups. It should not come as a surprise that there is also a gavo_mocunion UDF. The sum aggregate function we've used in our grouping above is (conceptually) built on that.

    Planetary Nebula footprint and plate matches

    Fig. 3: Left: The common footprint of VO resources declaring a subject of planetary-nebula (and declaring a footprint). Right bottom: Heidelberg plates intersecting this, and, in blue, level-6 intersections. Above this, an enlarged detail from this plot.

    You can also convert polygons and circles to MOCs using the (still DaCHS-only) MOC constructor. For instance, you could compute the coverage of all resources dealing with planetary nebulae, filtering against obviously over-eager ones by limiting the total area, and then match that against the coverages of images in, say, the Königstuhl plate achives HDAP. Watch this:

    SELECT
      im.*,
      gavo_mocintersect(MOC(6, im.coverage), pn_coverage) as ovrlp
    FROM (
      SELECT SUM(coverage) AS pn_coverage
      FROM rr.subject_uat
      NATURAL JOIN rr.stc_spatial
      WHERE uat_concept='planetary-nebulae'
      AND AREA(coverage)<20) AS c
    JOIN lsw.plates AS im
    ON 1=INTERSECTS(pn_coverage, MOC(6, coverage))
    

    – so, the MOC(order, geo) function should give you a MOC for other geometries. There are limits to this right now because of limitations of the underlying MOC library; in particular, non-convex polygons are not supported right now, and there are precision issue. We hope this will be rectified soon-ish when we base pgsphere's MOC operations on the CDS HEALPix library. Anyway, the result of this is plotted on the right of Fig. 3.

    Open Ends

    In case you have MOCs from the outside, you can also construct MOCs from literals, which happen to be the ASCII MOCs from the standard. This could look like this:

    SELECT TOP 1
      MOC('4/30-33 38 52 7/324-934') AS ar
    FROM tap_schema.tables
    

    For now, you cannot combine MOCs in CONTAINS and INTERSECTS expressions directly; this is mainly because in such an operation, the machine as to decide on the order of the MOC the other geometries are converted to (and computing the predicates between geometry and MOC directly is really painful). This means that if you have a local table with MOCs in a column cmoc that you want to compare against a polygon-valued column coverage in a remote table like this:

    SELECT db.* FROM
      lsw.plates AS db
      JOIN tap_upload.t6
    ON 1=CONTAINS(coverage, cmoc) -- fails!
    

    you will receive a rather scary message of the type “operator does not exist: spoly <@ smoc”. To fix it (until we've worked out how to reasonably let the computer do that), explicitly convert the polygon:

    SELECT db.* FROM
      lsw.plates AS db
      JOIN tap_upload.t6
    ON 1=CONTAINS(MOC(7, coverage), cmoc)
    

    (be stingy when choosing the order here – MOCs that already exist are fast, but making them at high order is expensive).

    Having said all that: what I've written here is bleeding-edge, and it is not standardised yet. I'd wager, though, that we will see MOCs in ADQL relatively soon, and that what we will see will not be too far from this experiment. Well: Some rough edges, I'd hope, will still be smoothed out.

    Getting This on Your Own DaCHS Installation

    If you are running a DaCHS installation, you can contribute to takeup (and if not, you can stop reading here). To do that, you need to upgrade to DaCHS's latest beta (anything newer than 2.1.4 will do) to have the ADQL extension, and, even more importantly, you need to install the postgresql-postgres package from our release repository (that's version 1.1.4 or newer; in a few weeks, getting it from Debian testing would work as well).

    You will probably not get that automatically, because if you followed our normal installation instructions, you will have a package called postgresql-11-pgsphere installed (apologies for this chaos; as ususal, every single step made sense). The upshot is that with our release repo added, sudo apt install postgresql-pgsphere should give you the new code.

    That's not quite enough, though, because you also need to acquaint the database with the new functions. This can only be done with database administrator privileges, which DaCHS by design does not possess. What DaCHS can do is figure out the commands to do that when it is called as dachs upgrade -e. Have a look at the output, and if you are satisfied it is about what to expect, just pipe it into psql as a superuser; in the default installation, dachsroot would be sufficiently privileged. That is:

    dachs upgrade -e | psql gavo   # as dachsroot
    

    If running:

    select top 1 gavo_mocunion(moc('1/3'), moc('2/9'))
    from tap_schema.tables
    

    through your TAP endpoint returns '1/3 2/9', then all is fine. For entertainment, you might also make sure that gavo_mocintersect(moc('1/3'), moc('2/13')) is 2/13 as expected, and that if you intersect with 2/3 you get back an empty string.

    So – let's bring MOCs to ADQL!

Page 1 / 2 »