Articles from Data

  • Gaia DR3 XP Spectra: All Sampled

    Lots of blue crosses and a few red squares plotted over a sky photograph of a star cluster

    Around this time of the year on the northern hemisphere, you can spot the h and χ Persei double star cluster with the naked eye. One part of it, NGC 884 is shown here with LAMOST DR6 low resolution spectra (red squares) and Gaia DR3 XP spectra (blue crosses) overplotted. Given that LAMOST has already been one of the largest collections of spectra on the planet, you can see that there is really a lot of those XP spectra.

    When Gaia DR3 was released in June, I was somewhat disappointed when I realised what it is that they delivered as the BP/RP (or XP for short) spectra. You see, I had expected to see something rather similar to what I have in DFBS: structurally, arrays of a few dozen spectral points, mapping wavelengths to some sort of measure of the flux.

    What really came were, mainly, “continuous spectra“, that is coefficients of Gauss-Hermite polynomials. You can fetch them from the gaiadr3.xp_continuous_mean_spectrum table at the ARI-Gaia TAP service; the blue part of the spectrum of the star DR3 4295806720 looks like this in there:

    102.93398893929992, -12.336921213781045, -2.668856168170544, -0.12631176306793765, -0.9347021092539146, 0.05636787290132809, [...]

    No common spectral client can plot this. The Gaia DPAC has helpfully provided a Python library called GaiaXPy to turn these into “proper” spectra. Shortly after the data release, my plan has thus been to turn all these spectra into their “sampled” form using GaiaXPy and then re-publish them, both through SSAP for ad-hoc discovery and through TAP for (potentially) global analysis.

    Alas, for objects too faint to make it into DR3's xp_sampled_mean_spectrum table (that's 35 million spectra already turned to wavelength-flux pairs by DPAC), the spectra generated in this way looked fairly awful, with lots of very artificial-looking wiggles (“ringing”, if you will). After a bit of deliberation, I realised that when the errors are given on the Hermite coefficients, once you compute the samples, these errors will be liberally distributed among the output samples. In other words, the error on the samples will be grossly correlated over arbitrary distances; at least I am fairly helpless when trying to separate signal from artefact in these beasts.

    Bummer. Well, fortunately, Rene Andrae from “up the mountain” (i.e., the MPI for Astronomy) has worked out a reasonably elegant way to get more conventional spectra understandable to mere humans. Basically, you compute n distinct “realisations” of the error model given by the table of the continuous spectra and average over them. The more samples you take, the less correlated your spectral points and their errors will be and the less confusing the signal will be. The service docs for gaia/s3 give the math.

    Doing this on more than 200 million spectra is quite an effort, though, and so after some experimentation I decided to settle on 10 realisations per spectrum and have relatively wide bins (10 nm) over just the optical part of the spectrum (400 through 800 nm). The BP and RP bandpaths are a bit wider, and there is probably signal blotted out by the wide bins; I will probably be addressing this for DR4, except if these spectra become the smash hit they deserve to be.

    The result of this procedure is now available through an SSAP service that should show up in the VO Registry by the time the first of you read this; the Aladin image above gives you an impression of the density of results here – and don't forget: the spectra with the blue crosses are all reasonably well flux-calibrated.

    The data is also available on the TAP service, which opens up many interesting possibilities. Let me mention two here.

    Comparison with LAMOST

    I was rather nervous whether what I had done resulted in anything that bore even a fleeting resemblance to reality, and so about the first thing I tried was to compare my new data with what LAMOST has.

    That is a nice exercise for TAP and ADQL. Let's first match spectra from the two surveys, which luckily are on the same server, saving us some cross-server uploads. I am selecting a minimum of data, just the position and the two access URLs, and I let DaCHS' MAXREC kick in so I'm just retrieving 20000 of the millions of result records:

    SELECT a.ssa_location, a.accref, b.accref
      gdr3spec.ssameta AS a
      JOIN lamost6.ssa_lrs AS b
      ON DISTANCE(a.ssa_location, b.ssa_location)<0.001

    (this is using the DISTANCE(.,.)<radius idiom that we will be migrating towards in ADQL 2.1 instead of the dreaded 1=CONTAINS(POINT, CIRCLE) thing everyone has loathed in ADQL 2.0).

    Using the nifty activation actions, you can now tell TOPCAT to open the two spectra next to each other when you click on a row or a point in a sky plot. To reproduce,

    1. Make a sky plot. TOPCAT doesn't yet pick up the POINT in ssa_location, so you have to configure the Lon and Lat fields yourself to ssa_location[0] and ssa_location[1].
    2. Open the activation actions, either from the button bar or from the Views menu.
    3. In there, select Plot Table, make sure it says accref in Table Location and then check Plot Table in the Actions pane. When you now click on a point in the sky plot, you should see a spectrum pop up, except it is plotted with dots, which most people consider inappropriate for spectra. Use the Form tab in the plot window to style it a bit more spectrum-like (I recommend looking into Line and XYError).
    4. But how do you now add the LAMOST plot? I don't think TOPCAT's activation actions let you plot right into the plane plot you just configured. But you can add a second Plot Table action from the Actions menu in the window with the activation actions. As before, configure this new item, except this one needs to plot accref_ (which is what DaCHS has called the access reference for LAMOST to keep the names unique).
    5. As for Gaia, configure to plot to look good as a spectrum. In order to make the two spectra optically comparable, under Axes set the range to 4000 to 8000 Angstrom manually here.

    You can now click on points in your sky plot and, after a second or so, see the corresponding spectra next to each other (if you place the two plot windows that way).

    If you try this, you will (hopefully) see that major features of spectra are nicely reproduced, such as with these, I guess, molecular bands:

    Two line plots next to each other, the right one showing more features.  the left one roughly follows the major wiggles, though.

    As you probably have guessed, the extremely low-resolution Gaia XP spectrum is left, LAMOST's (somewhat higher-resolution) low-resolution spectrum is right:

    This also works with absorption in the blue, as in this example:

    Two line plots next to each other, the right one showing a lot of relatively sharp absoprtion lines, which the left one does not have.  A few major bumps are present in both, and the general shape conincides nicely, expect perhaps at the blue edge.

    In case of doubt, I have to say I'd probably trust Gaia's calibration around 400 nm better than LAMOST's. But that's mere guesswork.

    For fainter objects, you will see remnants of the systematic wiggles from the Hermite polynomials:

    Two line plots next to each other.  Both are relatively noisy, in particular on the blue edge.  The left one also seems to have a rather regular oscillation at the blue edge.

    Anyway, if you keep an eye on the errors, you can probably even work with spectra from the fainter objects:

    Two line plots next to each other.  The left one has fairly strong ringing which is not present in the right one, but it mainly stays within the error bars.  The total flux of this star is at least a factor of 10 less than for the prettier examples above.

    Mass Retrieval of Spectra

    One nice thing about the short spectra is that you can fetch many of them in one go and in very little time. For instance, to retrieve particularly red objects from the Gaia catalogue of Nearby Stars (also on the GAVO server) with spectra, say:

      source_id, ra, dec, parallax, phot_g_mean_mag,
      phot_bp_mean_mag, phot_rp_mean_mag, ruwe, adoptedrv,
      flux, flux_error
    FROM gcns.main
    JOIN gdr3spec.spectra
    USING (source_id)
    WHERE phot_rp_mean_mag<phot_bp_mean_mag-4

    [in case you wonder how I quickly got the column names enumerated here: do control-clicks into the Columns pane in TOCPAT's TAP window and then use the Cols button]. For when you do not have Gaia DR3 source_id-s in your source table, there is also gdr3spec.withpos against which you can do more conventional positional crossmatches.

    Within a few seconds, you can retrieve more than 4000 spectra in this way. You can now do whatever analysis you want on these spectra. Or, well, just plot them. The following procedure for that later task uses TOPCAT features only available in the next release, due before mid-October[1].

    First, make a colour-magnitude diagram (CMD) from this table as usual (e.g., BP-RP vs G). Then, open another plane plot and

    1. LayersAdd XYArray Control
    2. Configure the XYArray to plot from the table you just fetched, have nothing in X Values[2] and flux in Y Values.
    3. Under Axes, configure Y Log in order to better show the 4253 spectra at one time.
    4. Throw away or at least uncheck all other layers in the plot.
    5. In order to let TOPCAT highlight the spectrum of the activated source, in the Subsets pane check the Activated subset (that's the bleeding-edge functionality you will not have in older TOPCATs) and give it a sufficiently bright colour.

    With that, you can now click around in your CMD and immediately see that source's spectrum in the context of all the others, like this:

    An animation of someone selecting various points in a CMD and have simulataneous spectra plotted.

    These spectra have also inspired me to design and implement a vector extension for ADQL, which lets you do even more interesting things with these spectra. More on this… soon.

    [1]The Activated subset will be in TOPCAT versions later than 4.8-6, and until then in pre-releases at
    [2]These should be the spectral points; DaCHS does not deliver them with this query because I am a coward. I think I will find my courage relatively soon and then fix this. Once that has happened, you can select param$spectral as X values. [Update: Mark Taylor remarks that by writing sequence(41, 400, 10) in bleeding-edge TOPCATs and add(multiply(10,sequence(41)),400) before that, you can add a proper spectral axis until then]
  • Query the Registry with WIRR

    Search windows of VODesktop and WIRR

    Pixels from venerable VODesktop and WIRR: it's supposed to be about the same thing, except WIRR uses and exposes the latest Registry standards (and then some tech that's not standard yet).

    When the VO was young, there was a programme called VODesktop that had a very nice interface for searching the Registry. Also, it would run queries against the services discovered, giving nice all-VO querying that few modern clients do quite as elegantly. Regrettably, when the astrogrid UK project was de-funded, VODesktop's development ceased in 2010.

    In 2012, it had become clear that nobody would step up to continue it, and I wanted to at least provide a replacement for the Registry interface part. In consequence, Florian Rothmaier and I wrote the Web Interface to the Relational Registry, or WIRR for short; this lets you build Registry queries in your Web Browser in an interface inspired by VODesktop (which, I'm told, in turn was inspired by early iTunes).

    WIRR's sweet spot is between the Registry interfaces in the usual clients (TOPCAT, Aladin: these try to hide the gory details of where their service lists come from and hence are limited in what interaction they allow) and using a TAP client to write and execute RegTAP queries (where there are no limitations beyond the protocol's, but it's tedious unless you happen to know the RegTAP standard by heart).

    In contrast to its model VODesktop, WIRR cannot run any queries against the services discovered using it. But you can transfer the services you have found to clients via SAMP (TOPCAT can handle the relevant MTypes, but I'm frankly not sure what else). Apart from that, an obvious use for WIRR are the queries one needs in VO curation. For instance, I keep linking to it when sending people canned registry queries, as in the section on claiming an authority in the DaCHS Tutorial.

    Given that both Javascript and the Registry have evolved a lot in the past decade, WIRR was in need of a major redecoration for some time now, and in early July, I found some time to do it. The central result is that the code is now halfway modern, strict Javascript; let's see how many web browsers still run that can't execute this.

    On the surface, much less has changed, but there are some news I'd consider noteworthy and that might help your data discovery-fu:

    • Since I've added some constraint types, the constraint type selector is now a hierarchical box, sporting what I think are or should be the most common constraint types (full text, service type and UAT term) on level 0 and then having “Blind Discovery“, “Finer Grained“, and “Special Effects“ as pop-ups; all this so we obey Miller's Rule of Seven.
    • Rather than explain the constraints on a second, separate page, there are now brief help texts coming with each constaint.
    • You can now match against UAT concepts, and there is a completing input box for them; in case you're wondering what this is about, see this post from last February. And yes, next time I'll play with WIRR I'll probably include SemBaReBro here.
    • When constraining by column UCD, you can now choose from UCDs found in the registry (the “Pick one“ button).
    • You can now constrain by spatial, temporal, and spectral coverage, though that's still a gamble because not many (or, actually, very few in the case of temporal and spectral) operators care to declare their services' coverage. When they don't, you won't see their resources with such blind discovery constraints. For some background on this, check Space and Time not lost on the Registry on this blog.
    • There is now a „SQL“ button with successful searches that lets you retrieve the SQL executed for the particular constraint. While that query does not immediately execute on RegTAP services (it's Postgres' SQL rather than ADQL), it ought to give you a head start when transplanting your Registry query into, say, a pyVO-based script.
    • You can now use your browser's back and forward buttons (or, in my case. key bindings) to navigate in your query history.

    What this still doesn't do: Work without Javascript. That's a bit of a disgrace, since after the last changes it would actually be reasonable to provide non-javascript fallbacks for some of the basic functionality (of course, no SAMP at all then…). I'll do it the first time someone asks. Promised.

    A document that now needs at least slight updates because things have moved about a bit is the data discovery use case Florian wrote back then. The updates absolutely necessary are not terribly involved, but I would like to use the opportunity to add a bit more spice to the tutorial. If you have ideas: I'm all ears.

    Oh, and before I close: you can still run VODesktop; kudos to the maintainers of the JVM for that. But it's nevertheless not really usable any more, which perhaps isn't too surprising for a client built on top of experimental online services ten years ago. For one, its TAP client speaks pre-release versions of both TAP and ADQL, so those won't work on modern TAP services (and the ancient ones have vanished). Worse, it needed to use a non-standard extension of RegTAP's predecessor (for those old enough to remember: it used XQuery), and none of the modern searchable registries understands that any more.

    Which is a pity, really. It's been a fine programme. It just was a few years early: By 2012, everything it needed has been defined in nice, stable standards that are still around and probably will be for another decade at least.

  • Tangible Astronomy and Movies with TOPCAT

    This March, I've put up two new VO resources (that's jargon for “table or service or whatever”) that, I think, fit quite well what I like to call tangible astronomy: things you can readily relate to what you see when you step out at night. And, since I'm a professing astronomy nerd, that's always nicely gratifying.

    The two resources are the Constellations as Polygons and the Gaia eDR3 catalogue of nearby stars (GCNS).


    On the constellations, you might rightfully say that's really far from science. But then they do help getting an idea where something is, and when and from where you might see something. I've hence wanted for a long time to re-publish the Davenhall Constellation Boundary Data as proper, ADQL-queriable polygons, and figuring out where the loneliest star in the sky (and Voyager 1) were finally made me do it.

    GCNS density around taurus

    Taurus in the GCNS density plot: with constellations!

    So, since early March there's the cstl.geo table on the TAP service at with the constallation polygons in its p column. Which, for starters, means it's trivial to overplot constallation boundaries in your favourite VO clients now, as in the plot above. To make it, I've just done a boring SELECT * FROM cstl.geo, did the background (a plain HEALPix density plot of GCNS) and, clicked Layers → Add Area Control and selected the cstl.geo table.

    If you want to identify constellations by clicking, while in the area control, choose “add central” from the Forms menu in the Form tab; that's what I did in the figure above to ensure that what we're looking at here is the Hyades and hence Taurus. Admittedly: these “centres“ are – as in the catalogue – just the means of the vertices rather than the centres of mass of the polygon (which are hard to compute). Oh, and: there is also the AreaLabel in the Forms menu, for when you need the identification more than the table highlighting (be sure to use a center anchor here).

    Note that TOPCAT's polygon plot at this point is not really geared towards large polygons (which the constellations are) right now. At the time of writing, the documentation has: “Areas specified in this way are generally intended for displaying relatively small shapes such as instrument footprints. Larger areas may also be specified, but there may be issues with use.” That you'll see at the edges of the sky plots – but keeping that in mind I'd say this is a fun and potentially very useful feature.

    What's a bit worse: You cannot turn the constellation polygons into MOCs yet, because the MOC library currently running within our database will not touch non-convex polygons. We're working on getting that fixed.

    Nearby Stars

    Similarly tangible in my book is the GCNS: nearby stars I always find romantic.

    Let's look at the 100 nearest stars, and let's add spectral types from Henry Draper (cf. my post on Annie Cannon's catalogue) as well as the constellation name:

    WITH nearest AS (
    SELECT TOP 100
      a.ra, a.dec,
    FROM gcns.main AS a
    LEFT OUTER JOIN hdgaia.main AS b
      ON (b.source_id_dr3=a.source_id)
    ORDER BY dist_50 ASC)
    SELECT nearest.*, name
    FROM nearest
    JOIN cstl.geo AS g
      ON (1=CONTAINS(
        POINT(nearest.ra, nearest.dec),

    Note how I'm using CONTAINS with the polygon in the constellations table here; that's the usage I've had in mind for this table (and it's particularly handy with table uploads).

    That I have a Common Table Expression (“WITH”) here is due to SQL planner confusion (I'll post something about that real soon now): With the WITH, the machine first selects the nearest 100 rows and then does the (relatively costly) spatial match, without it, the machine (somewhat surprisingly) did the geometric match first. This particular confusion looks fixable, but for now I'd ask you for forgiveness for the hack – and the technique is often useful anyway.

    If you inspect the result, you will notice that Proxima Cen is right there, but α Cen is missing; without having properly investigated matters, I'd say it's just too bright for the current Gaia data reduction (and quite possibly even for future Gaia analysis).

    Most of the objects on that list that have made it into the HD (i.e., have a spectral type here) are K dwarfs – which is an interesting conspiracy between the limits of the HD (the late red and old white dwarfs are too weak for it) and the limits of Gaia (the few earlier stars within 6 parsec – which includes such luminaries as Sirius at a bit more than 2.5 pc – are just too bright for where Gaia data reduction is now).


    Another fairly tangible thing in the GCNS is the space velcity, given in km/s in the three dimensions U, V, and W. That is, of course, an invitation to look for stellar streams, as, within the relatively small portion of the Milky Way the GCNS looks at, stars on similar orbits will exhibit similar space motions.

    Considering the velocity dispersion within a stellar stream will be a few km/s, let's have the database bin the data. Even though this data is small enough to conveniently handle locally, this kind of remote analysis is half of what TAP is really great at (the other half being the ability to just jump right into a new dataset). You can group by multiple things at the same time:

      COUNT(*) AS n,
      ROUND(uvel_50/5)*5 AS ubin,
      ROUND(vvel_50/5)*5 AS vbin,
      ROUND(wvel_50/5)*5 AS wbin
    FROM gcns.main
    GROUP BY ubin, vbin, wbin

    Note that this (truly) 3D histogram only represents a small minority of the GCNS objects – you need radial velocities for space motion, and these are precious even in the Gaia age.

    What really surprised me is how clumpy this distribution is – are we sure we already know all stellar streams in the solar neighbourhood? Watch for yourself (if your browser can't play webm, complain to your vendor):

    [Update (2021-04-01): Mark Taylor points out that the “flashes” you sometimes see when the grid is aligned with the viewing axes (and the general appearance) could be improved by just pulling all non-NULL UVW values out of the table and using a density plot (perhaps shading=density densemap=inferno densefunc=linear). That is quite certainly true, but it would of course defeat the purpose of having on-server aggregation. Which, again, isn't all that critical for this dataset, so doing the prettier plot actually is a valuable exercise for the reader]

    How did I make this video? Well, I started with a Cube Plot in TOPCAT as usual, configuring weighted plotting with n as its weight and played around a bit with scaling out a few outliers. And then I saved the table (to zw.vot), hit “STILTS“ in the plot window and saved the text from there to a text file, I had to change the ``in`` clause in the script to make it look like this:

    stilts plot2cube \
     xpix=887 ypix=431 \
     xlabel='ubin / km/s' ylabel='vbin / km/s' \
     zlabel='wbin / km/s' \
     xmin=-184.5 xmax=49.5 ymin=-77.6 ymax=57.6 \
     zmin=-119.1 zmax=94.1 phi=-84.27 theta=90.35 \
      psi=-62.21 \
     auxmin=1 auxmax=53.6 \
     auxvisible=true auxlabel=n \
     legend=true \
     layer=Mark \
        in=zw.vot \
        x=ubin y=vbin z=wbin weight=n \
        shading=weighted size=2 color=blue

    – and presto, sh would produce the plot I just had in TOPCAT. This makes a difference because now I can animate this.

    In his documentation, Mark already has a few hints on how to build animations; here are a few more ideas on how to organise this. For instance, if, as I want here, you want to animate more than one variable, stilts tloop may become a bit unwieldy. Here's how to give the camera angles in python:

    import sys
    from astropy import table
    import numpy
    angles = numpy.array(
      [float(a) for a in range(0, 360)])
      names=("psi", "theta")).write(
        sys.stdout, format="votable")

    – the only thing to watch out for is that the names match the names of the arguments in stilts that you want to animate (and yes, the creation of angles will make numpy afficionados shudder – but I wasn't sure if I might want to have somewhat more complex logic there).

    [Update (2021-04-01): Mark Taylor points out that all that Python could simply be replaced with a straightforward piece of stilts using the new loop table scheme in stilts, where you would simply put:

    acmd='addcol phi $1'
    acmd='addcol theta 40+30*cosDeg($1+57)'

    into the plot2cube command line – and you wouldn't even need the shell pipeline.]

    What's left to do is basically the shell script that TOPCAT wrote for me above. In the script below I'm using a little convenience hack to let me quickly switch between screen output and file output: I'm defining a shell variable OUTPUT, and when I un-comment the second OUTPUT, stilts renders to the screen. The other changes versus what TOPCAT gave me are de-dented (and I've deleted the theta and psi parameters from the command line, as I'm now filling them from the little python script):

    OUTPUT="omode=out out=pre-movie.png"
    python3 |\
    stilts plot2cube \
       xpix=500 ypix=500 \
       xlabel='ubin / km/s' ylabel='vbin / km/s' \
       zlabel='wbin / km/s' \
       xmin=-184.5 xmax=49.5 ymin=-77.6 ymax=57.6 \
       zmin=-119.1 zmax=94.1 \
       auxmin=1 auxmax=53.6 \
    phi=8 \
    animate=- \
    afmt=votable \
    $OUTPUT \
       layer=Mark \
          in=zw.vot \
          x=ubin y=vbin z=wbin weight=n \
          shading=weighted size=4 color=blue
    # render to movie with something like
    # ffmpeg -i "pre-movie-%03d.png" -framerate 15 -pix_fmt yuv420p /stream-movie.webm
    # (the yuv420p incantation is so real-world
    # web browsers properly will not go psychedelic
    # with the colours)

    The comment at the end says how to make a proper movie out of the PNGs this produces, using ffmpeg (packaged with every self-respecting distribution these days) and yielding a webm. Yes, going for mpeg x264 might be a lot faster for you as it's a lot more likely to have hardware support, but everything around mpeg is so patent-infested that for the sake of your first-born's soul you probably should steer clear of it.

    Movies are fun in webm, too.

  • The Loneliest Star in the Sky

    sky images and a distribution plot

    The loneliest star in the sky on the left, and on the right a somewhat more lonelier one (it's explained in the text). The inset shows the distribution of the 500 loneliest stars on the whole sky in Galactic coordinates.

    In early December, the object catalogue of Gaia's data release 3 was published (“eDR3“), and I've been busy in various ways on this data off and on since then – see, for instance, the The Case of the disappearing bits on this blog.

    One of the things I have missed when advising people on projects with previous Gaia data releases is a table that, for every object, gives the nearest neighbour. And so for this release I've created it and christened it, perhaps just a bit over-grandiosely, “Gaia eDR3 Autocorrelation”. Technically, it is just a long (1811709771 rows, to be precise) list of pairs of Gaia eDR3 source ids, the ids of their nearest neighbour, and a spherical distance between.

    This kind of data is useful for many applications, mostly when looking for objects that are close together or (more often) things that fail for such close pairs for a wide variety of reasons. I have taken some pains to not only have close neighbours, though, because sometimes you may want specifically objects far away from others.

    As in the case of this article's featured image: The loneliest star in the sky (as seen by Gaia, that is) is eDR3 6049144983226879232, which is 4.3 arcminutes from its neighbour, 6049144021153793024, which in turn is the second-loneliest star in the sky. They are, perhaps a bit surprisingly, in Ophiuchus (and thus fairly close to the Milky Way plane), and (probably) only about 150 parsec from Earth. Doesn't sound too lonely, hm? Turns out: these stars are lonely because dust clouds blot out all their neighbours.

    Rank three is in another dust cloud, this time in Taurus, and so it continues in low Galactic latitude to rank 8 (4402975278134691456) at Galactic latitude 36.79 degrees; visualising the thing, it turns out it's again in a dark cloud. What about rank 23 at 83.92 Galactic (3954600105683842048)? That's probably bona-fide, or at least it doesn't look very dusty in the either DSS or PanSTARRS. Coryn (see below) estimates it's about 1100 parsec away. More than 1 kpc above the galactic disk: that's more what I had expected for lonely stars.

    Looking at the whole distribution of the 500 loneliest stars (inset above), things return a bit more to what I had expected: Most of them are around the galactic poles, where the stellar density is low.

    So: How did I find these objects? Here's the ADQL query I've used:

    SELECT TOP 500
      ra, dec, source_id, phot_g_mean_mag, ruwe,
      partner_id, dist,
      COORD2(gavo_transform('ICRS', 'GALACTIC',
        point(ra, dec))) AS glat
      NATURAL JOIN gedr3auto.main
    ORDER BY dist DESC

    – run this on the TAP server at (don't be shy, it's a cheap query).

    Most of this should be familiar to you if you've worked through the first pages of ADQL course. There's two ADQL things I'd like to advertise while I have your attention:

    1. NATURAL JOIN is like a JOIN USING, except that the database auto-selects what column(s) to join on by matching the columns that have the same name. This is a convenient way to join tables designed to be joined (as they are here). And it probably won't work at all if the tables haven't been designed for that.
    2. The messy stuff with GALACTIC in it. Coordinate transformations had a bad start in ADQL; the original designers hoped they could hide much of this; and it's rarely a good idea in science tools to hide complexity essentially everyone has to deal with. To get back on track in this field, DaCHS servers since about version 1.4 have been offering a user defined function gavo_transfrom that can transform (within reason) between a number of popular reference frames. You will find more on it in the server's capabilities (in TOPCAT: the “service” tab). What is happening in the query is: I'm making a Point out of the RA and Dec given in the catalogue, tell the transform function it's in ICRS and ask it to make Galactic coordinates from it, and then take the second element of the result: the latitude.

    And what about the gedr3dist.litewithdist table? That doesn't look a lot like the gaiaedr3.gaiasource we're supposed to query for eDR3?

    Well, as for DR2, I'm again only carrying a “lite” version of the Gaia catalogue in GAVO's Heidelberg data center, stripped down to the columns you absolutely cannot live without even for the most gung-ho science; it's called gaia.edr3lite.

    But then my impression is that almost everyone wants distances and then hacks something to make Gaia's parallax work for them. That's a bad idea as the SNR goes down to levels very common in the Gaia result catalogue (see 2020arXiv201205220B if you don't take my word for it). Hence, I'm offering a pre-joined view (a virtual table, if you will) with the carefully estimated distances from Coryn Bailer-Jones, and that's this gedr3dist.litewithdist. Whenever you're doing something with eDR3 and distances, this is where I'd point you first.

    Oh, and I should be mentioning that, of course, I figured out what is in dust clouds and what is not with TOPCAT and Aladin as in our tutorial TOPCAT and Aladin working together (which needs a bit of an update, but you'll figure it out).

    There's a lot more fun to be had with this (depending on what you find fun in). What about finding the 10 arcsec-pairs with the least different luminosities (which might actually be useful for testing some optics)? Try this:

    SELECT TOP 300
      a.source_id, partner_id, dist,
      a.phot_g_mean_mag AS source_mag,
      b.phot_g_mean_mag AS partner_mag,
      abs(a.phot_g_mean_mag-b.phot_g_mean_mag) AS magdiff
    FROM gedr3auto.main
      NATURAL JOIN gaia.edr3lite AS a
      JOIN gaia.edr3lite AS b
        ON (partner_id=b.source_id)
      dist BETWEEN 9.999/3600 AND 10.001/3600
      AND a.phot_g_mean_mag IS NOT NULL
      AND b.phot_g_mean_mag IS NOT NULL
    ORDER BY magdiff ASC

    – this one takes a bit longer, as there's many 10 arcsec-pairs in eDR3; the query above looks at 84690 of them. Of course, this only returns really faint pairs, and given the errors stars that weak have they're probably not all that equal-luminosity as that. But fixing all that is left as an exercise to the reader. Given there's the RP and BP magnitude columns, what about looking for the most colourful pair with a given separation?

    Acknowledgement: I couldn't have coolly mumbled about Ophiuchus or Taurus without the SCS service ivo://cds.vizier/vi/42 (”Identification of a Constellation From Position, Roman 1982”).

    Update [2021-02-05]: I discovered an extra twist to this story: Voyager 1 is currently flying towards Ophiuchus (or so Wikipedia claims). With an industrial size package of artistic licence you could say: It's coming to keep the loneliest star company. But of course: by the time Voyager will be 150 pc from earth, eDR3 6049144983226879232 will quite certainly have left Ophiuchus (and Voyager will be in a completely different part of our sky, that wouldn't look familar to us at all) – so, I'm afraid apart from a nice conincidence in this very moment (galactically speaking), this whole thing won't be Hollywood material.

  • The Bochum Galactic Disk Survey

    Patches of higher perceived variability on the Sky

    Fig 1: How our haphazard variability ratio varies over the sky (galactic coordinates). And yes, it's clear that this isn't dominated by physical variability.

    About a year ago, I reported on a workshop on “Large Surveys with Small Telescopes” in Bamberg; at around the same time, I've published an example for those, the Bochum Galactic Disk Survey BGDS, which used a twin 15 cm robotic telescope in some no longer forsaken place in the Andes mountains to monitor the brighter stars in the southern Milky Way. While some tables from an early phase of the survey have been on VizieR for a while, we now publish the source images (also in SIAP and Obscore), the mean photometry (via SCS and TAP) and, perhaps potentially most fun of all, the the lightcurves (via SSAP and TAP) – a whopping 35 million of the latter.

    This means that in tools like Aladin, you can now find such light curves (and images in two bands from a lot of epochs) when you are in the survey's coverage, and you can run TAP queries on GAVO's server against the full photometry table and the time series.

    Regular readers of this blog will not be surprised to see me use this as an excuse to show off a bit of ADQL trickery.

    If you have a look at the bgds.phot_all table in your favourite TAP client, you'll see that it has a column amp, giving the difference between the highest and lowest magnitude. The trouble is that amp for almost all objects just reflects the measurement error rather than any intrinsic variability. To get an idea what's “normal” (based on the fact that essentially all stars have essentially constant luminosity on the range and resolution scales considered here), run a query like:

    SELECT ROUND(amp/err_mag*10)/10 AS bin, COUNT(*) AS n
    FROM bgds.phot_all
    WHERE nobs>10
    GROUP BY bin

    As this scans the entire 75 million rows of the table, you will probably have to use async mode to run this.

    distribution of amplitude/mag errors

    Figure 2: The distribution of amplitude over magnitude error for all BGDS objects with nobs>10 (blue) and the subset with a mean magnitude brighter than 15 (blue).

    When it comes back, you will have, for objects where any sort of statistics make sense at all (hence nobs>10), a histogram (of sorts) of the amplitude in units of upstream's magnitude error estimation. If you log-log-plot this, you'll see something like Figure 2. The curve at least tells you that the magnitude error estimate is not very far off – the peak at about 3 “sigma” is not unreasonable since about half of the objects have nobs of the order of a hundred and thus would likely contain outliers that far out assuming roughly Gaussian errors.

    And if you're doing a rough cutoff at amp/magerr>10, you will get perhaps not necessarily true variables, but, at least potentially interesting objects.

    Let's use this insight to see if we spot any pattern in the distribution of these interesting objects. We'll use the HEALPix technique I've discussed three years ago in this blog, but with a little twist from ADQL 2.1: The Common Table Expressions or CTEs I have already mentioned in my blog post on ADQL 2.1 and then advertised in the piece on the Henry Draper catalogue. The brief idea, again, is that you can write queries and give them a name that you can use elsewhere in the query as if it were an actual table. It's not much different from normal subqueries, but you can re-use CTEs in multiple places in the query (hence the “common”), and it's usually more readable.

    Here, we first create a version of the photometry table that contains HEALPixes and our variability measure, use that to compute two unsophisticated per-HEALPix statistics and eventually join these two to our observable, the ratio of suspected variables to all stars observed (the multiplication with 1.0 is a cheap way to make a float out of a value, which is necessary here because a/b does integer division in ADQL if a and b are both integers):

    WITH photpoints AS (
        amp/err_mag AS redamp,
        ivo_healpix_index(5, ra, dec) AS hpx
      FROM bgds.phot_all
        AND band_name='SDSS i'
        AND mean_mag<16),
    all_objs AS (
      SELECT count(*) AS ct,
        FROM photpoints GROUP BY hpx),
    strong_var AS (
      SELECT COUNT(*) AS ct,
        FROM photpoints
        WHERE redamp>4 AND amp>1 GROUP BY hpx)
      strong_var.ct/(1.0*all_objs.ct) AS obs,
      all_objs.ct AS n,
    FROM strong_var JOIN all_objs USING (hpx)
    WHERE all_objs.ct>20

    If you plot this using TOPCAT's HEALPix thingy and ask it to use Galactic coordinates, you'll end up with something like Figure 1.

    There clearly is some structure, but given that the variables ratio reaches up to 0.2, this is still reflecting instrumental or pipeline effects and thus earthly rather than Astrophysics. And that's going beyond what I'd like to talk about on a VO blog, although I'l take any bet that you will see significant structure in the spatial distribution of the variability ratio at about any magnitude cutoff, since there are a lot of different population mixtures in the survey's footprint.

    Be that as it may, let's have a quick look at the time series. As with the short spectra from Byurakan use case, we've stored the actual time series as arrays in the database (the mjd and mags columns in bgds.ssa_time_series. Unfortunately, since they are a lot less array-like than homogeneous spectra, it's also a lot harder to do interesting things with them without downloading them (I'm grateful for ideas for ADQL functions that will let you do in-DB analysis for such things). Still, you can at least easily download them in bulk and then process them in, say, python to your heart's content. The Byurakan use case should give you a head start there.

    For a quick demo, I couldn't resist checking out objects that Simbad classifies as possible long-period variables (you see, as I write this, the public bohei over Betelgeuse's brief waning is just dying down), and so I queried Simbad for:

    SELECT ra, dec, main_id
    FROM basic
         POINT('', ra, dec),
         POLYGON('', 127, -30, 112, -30, 272, -30, 258, -30))

    (as of this writing, Simbad still needs the ADQL 2.0-compliant first arguments to POINT and POLYGON), where the POLYGON is intended to give the survey's footprint. I obtained that by reading off the coordinates of the corners in my Figure 1 while it was still in TOPCAT. Oh, and I had to shrink it a bit because Simbad (well, the underlying Postgres server, and, more precisely, its pg_sphere extension) doesn't want polygons with edges longer than π. This will soon become less pedestrian: MOCs in relational databases are coming; more on this soon.

    TOPCAT action shot with a light curve display

    Fig 3: V566 Pup's BGDS lightcuve in a TOPCAT configured to auto-plot the light curves associated with a row from the bgds.ssa_time_series table on the GAVO DC TAP service.

    If you now do the usual spiel with an upload crossmatch to the bgds.ssa_time_series table and check “Plot Table” in Views/Activation Action, you can quickly page through the light curves (TOPCAT will keep the plot style as you go from dataset to dataset, so it's worth configuring the lines and the error bars). Which could bring you to something like Fig. 3; and that would suggest that V* V566 Pup isn't really long-period unless the errors are grossly off.

Page 1 / 3 »