• Query the Registry with WIRR

    Search windows of VODesktop and WIRR

    Pixels from venerable VODesktop and WIRR: it's supposed to be about the same thing, except WIRR uses and exposes the latest Registry standards (and then some tech that's not standard yet).

    When the VO was young, there was a programme called VODesktop that had a very nice interface for searching the Registry. Also, it would run queries against the services discovered, giving nice all-VO querying that few modern clients do quite as elegantly. Regrettably, when the astrogrid UK project was de-funded, VODesktop's development ceased in 2010.

    In 2012, it had become clear that nobody would step up to continue it, and I wanted to at least provide a replacement for the Registry interface part. In consequence, Florian Rothmaier and I wrote the Web Interface to the Relational Registry, or WIRR for short; this lets you build Registry queries in your Web Browser in an interface inspired by VODesktop (which, I'm told, in turn was inspired by early iTunes).

    WIRR's sweet spot is between the Registry interfaces in the usual clients (TOPCAT, Aladin: these try to hide the gory details of where their service lists come from and hence are limited in what interaction they allow) and using a TAP client to write and execute RegTAP queries (where there are no limitations beyond the protocol's, but it's tedious unless you happen to know the RegTAP standard by heart).

    In contrast to its model VODesktop, WIRR cannot run any queries against the services discovered using it. But you can transfer the services you have found to clients via SAMP (TOPCAT can handle the relevant MTypes, but I'm frankly not sure what else). Apart from that, an obvious use for WIRR are the queries one needs in VO curation. For instance, I keep linking to it when sending people canned registry queries, as in the section on claiming an authority in the DaCHS Tutorial.

    Given that both Javascript and the Registry have evolved a lot in the past decade, WIRR was in need of a major redecoration for some time now, and in early July, I found some time to do it. The central result is that the code is now halfway modern, strict Javascript; let's see how many web browsers still run that can't execute this.

    On the surface, much less has changed, but there are some news I'd consider noteworthy and that might help your data discovery-fu:

    • Since I've added some constraint types, the constraint type selector is now a hierarchical box, sporting what I think are or should be the most common constraint types (full text, service type and UAT term) on level 0 and then having “Blind Discovery“, “Finer Grained“, and “Special Effects“ as pop-ups; all this so we obey Miller's Rule of Seven.
    • Rather than explain the constraints on a second, separate page, there are now brief help texts coming with each constaint.
    • You can now match against UAT concepts, and there is a completing input box for them; in case you're wondering what this is about, see this post from last February. And yes, next time I'll play with WIRR I'll probably include SemBaReBro here.
    • When constraining by column UCD, you can now choose from UCDs found in the registry (the “Pick one“ button).
    • You can now constrain by spatial, temporal, and spectral coverage, though that's still a gamble because not many (or, actually, very few in the case of temporal and spectral) operators care to declare their services' coverage. When they don't, you won't see their resources with such blind discovery constraints. For some background on this, check Space and Time not lost on the Registry on this blog.
    • There is now a „SQL“ button with successful searches that lets you retrieve the SQL executed for the particular constraint. While that query does not immediately execute on RegTAP services (it's Postgres' SQL rather than ADQL), it ought to give you a head start when transplanting your Registry query into, say, a pyVO-based script.
    • You can now use your browser's back and forward buttons (or, in my case. key bindings) to navigate in your query history.

    What this still doesn't do: Work without Javascript. That's a bit of a disgrace, since after the last changes it would actually be reasonable to provide non-javascript fallbacks for some of the basic functionality (of course, no SAMP at all then…). I'll do it the first time someone asks. Promised.

    A document that now needs at least slight updates because things have moved about a bit is the data discovery use case Florian wrote back then. The updates absolutely necessary are not terribly involved, but I would like to use the opportunity to add a bit more spice to the tutorial. If you have ideas: I'm all ears.

    Oh, and before I close: you can still run VODesktop; kudos to the maintainers of the JVM for that. But it's nevertheless not really usable any more, which perhaps isn't too surprising for a client built on top of experimental online services ten years ago. For one, its TAP client speaks pre-release versions of both TAP and ADQL, so those won't work on modern TAP services (and the ancient ones have vanished). Worse, it needed to use a non-standard extension of RegTAP's predecessor (for those old enough to remember: it used XQuery), and none of the modern searchable registries understands that any more.

    Which is a pity, really. It's been a fine programme. It just was a few years early: By 2012, everything it needed has been defined in nice, stable standards that are still around and probably will be for another decade at least.

  • DaCHS 2.4 is out: Blind discovery, pretty datalink, and more

    DaCHS screenshots and logo

    DaCHS 2.4: automatic ranges (with registry support!), pretty datalink (with vocabulary support!). And then the usual bunch of improvements (hopefully!).

    I have released DaCHS 2.4 today, and as usual for stable releases, I would like to have something like a commented changelog here so DaCHS deployers perhaps look forward to upgrading – which would be good, because there are far too many outdated DaCHSes out there.

    Among the more notable changes in version 2.4 are:

    Blind discovery overhaul. If you've been following my requests to include coverage metadata three years ago, you have probably felt that the way DaCHS started to hack your RDs to include the metadata it had obtained from the data was a bit odd. Well, it was. DaCHS no longer does that when running dachs limits. While you can still do manual overrides, all the statistics gathered by DaCHS is now kept in the database and injected into the DaCHS' internal idea of your RDs at loading time.

    I have not only changed this because the old way really sucked; it was also necessary because I wanted to have per-column metadata routinely, and since in advanced DaCHS there often are no XML literals for columns (because of active tags), there wouldn't be a place to keep information like what a column is minimally, maximally, in median, or as a “2σ range“ within the RD itself. A longer treatment of where this is going is given in the IVOA note Blind Discovery 2: Advanced Column Statistics that Grégory and I have recently uploaded.

    For you, it's easy: Just run dachs limits q once you're happy with your data, or perhaps once a month for living data, and leave the rest to DaCHS. A fringe benefit: in browser froms, there are now value ranges of the various numeric constraints as placeholders (that's the screenshot on the left in the title picture).

    There is a slight downside: As part of this overhaul, DaCHS is now computing the coverage of SIAP and SSAP services based on the footprints of the products as MOCs. While that gives much more precise service footprints, it only works with bleeding-edge pgsphere as delivered in Debian bullseye – or from our Debian repository. If you want to build this from source, you need to get credativ's pgsphere fork for now.

    Generate column elements: If you have tables with many columns, even just lexically entering the <column> elements becomes straining. That is particularly annoying if there already is a halfway machine-readable representation of that data.

    To alleviate that, very early in the development of DaCHS, I had the gavo mkrd subcommand that you could feed FITS images or VOTables to get template RDs. For a number of reasons, that never worked well enough to make me like or advertise it, and I eventually ended up writing dachs start instead, which is something I like and advertise for general usage.

    However, what that doesn't do is come up with the column declarations. To make good on this, there is now a dachs gencol command that will, from a FITS binary table, a VOTable, or a VizieR-style byte-by-byte description, generate columns with as much metadata as it can fathom. Paste that into the output of dachs start, and, depending on your input format, you should have a quick start on a fairly full-featured data collection (also note there's dachs adm suggestucds for another command that may help quickly generate rich metadata).

    This currently doesn't work for products (i.e., tables of spectra, images, and the like); at least for FITS arrays, I suppose turning their non-obvious header cards into columns might save some work. Let's see: your feedback is welcome.

    Refurbished Datalink XSLT: Since the dawn of datalink, DaCHS has delivered Datalink documents with XSLT stylesheets in order to have nicely formatted pages rather than wild XML when web browsers chance on datalink documents. I have overhauled the Javascript part of this (which, I have to admit, is what makes it pretty). For one, the spatial cutout now works again, and it's modeless (no clicking “edit“ any more before you can drag cutout vertices). I'm also using the datalink/core vocabulary to furnish link groups with proper titles and descriptions, and to have them sorted in in a proper result tree. I've talked about it at the interop, and I've prepared a showcase of various datalink documents in the Heidelberg data centre.

    Update to DaCHS 2.4 and you'll get the same thing for your datalinks.

    Non-product datalinks: When writing a datalink service, you have to first come up with a descriptor generator. DaCHS will provide a simple one for you (or perhaps a bit more complex ones for FITS images or spectra) – but all of these assume that whatever the datalink ID parameter references is in DaCHS' product table. It turned out that in many interesting cases – for instance, attaching time series to object catalogues – that is not the case, and then you had to write rather obscure code to keep DaCHS from poking around in the product table.

    No longer: There is now the //datalink#fromtable descriptor generator. Just fill in which column contains the identifier and the name of the table containing that column and you're (basically) done. Your descriptor will then have a metadata attribute containing the relevant row – along with everything else DaCHS expects from a datalink descriptor.

    gavo_specconv: That's a longer story covered previously on this blog.

    Index declaration in views: Saying on which columns a database index exists allows users to write smart queries, and DaCHS uses such information internally when rewriting geometrical expressions from ADQL to whatever is in use in the actual database. Hence, making sure these indexes are properly declared is important. But at the same time it's difficult for views, because postgres doesn't let you have indexes on views (for good reasons). Still, queries against views will (usually) use indexes of their underlying tables, and hence those should be declared in the corresponding metadata.

    This is tedious in general. DaCHS now helps you with the //procs#declare-indexes-from stream. Essentially, it will compare the columns in the view with the ones from the source tables and then guess which view columns correspond to indexed columns from the source tables; using that, it adds indexed flags to some view columns.

    If all this is too weird for you: Thanks to declare-indexes-from, the index declaration now automatically happens in the modern way to build SSAP services, the //ssap#view mixin. Hence, chances are you won't even see this particular STREAM but just notice its beneficial consequences.

    Sunsetting resources: I've been fiddling off and on with a smart way to pull resources I no longer want to maintain while still leaving a tombstone. I had to re-visit this problem recently because I dropped the Gaia DR1 table from my Heidelberg data centre. So, how do I explain to people why the thing that's been there no longer is?

    In general, this is a rather untractable problem; for instance, it's very hard to do something sensible with the TAP_SCHEMA entries or the VOSI tables endpoints for the tables that went away. Pure web pages, on the other hand, can be adorned with helpful info. To enable that, there is now the superseded meta item, which you define in the RD that once held the resources. For Gaia DR1, here's what I used:

    <meta name="superseded" format="rst">
      We do not publish Gaia DR1 data here any more.
      If you actually need DR1 data, refer to the
      full Gaia mirrors, for instance `the one at
      ARI`_.  Otherwise, please use more recent data
      releases, for instance `eDR3`_.
      .. _the one at ARI: http://gaia.ari.uni-heidelberg.de
      .. _eDR3: /browse/gaia/q3

    Root page template: I slightly streamlined the default root page template, in particular dropping the "i" and "Q" icons for going to the metadata and querying the service. If you have overridden the root template, you may want to see if you want to merge the changes.

    As usual, there are many more small repairs and additions, but most of these are either very minor or rather technical. One last thing, though: DaCHS now works with Python 3.8 (3.7 will continue to be supported for a few years at least, earlier 3.x never was), which is going to be the python3 in Debian bullseye. Bullseye itself will only have DaCHS 2.3 (with the Python 3.8 fixes backported), though. Once bullseye has become stable, we will look into putting DaCHS 2.4 into the backports.

  • GAVO at the Northern Spring Interop 2021

    As usual in May, the people making the Virtual Observatory happen meet for their Interoperability Conference, better known as the Interop – where “meet” still has to be taken with a generous helping of salt (more on this near the end of this post). As has become customary on this blog, let me briefly discuss contributions with a significant involvement of GAVO.

    A major thing from my perspective actually happened in the run-up: The IVOA executive committee (“Exec“) approved Version 2.0 of Vocabularies in the VO, a standard saying how hierarchical word lists (“vocabularies“) can be managed, disseminated, and consumed within the VO. Developing the main ideas from sufficiently restricting RDF to coming up with desise (which makes complicated things possible with surprisingly little code), and trying things out on our growing number of vocabularies took up quite a bit of my standards time in the last 20 months or so – and I'm fairly happy with the outcome, which I celebrated with a brief talk on programming with IVOA semantics during Wednesday morning's semantics session.

    In that session I gave a second, more discussion-oriented, talk, probing how to formalise data product types – which is surprisingly involved, even with the relatively straightforward use case “figure out a programme to handle the data“: What's a spectrum? Well, something that maps a spectral coordinate to... hm. Is it still a spectrum if there's multiple sorts values (perhaps flux, magnitude, and polarisation)? If we allow, in effect, tuples, why not whole images, which would make spectral cubes spectra – but of course few client programmes that deal with spectra do anything useful with cubes, so clearly such a definition would kill our use case. And what about slit spectra, mapping a spatial coordinat to spectra?

    All this of course is reminiscent of the classical problems of semantics: An elephant is a big animal with a trunk. But when an elephant loses its trunk in an accident: does it stop being an elephant? So, much of the art here is finding the sweet spot of usability between strict and formal semantics (that will never fit the real world) and just tossing around loosely defined strings (that will simply not be machine-readable). After the session, I came up with the 2021-05-26 draft of product-type. If you read this a few years down the road, it might be interesting to compare with what product-type is today. I'm curious myself.

    Later on Wednesday CET, I did a shameless plug for my Datalink-transforming XSLT (apologies for a github link, but I'm fishing for PRs here; if you use DaCHS, you'll get the updated stuff with version 2.4, due soon). The core of this dates back to the dawn of datalink, but with a new graphical cutout code and in particular vocabulary-based tree-ification of the result rows, I figured it's time to remind the operators of datalink services it's still out there for them to take up. Perhaps more than from the slides, you can see what I am after here by just trying the Datalink examples I've collected for this talk and comparing document source, the appearance without Javascript (pure XSLT) and the appearance with Javascript (I'm a bit ashamed I'm relying so heavily on it, but much of this really can only be done client-side).

    Quite a bit after midnight my time (still Thursday UTC), Mark Taylor talked about Software Identification, something I've been working on with him recently. It's is one of the things that is short and trivial but that, when unregulated, just doesn't work; in this case it's servers and clients saying what they are when they speak HTTP. I stumbled into the problem while trying to locate severely outdated DaCHS installations – so, I a way I put effort into the Note Mark was talking about (and which I have just uploaded to the IVOA Document Repository) as a sort of penance.

    While I was already asleep when Mark gave his talk, I was back at the Interop Friday morning CEST, when Hendrik Heinl talked about the LOFAR TAP service (which, I'm proud to say, runs on top of DaCHS); this was mainly live operations in TOPCAT (which is why there's no exciting slides), but Hendrik used a pyVO script doing cutouts in an (optical) mosaic of the Fornax cluster built on top of – and that's the main point – Datalink and SODA. Working this out with Hendrik made me realise the documentation of Datalink in pyVO really needs… love. Or, better, work.

    Later on Friday, there was the Registry session, where I gave brief (and somewhat cramped) talks on advanced column metadata (which is intended to one day let you query the registry for things like “roughly complete to 18 mag” or “having objects out to redshift 4“) and how to put VODataService 1.2 coverage into RegTAP – I expect you'll read more on both topics on this blog as they mature to a level at which this can leave the Registry nerd circles.

    And now, about 10 pm on Friday, the meeting is slowly winding down; beyond all the talks (which were, regrettably for a free software spirit like me, on zoom), the real bonus was that there was a gather.town attached to the conference. Now, that's a closed, proprietary, non-self-hostable platform, too, and so I have all reason to grumble. But: for the first time since February 2020 it felt like a conference, with the most useful action happening outside of the lecture halls, from trying to reach consensus on VEP-006 to teaching DaCHS datalink service declaration to learning about working with visibilities coming from VLBI (where it's even more difficult than it is with the big antenna arrays). So… this one time I've made my peace with proprietary platforms.

    A propos of “say no to platforms“ (in this case, slack): Due to the recent troubles with freenode, in addition to the Interop last week saw the the GAVO IRC channel move to libera.chat (where it's still #gavo). So, for instant messaging us now that the Interop is (in effect) over: Come there.

  • Spectral Units in ADQL

    math formulae.

    In case you find the piece of Python given below too hard to read: It's just this table of conversion expressions between the different SI units we are dealing with here.

    Astronomers these days work all along the electromagnetic spectrum (and beyond, of course). Depending on where they observe, they will have very different instrumentation, and hence some see their messengers very naturally as waves, others quite as naturally as particles, others just as electrons flowing out of a CCD that is sitting behind a filter.

    In consequence, when people say where in the spectrum they are, they use very different notions. A radio astronomer will say “I'm observing at 21 cm” or “at 50 GHz“. There's an entire field named after a wavelength, “submillimeter“, and blueward of that people give their bands in micrometers. Optical astronomers can't be cured of their Ångström habit. Going still more high-energy, after an island of nanometers in the UV you end up in the realm of keV in X-ray, and then MeV, GeV, TeV and even EeV.

    However, there is just one VO (or at least that's where we want to go). Historically, the VO has had a slant towards optical astronomy, which gives us the legacy of having wavelengths in far too many places, including Obscore. Retrospectively, this was an unfortunate choice not only because it makes us look optical bigots, but in particular because in contrast to energy and, by ν = E/h, frequency, messenger wavelength depends on the medium you work in, and I shudder to think how many wavelengths in my data center actually are air wavelengths rather than vacuum wavelengths. Also, as you go beyond photons, energy really is the only thing that reasonably characterises all messengers alike (well, even that still isn't quite settled for gravitational waves as long as we're not done with a quantum theory of gravitation).

    Well – the wavelength milk is spilled. Still, the VO has been boldly expanding its reach beyond the optical and infrared windows (recently, with neutrinos and gravitational waves, not to mention EPN-TAP's in-situ measurements in the solar system, even beyond the electromagnetic spectrum). Which means we will have to accomodate the various customs regarding spectral units described above. Where there are “thick” user interfaces, these can care about that. For instance, my datalink XSLT and javascript lets people constrain spectral cutouts (along BAND) in a variety of units (Example).

    But what if the UI is as shallow as it is in ADQL, where you deal with whatever is in the underlying database tables? This has come up again at last week's EuroVO Technology Forum in virtual Strasbourg in the context of making Obscore more attractive to radio astronomers. And thus I've sat down and taught DaCHS a new user defined function to address just that.

    Up front: When you read this in 2022 or beyond and everything has panned out, the function might be called ivo_specconv already, and perhaps the arguments have changed slightly. I hope I'll remember to update this post accordingly. If not, please poke me to do so.

    The function I'm proposing is, mainly, gavo_specconv(expr, target_unit). All it does is convert the SQL expression expr to the (spectral) target_unit if it knows how to do that (i.e., if the expression's unit and the target unit are spectral units properly written in VOUnit) and raise an error otherwise.

    So, you can now post:

    SELECT TOP 5 gavo_specconv(em_min, 'GHz') AS nu
    FROM ivoa.obscore
    WHERE gavo_specconv((em_min+em_max)/2, 'GHz')
        BETWEEN 1 AND 2
      AND obs_collection='VLBA LH sources'

    to the TAP service at http://dc.g-vo.org/tap. You will get your result in GHz, and you write your constraint in GHz, too. Oh, and see below on the ugly constraint on obs_collection.

    Similarly, an X-ray astronomer would say, perhaps:

    SELECT TOP 5 access_url, gavo_specconv(em_min, 'keV') AS energy
    FROM ivoa.obscore
    WHERE gavo_specconv((em_min+em_max)/2, 'keV')
      BETWEEN 0.5 AND 2
      AND obs_collection='RASS'

    This works because the ADQL translator can figure out the unit of its first argument. But, perhaps regrettably, ADQL has no notion of literals with units, and so there is no way to meaningfully say the equivalent of gavo_specconv(656, 'Hz') to get Hα in Hz, and you will receive a (hopefully helpful) error message if you try that.

    However, this functionality is highly desirable not the least because the queries above are fairly inefficient. That's why I added the funny constraints on the collection: without them, the queries will take perhaps half a minute and thus require async operation on my box.

    The (fundamental) reason for that is that postgres is not smart enough to work out it could be using an index on em_min and em_max if it sees something like nu between 3e8/em_min and 3e7/em_max by re-writing the constraint into 3e8/nu between em_min and em_max (and think really hard about whether this is equivalent in the presence of NULLs). To be sure, I will not teach that to my translation layer either. Not using indexes, however, is a recipe for slow queries when the obscore table you query has about 85 million rows (hi there in 2050: yes, that was a sizable table in our day).

    To let users fix what's too hard for postgres (or, for that matter, the translation engine when it cannot figure out units), there is a second form of gavo_specconv that takes a third argument: gavo_specconv(expr, unit_of_expr, target_unit). With that, you can write queries like:

    SELECT TOP 5 gavo_specconv(em_min, 'Angstrom') AS nu
    FROM ivoa.obscore
    WHERE gavo_specconv(5000, 'Angstrom', 'm')
      BETWEEN em_min AND em_max

    and hope the planner will use indexes. Full disclosure: Right now, I don't have indexes on the spectral limits of all tables contributing to my obscore table, so this particular query only looks fast because it's easy to find five datasets covering 500 nm – but that's an oversight I'll fix soon.

    Of course, to make this functionality useful in practice, it needs to be available on all obscore services (say) – only then can people run all-VO obscore searches without the optical bias. The next step (before Bambi-eyeing the TAP implementors) therefore would be to get it into the catalogue of ADQL user defined functions.

    For this, one would need to specify a bit more carefully what units must minimally be supported. In DaCHS, I have built this on a full implementation of VOUnits, which means you can query using attoparsecs of wavelength and get your result in dekaerg (which is a microjoule: 1 daerg = 1 uJ in VOUnits – don't you just love this?):

    SELECT gavo_specconv(
      (spectral_start+spectral_end)/2, 'daerg')
      AS energy
    FROM rr.stc_spectral
    WHERE gavo_specconv(0.0002, 'apc', 'J')
      BETWEEN spectral_start AND spectral_end

    (stop computing: an attoparsec is about 3 cm). This, incidentally, queries the draft RegTAP extension for the VODataService 1.2 coverage in space, time, and spectrum, which is another reason I'm proposing this function: I'm not quite sure how well my rationale that using Joules of energy is equally inconvenient for all communities will be generally received. The real rationale – that Joule is the SI unit for energy – I don't dare bring forward in the first place.

    Playing with wavelengths in AU (you can do that, too; note, though, that VOUnit forbids prefixes on AU, so don't even try mAU) is perhaps entertaining in a slightly twisted way, but admittedly poses a bit of a challenge in implementation when one does not have full VOUnits available. I'm currently thinking that m, nm, Angstrom, MHz, GHz, keV and MeV (ach! No Joule! But no erg, either!) plus whatever spectral units are in use in the local tables would about cover our use cases. But I'd be curious what other people think.

    Since I found the implementation of this a bit more challenging than I had at first expected, let me say a few words on how the underlying code works; I guess you can stop reading here unless you are planning to implement something like this.

    The fundamental trouble is that spectral conversions are non-linear. That means that what I do for ADQL's IN_UNIT – just compute a conversion factor and then multiply that to whatever expression is in its first argument – will not work. Instead, one has to write a new expression. And building these expressions becomes involved because there are thousands of possible combinations of input and output units.

    What I ended up doing is adopting standard (i.e., SI) units for energy (J), wavelength (m), and frequency (Hz) as common bases, and then first convert the source and target units to the applicable standard unit. This entails trying to convert each input unit to each standard unit until a conversion actually works, which in DaCHS' Python looks like this:

    def toStdUnit(fromUnit):
        for stdUnit in ["J", "Hz", "m"]:
                 factor = base.computeConversionFactor(
                     fromUnit, stdUnit)
            except base.IncompatibleUnits:
            return stdUnit, factor
        raise common.UfuncError(
            f"specconv: {fromUnit} is not a spectral unit understood here")

    The VOUnits code is hidden away in base.computeConversionFactor, which raises an IncompatibleUnits when a conversion is impossible; hence, in the end, as a by-product this function also determines what kind of spectral value (energy, frequency, or wavelength) I am dealing with.

    That accomplished, all I need to do is look up the conversions between the basic units, which can be done in a single dictionary mapping pairs of standard units to the conversion expression templates. I have not tried to make these templates particularly pretty, but if you squint, you can still, I hope, figure out this is actually what the opening image shows:

        ("J", "m"): "h*c/(({expr})*{f})",
        ("J", "Hz"): "({expr})*{f}/h",
        ("J", "J"): "({expr})*{f}",
        ("Hz", "m"): "c/({expr})/{f}",
        ("Hz", "Hz"): "{f}*({expr})",
        ("Hz", "J"): "h*{f}*({expr})",
        ("m", "m"): "{f}*({expr})",
        ("m", "Hz"): "c/({expr})/{f}",
        ("m", "J"): "h*c/({expr})/{f}",}

    expr is (conceptually) replaced by the first argument of the UDF, and f is the conversion factor between the input unit and the unit expr is in. Note that thankfully, not additive operators are involved and thus all this is numerically well-conditioned. Hence, I can afford not attempting to simplify any of the expressions involved.

    The rest is essentially book-keeping, where I'm using the ADQL parser to turn the expression into a tree fragment and then fiddling in the tree fragment for expr into that. The result then replaces the UDF function call in the syntax tree. You can review all this in context in DaCHS' ufunctions.py, starting at the definition of toStdUnit.

    Sure: this is no Turing award material. But perhaps these notes are useful when people want to put this kind of thing into their ADQL engines. Which I'd consider a Really Good Thing™.

  • Tangible Astronomy and Movies with TOPCAT

    This March, I've put up two new VO resources (that's jargon for “table or service or whatever”) that, I think, fit quite well what I like to call tangible astronomy: things you can readily relate to what you see when you step out at night. And, since I'm a professing astronomy nerd, that's always nicely gratifying.

    The two resources are the Constellations as Polygons and the Gaia eDR3 catalogue of nearby stars (GCNS).


    On the constellations, you might rightfully say that's really far from science. But then they do help getting an idea where something is, and when and from where you might see something. I've hence wanted for a long time to re-publish the Davenhall Constellation Boundary Data as proper, ADQL-queriable polygons, and figuring out where the loneliest star in the sky (and Voyager 1) were finally made me do it.

    GCNS density around taurus

    Taurus in the GCNS density plot: with constellations!

    So, since early March there's the cstl.geo table on the TAP service at https://dc.g-vo.org/tap with the constallation polygons in its p column. Which, for starters, means it's trivial to overplot constallation boundaries in your favourite VO clients now, as in the plot above. To make it, I've just done a boring SELECT * FROM cstl.geo, did the background (a plain HEALPix density plot of GCNS) and, clicked Layers → Add Area Control and selected the cstl.geo table.

    If you want to identify constellations by clicking, while in the area control, choose “add central” from the Forms menu in the Form tab; that's what I did in the figure above to ensure that what we're looking at here is the Hyades and hence Taurus. Admittedly: these “centres“ are – as in the catalogue – just the means of the vertices rather than the centres of mass of the polygon (which are hard to compute). Oh, and: there is also the AreaLabel in the Forms menu, for when you need the identification more than the table highlighting (be sure to use a center anchor here).

    Note that TOPCAT's polygon plot at this point is not really geared towards large polygons (which the constellations are) right now. At the time of writing, the documentation has: “Areas specified in this way are generally intended for displaying relatively small shapes such as instrument footprints. Larger areas may also be specified, but there may be issues with use.” That you'll see at the edges of the sky plots – but keeping that in mind I'd say this is a fun and potentially very useful feature.

    What's a bit worse: You cannot turn the constellation polygons into MOCs yet, because the MOC library currently running within our database will not touch non-convex polygons. We're working on getting that fixed.

    Nearby Stars

    Similarly tangible in my book is the GCNS: nearby stars I always find romantic.

    Let's look at the 100 nearest stars, and let's add spectral types from Henry Draper (cf. my post on Annie Cannon's catalogue) as well as the constellation name:

    WITH nearest AS (
    SELECT TOP 100
      a.ra, a.dec,
    FROM gcns.main AS a
    LEFT OUTER JOIN hdgaia.main AS b
      ON (b.source_id_dr3=a.source_id)
    ORDER BY dist_50 ASC)
    SELECT nearest.*, name
    FROM nearest
    JOIN cstl.geo AS g
      ON (1=CONTAINS(
        POINT(nearest.ra, nearest.dec),

    Note how I'm using CONTAINS with the polygon in the constellations table here; that's the usage I've had in mind for this table (and it's particularly handy with table uploads).

    That I have a Common Table Expression (“WITH”) here is due to SQL planner confusion (I'll post something about that real soon now): With the WITH, the machine first selects the nearest 100 rows and then does the (relatively costly) spatial match, without it, the machine (somewhat surprisingly) did the geometric match first. This particular confusion looks fixable, but for now I'd ask you for forgiveness for the hack – and the technique is often useful anyway.

    If you inspect the result, you will notice that Proxima Cen is right there, but α Cen is missing; without having properly investigated matters, I'd say it's just too bright for the current Gaia data reduction (and quite possibly even for future Gaia analysis).

    Most of the objects on that list that have made it into the HD (i.e., have a spectral type here) are K dwarfs – which is an interesting conspiracy between the limits of the HD (the late red and old white dwarfs are too weak for it) and the limits of Gaia (the few earlier stars within 6 parsec – which includes such luminaries as Sirius at a bit more than 2.5 pc – are just too bright for where Gaia data reduction is now).


    Another fairly tangible thing in the GCNS is the space velcity, given in km/s in the three dimensions U, V, and W. That is, of course, an invitation to look for stellar streams, as, within the relatively small portion of the Milky Way the GCNS looks at, stars on similar orbits will exhibit similar space motions.

    Considering the velocity dispersion within a stellar stream will be a few km/s, let's have the database bin the data. Even though this data is small enough to conveniently handle locally, this kind of remote analysis is half of what TAP is really great at (the other half being the ability to just jump right into a new dataset). You can group by multiple things at the same time:

      COUNT(*) AS n,
      ROUND(uvel_50/5)*5 AS ubin,
      ROUND(vvel_50/5)*5 AS vbin,
      ROUND(wvel_50/5)*5 AS wbin
    FROM gcns.main
    GROUP BY ubin, vbin, wbin

    Note that this (truly) 3D histogram only represents a small minority of the GCNS objects – you need radial velocities for space motion, and these are precious even in the Gaia age.

    What really surprised me is how clumpy this distribution is – are we sure we already know all stellar streams in the solar neighbourhood? Watch for yourself (if your browser can't play webm, complain to your vendor):

    [Update (2021-04-01): Mark Taylor points out that the “flashes” you sometimes see when the grid is aligned with the viewing axes (and the general appearance) could be improved by just pulling all non-NULL UVW values out of the table and using a density plot (perhaps shading=density densemap=inferno densefunc=linear). That is quite certainly true, but it would of course defeat the purpose of having on-server aggregation. Which, again, isn't all that critical for this dataset, so doing the prettier plot actually is a valuable exercise for the reader]

    How did I make this video? Well, I started with a Cube Plot in TOPCAT as usual, configuring weighted plotting with n as its weight and played around a bit with scaling out a few outliers. And then I saved the table (to zw.vot), hit “STILTS“ in the plot window and saved the text from there to a text file, zw.sh. I had to change the ``in`` clause in the script to make it look like this:

    stilts plot2cube \
     xpix=887 ypix=431 \
     xlabel='ubin / km/s' ylabel='vbin / km/s' \
     zlabel='wbin / km/s' \
     xmin=-184.5 xmax=49.5 ymin=-77.6 ymax=57.6 \
     zmin=-119.1 zmax=94.1 phi=-84.27 theta=90.35 \
      psi=-62.21 \
     auxmin=1 auxmax=53.6 \
     auxvisible=true auxlabel=n \
     legend=true \
     layer=Mark \
        in=zw.vot \
        x=ubin y=vbin z=wbin weight=n \
        shading=weighted size=2 color=blue

    – and presto, sh zw.sh would produce the plot I just had in TOPCAT. This makes a difference because now I can animate this.

    In his documentation, Mark already has a few hints on how to build animations; here are a few more ideas on how to organise this. For instance, if, as I want here, you want to animate more than one variable, stilts tloop may become a bit unwieldy. Here's how to give the camera angles in python:

    import sys
    from astropy import table
    import numpy
    angles = numpy.array(
      [float(a) for a in range(0, 360)])
      names=("psi", "theta")).write(
        sys.stdout, format="votable")

    – the only thing to watch out for is that the names match the names of the arguments in stilts that you want to animate (and yes, the creation of angles will make numpy afficionados shudder – but I wasn't sure if I might want to have somewhat more complex logic there).

    [Update (2021-04-01): Mark Taylor points out that all that Python could simply be replaced with a straightforward piece of stilts using the new loop table scheme in stilts, where you would simply put:

    acmd='addcol phi $1'
    acmd='addcol theta 40+30*cosDeg($1+57)'

    into the plot2cube command line – and you wouldn't even need the shell pipeline.]

    What's left to do is basically the shell script that TOPCAT wrote for me above. In the script below I'm using a little convenience hack to let me quickly switch between screen output and file output: I'm defining a shell variable OUTPUT, and when I un-comment the second OUTPUT, stilts renders to the screen. The other changes versus what TOPCAT gave me are de-dented (and I've deleted the theta and psi parameters from the command line, as I'm now filling them from the little python script):

    OUTPUT="omode=out out=pre-movie.png"
    python3 camera.py |\
    stilts plot2cube \
       xpix=500 ypix=500 \
       xlabel='ubin / km/s' ylabel='vbin / km/s' \
       zlabel='wbin / km/s' \
       xmin=-184.5 xmax=49.5 ymin=-77.6 ymax=57.6 \
       zmin=-119.1 zmax=94.1 \
       auxmin=1 auxmax=53.6 \
    phi=8 \
    animate=- \
    afmt=votable \
    $OUTPUT \
       layer=Mark \
          in=zw.vot \
          x=ubin y=vbin z=wbin weight=n \
          shading=weighted size=4 color=blue
    # render to movie with something like
    # ffmpeg -i "pre-movie-%03d.png" -framerate 15 -pix_fmt yuv420p /stream-movie.webm
    # (the yuv420p incantation is so real-world
    # web browsers properly will not go psychedelic
    # with the colours)

    The comment at the end says how to make a proper movie out of the PNGs this produces, using ffmpeg (packaged with every self-respecting distribution these days) and yielding a webm. Yes, going for mpeg x264 might be a lot faster for you as it's a lot more likely to have hardware support, but everything around mpeg is so patent-infested that for the sake of your first-born's soul you probably should steer clear of it.

    Movies are fun in webm, too.

« Page 2 / 14 »