Posts with the Tag DaCHS:

  • Spectral Units in ADQL

    math formulae.

    In case you find the piece of Python given below too hard to read: It's just this table of conversion expressions between the different SI units we are dealing with here.

    Astronomers these days work all along the electromagnetic spectrum (and beyond, of course). Depending on where they observe, they will have very different instrumentation, and hence some see their messengers very naturally as waves, others quite as naturally as particles, others just as electrons flowing out of a CCD that is sitting behind a filter.

    In consequence, when people say where in the spectrum they are, they use very different notions. A radio astronomer will say “I'm observing at 21 cm” or “at 50 GHz“. There's an entire field named after a wavelength, “submillimeter“, and blueward of that people give their bands in micrometers. Optical astronomers can't be cured of their Ångström habit. Going still more high-energy, after an island of nanometers in the UV you end up in the realm of keV in X-ray, and then MeV, GeV, TeV and even EeV.

    However, there is just one VO (or at least that's where we want to go). Historically, the VO has had a slant towards optical astronomy, which gives us the legacy of having wavelengths in far too many places, including Obscore. Retrospectively, this was an unfortunate choice not only because it makes us look optical bigots, but in particular because in contrast to energy and, by ν = E/h, frequency, messenger wavelength depends on the medium you work in, and I shudder to think how many wavelengths in my data center actually are air wavelengths rather than vacuum wavelengths. Also, as you go beyond photons, energy really is the only thing that reasonably characterises all messengers alike (well, even that still isn't quite settled for gravitational waves as long as we're not done with a quantum theory of gravitation).

    Well – the wavelength milk is spilled. Still, the VO has been boldly expanding its reach beyond the optical and infrared windows (recently, with neutrinos and gravitational waves, not to mention EPN-TAP's in-situ measurements in the solar system, even beyond the electromagnetic spectrum). Which means we will have to accomodate the various customs regarding spectral units described above. Where there are “thick” user interfaces, these can care about that. For instance, my datalink XSLT and javascript lets people constrain spectral cutouts (along BAND) in a variety of units (Example).

    But what if the UI is as shallow as it is in ADQL, where you deal with whatever is in the underlying database tables? This has come up again at last week's EuroVO Technology Forum in virtual Strasbourg in the context of making Obscore more attractive to radio astronomers. And thus I've sat down and taught DaCHS a new user defined function to address just that.

    Up front: When you read this in 2022 or beyond and everything has panned out, the function might be called ivo_specconv already, and perhaps the arguments have changed slightly. I hope I'll remember to update this post accordingly. If not, please poke me to do so.

    The function I'm proposing is, mainly, gavo_specconv(expr, target_unit). All it does is convert the SQL expression expr to the (spectral) target_unit if it knows how to do that (i.e., if the expression's unit and the target unit are spectral units properly written in VOUnit) and raise an error otherwise.

    So, you can now post:

    SELECT TOP 5 gavo_specconv(em_min, 'GHz') AS nu
    FROM ivoa.obscore
    WHERE gavo_specconv((em_min+em_max)/2, 'GHz')
        BETWEEN 1 AND 2
      AND obs_collection='VLBA LH sources'
    

    to the TAP service at http://dc.g-vo.org/tap. You will get your result in GHz, and you write your constraint in GHz, too. Oh, and see below on the ugly constraint on obs_collection.

    Similarly, an X-ray astronomer would say, perhaps:

    SELECT TOP 5 access_url, gavo_specconv(em_min, 'keV') AS energy
    FROM ivoa.obscore
    WHERE gavo_specconv((em_min+em_max)/2, 'keV')
      BETWEEN 0.5 AND 2
      AND obs_collection='RASS'
    

    This works because the ADQL translator can figure out the unit of its first argument. But, perhaps regrettably, ADQL has no notion of literals with units, and so there is no way to meaningfully say the equivalent of gavo_specconv(656, 'Hz') to get Hα in Hz, and you will receive a (hopefully helpful) error message if you try that.

    However, this functionality is highly desirable not the least because the queries above are fairly inefficient. That's why I added the funny constraints on the collection: without them, the queries will take perhaps half a minute and thus require async operation on my box.

    The (fundamental) reason for that is that postgres is not smart enough to work out it could be using an index on em_min and em_max if it sees something like nu between 3e8/em_min and 3e7/em_max by re-writing the constraint into 3e8/nu between em_min and em_max (and think really hard about whether this is equivalent in the presence of NULLs). To be sure, I will not teach that to my translation layer either. Not using indexes, however, is a recipe for slow queries when the obscore table you query has about 85 million rows (hi there in 2050: yes, that was a sizable table in our day).

    To let users fix what's too hard for postgres (or, for that matter, the translation engine when it cannot figure out units), there is a second form of gavo_specconv that takes a third argument: gavo_specconv(expr, unit_of_expr, target_unit). With that, you can write queries like:

    SELECT TOP 5 gavo_specconv(em_min, 'Angstrom') AS nu
    FROM ivoa.obscore
    WHERE gavo_specconv(5000, 'Angstrom', 'm')
      BETWEEN em_min AND em_max
    

    and hope the planner will use indexes. Full disclosure: Right now, I don't have indexes on the spectral limits of all tables contributing to my obscore table, so this particular query only looks fast because it's easy to find five datasets covering 500 nm – but that's an oversight I'll fix soon.

    Of course, to make this functionality useful in practice, it needs to be available on all obscore services (say) – only then can people run all-VO obscore searches without the optical bias. The next step (before Bambi-eyeing the TAP implementors) therefore would be to get it into the catalogue of ADQL user defined functions.

    For this, one would need to specify a bit more carefully what units must minimally be supported. In DaCHS, I have built this on a full implementation of VOUnits, which means you can query using attoparsecs of wavelength and get your result in dekaerg (which is a microjoule: 1 daerg = 1 uJ in VOUnits – don't you just love this?):

    SELECT gavo_specconv(
      (spectral_start+spectral_end)/2, 'daerg')
      AS energy
    FROM rr.stc_spectral
    WHERE gavo_specconv(0.0002, 'apc', 'J')
      BETWEEN spectral_start AND spectral_end
    

    (stop computing: an attoparsec is about 3 cm). This, incidentally, queries the draft RegTAP extension for the VODataService 1.2 coverage in space, time, and spectrum, which is another reason I'm proposing this function: I'm not quite sure how well my rationale that using Joules of energy is equally inconvenient for all communities will be generally received. The real rationale – that Joule is the SI unit for energy – I don't dare bring forward in the first place.

    Playing with wavelengths in AU (you can do that, too; note, though, that VOUnit forbids prefixes on AU, so don't even try mAU) is perhaps entertaining in a slightly twisted way, but admittedly poses a bit of a challenge in implementation when one does not have full VOUnits available. I'm currently thinking that m, nm, Angstrom, MHz, GHz, keV and MeV (ach! No Joule! But no erg, either!) plus whatever spectral units are in use in the local tables would about cover our use cases. But I'd be curious what other people think.

    Since I found the implementation of this a bit more challenging than I had at first expected, let me say a few words on how the underlying code works; I guess you can stop reading here unless you are planning to implement something like this.

    The fundamental trouble is that spectral conversions are non-linear. That means that what I do for ADQL's IN_UNIT – just compute a conversion factor and then multiply that to whatever expression is in its first argument – will not work. Instead, one has to write a new expression. And building these expressions becomes involved because there are thousands of possible combinations of input and output units.

    What I ended up doing is adopting standard (i.e., SI) units for energy (J), wavelength (m), and frequency (Hz) as common bases, and then first convert the source and target units to the applicable standard unit. This entails trying to convert each input unit to each standard unit until a conversion actually works, which in DaCHS' Python looks like this:

    def toStdUnit(fromUnit):
        for stdUnit in ["J", "Hz", "m"]:
            try:
                 factor = base.computeConversionFactor(
                     fromUnit, stdUnit)
            except base.IncompatibleUnits:
                continue
            return stdUnit, factor
    
        raise common.UfuncError(
            f"specconv: {fromUnit} is not a spectral unit understood here")
    

    The VOUnits code is hidden away in base.computeConversionFactor, which raises an IncompatibleUnits when a conversion is impossible; hence, in the end, as a by-product this function also determines what kind of spectral value (energy, frequency, or wavelength) I am dealing with.

    That accomplished, all I need to do is look up the conversions between the basic units, which can be done in a single dictionary mapping pairs of standard units to the conversion expression templates. I have not tried to make these templates particularly pretty, but if you squint, you can still, I hope, figure out this is actually what the opening image shows:

    SPEC_CONVERSION = {
        ("J", "m"): "h*c/(({expr})*{f})",
        ("J", "Hz"): "({expr})*{f}/h",
        ("J", "J"): "({expr})*{f}",
        ("Hz", "m"): "c/({expr})/{f}",
        ("Hz", "Hz"): "{f}*({expr})",
        ("Hz", "J"): "h*{f}*({expr})",
        ("m", "m"): "{f}*({expr})",
        ("m", "Hz"): "c/({expr})/{f}",
        ("m", "J"): "h*c/({expr})/{f}",}
    

    expr is (conceptually) replaced by the first argument of the UDF, and f is the conversion factor between the input unit and the unit expr is in. Note that thankfully, not additive operators are involved and thus all this is numerically well-conditioned. Hence, I can afford not attempting to simplify any of the expressions involved.

    The rest is essentially book-keeping, where I'm using the ADQL parser to turn the expression into a tree fragment and then fiddling in the tree fragment for expr into that. The result then replaces the UDF function call in the syntax tree. You can review all this in context in DaCHS' ufunctions.py, starting at the definition of toStdUnit.

    Sure: this is no Turing award material. But perhaps these notes are useful when people want to put this kind of thing into their ADQL engines. Which I'd consider a Really Good Thing™.

  • DaCHS 2.3 on the way to Debian main

    DaCHS, Debian, and 2.3

    DaCHS 2.3 will be the first DaCHS officially in Debian.

    DaCHS releases usually come around the Interops in (roughly) May and November. Not this one, though, for one pleasant, one unpleasant, and several other reasons.

    The unpleasant reason first: The 2.2 release has a fairly severe memory leak in it (resulting, in roundabout ways, from python 3 preserving tracebacks of nested exceptions), which of course really became virulent on my server right over the holidays. If you run a site with just a few gigs of RAM that might be hit by second-rate async clients, this will bite you and you ought to upgrade now (well, you ought to upgrade anyway).

    The pleasant reason is that DaCHS has made it into Debian main and thus, unless something disastrous happens, it will be part of the Debian version 11 (“bullseye”). This means that people who do not need to be on the bleeding edge, will not need to monkey around with our repository (and its signing key) any more starting some time in 2021 (or just about now, if they're running testing). I can't tell you how gratifying that feels to me. And well, I wanted relatively recent code corresponding to a something on our release branch in bullseye.

    One of the other reasons is that stilts' author Mark Taylor is trying to stomp out TAP services failing his taplint's validation, and many DaCHS 2.2 services (those that don't define TAP examples, which of course is a shame anyway) fail with only the (really minor) error E-EXDH-1 (see below).

    DaCHS 2.3 has some other noteworthy changes; as usual in minor version steps, my expectation is that none of this will break existing services. Still, you may want to glance over the following list, as there are some behavioural changes nevertheless. In approximate order of the wizardry involved:

    • I've long had a bad consciousness because DaCHS has stored cleartext passwords so far. That's probably not a problem for DaCHS itself (as it does not protect great riches), but people tend to re-use passwords, and I'd have hated to leak passwords that might work elsewhere. Well, no longer: the dc.users table now contains hashed passwords, and the upgrade will hash them. This, in particular, means that you cannot recover them once you have updated (which, of course, is as it should be).

    • The javascript delivered with DaCHS was no longer quite up to date with Debian's jquery. I have updated it in several ways, and I have restored the functionality of the WebSAMP button in the default response. If you have custom HTML templates containing javascript, you may need to update them to newer jquery, too, specifically,

      • change .unload( to .on("unload", (this happens in the SAMP code in defaultresponse.html, for instance).
      • also in the SAMP code in overridden defaultresponses, change the icon URL to completeURL("/logo_tiny.png") (or whatever) to avoid trouble with https installations.
      • if you compare jquery element names: these are now returned in lower case.

      And yes, WebSAMP now mostly works with HTTPS (which is unrelated to this update, except that DaCHS until 2.2 suppresses the WebSAMP button when it thinks it is delivering through HTTPS).

    • DaCHS now honours upgrade-insecure-requests headers that common web browsers issue and will then redirect them to https when appropriate. So, please don't forcibly do these redirects any more from reverse proxies – they break, among other things, TAP, and they're generally just a bad idea.

    • DaCHS now instructs the database to return all bits of floating point numbers. This may break your regression tests, but it's the right thing to do (blog post on this).

    • Another thing that may break regression tests: TAP results now have column names in the case given in the RD (where previously they were lowercased unless quoted). Let me cite rule 1 of SQL table design: Don't use mixed-case column names.

    • Wildcards in the directory parts of sources patterns are now expanded, which means that you can write things like <sources pattern="data/202?/*.fits"/>, which previously wouldn't have done what you might reasonably expect; however, this might in rare cases match additional sources when you re-import data.

    • The examples endpoint now returns a 404 if no examples are defined on a service; this fixes the stilts taplint E-EXA-EXDH-1 error I mentioned above.

    • DaCHS will now refuse to use x-unregistred as an authority when publishing resources or creating publisher DIDs. This is to protect to people who do a lot of imports before settling on their authority; sometimes DaCHS' fallback null authority got into their databases, which then caused quite a bit of cleanup effort.

    • Because of licensing problems, the Debian package no longer contains the CC logos for the time being. If you want them back, drop appropriate files cc0.png, ccby.png, and ccybysa.png into /var/gavo/web/nv_static/img

    • You can now list modules you want in a procedure application in its setup/@imports attribute. I've done this after I had to add code to a proc's setup just to run an import once too often.

    • simbadinterface's Sesame now uses the dc.metastore table to cache results rather than files as before. Previous saveNew, id, and debug parameters are no longer supported (the base.caches.getSesame interface is unchanged, so it's unlikely you'd notice this).

    • table.query() or querier.query() are now seriously deprecated (you may have used them in code embedded in RDs). See Database Queries in the reference documentation for what the recommended query patterns are (and have been for a while). Just one word of warning: table.query would macro-expand its argument, which the connection method obviously cannot. If you depend on that, call table.expand(query) manually first.

    With this: Merry upgrading and a happy new year!

  • The Case of the Disappearing Bits

    [number line with location markers]

    Every green line in this image stands for a value exactly representable in a floating point value of finite size. As you see, it's a white area out there [source]

    While I was preparing the publication of Coryn Bailer-Jones' distance estimations based on Gaia eDR3 (to be released about tomorrow), Coryn noticed I was swallowing digits from his numbers. My usual reaction of “aw, these are meaningless anyway because your errors are at least an order of magnitude higher” didn't work this time, because Gaia is such an incredible machine that some of the values really have six significant decimal digits. For an astronomical distance! If I had a time machine, I'd go back to F.W. Bessel right away to make him pale in envy.

    I'm storing these distances as PostgreSQL REALs, so these six digits are perilously close the seven decimal digits that the 23 bits of mantissa of single precision IEEE 754 floats are usually translated to. Suddenly, being cavalier with the last few bits of the mantissa isn't just a venial sin. It will lose science.

    So, I went hunting for the bits, going from parsing (in this case C's sscanf) through my serialisation into Postgres binary copy material (DaCHS operators: this is using a booster) to pulling the material out of the database again. And there I hit it: the bits disappeared between copying them in and retrieving them from the database.

    Wow. Turns out: It's a feature. And one I should have been aware of in that Postgres' docs have a prominent warning box where it explains its floating point types: Without setting extra-float-digits it will cut off bits. And it's done this ever since the dawn of DaCHS (in postgres terms, version 8.2 or so).

    Sure enough (edited for brevity):

    gavo=$ select r_med_geo from gedr3dist.main
    gavo-$ where source_id=563018673253120;
        1430.9
    
    gavo=$ set extra_float_digits=3;
    gavo=$ select r_med_geo from gedr3dist.main
    gavo-$ where source_id=563018673253120;
     1430.90332
    

    Starting with its database schema 26 (which is the second part of the output of dachs --version), DaCHS will configure its database roles always have extra_float_digits 3; operators beware: this may break your regression tests after the next upgrade.

    If you want to configure your non-DaCHS role, too, all it takes is:

    alter role (you) set extra_float_digits=3,
    

    You could also make the entire database or even the entire cluster behave like that; but then losing these bits isn't always a bad idea: It really makes the floats prettier while most of the time not losing significant data. It's just when you want to preserve the floats as you get them – and with science data, that's mostly a good idea – that we just can't really afford that prettyness.

    Update (2021-04-22): It turns out that this was already wrong (for some meaning of wrong) when I wrote this. Since PostgreSQL 12, Postgres uses shortest-precise by default (and whenever extra_float_digits is positive). The official documentation has a nice summary of the problem and the way post-12 postgres addresses it. So: expect your float-literal-comparing regression tests to break after the upgrade to bullseye.

  • DaCHS 2.2 is out

    Image: DaCHS "entails" 2.2

    DaCHS 2.2 adds support for what simple semantics we currently do in the VO. Which is a welcome excuse to abuse one of the funny symbols semanticians love so much.

    Today, I've released DaCHS 2.2, the second stable version of DaCHS running on Python 3. Indeed, we have ironed out a few sore spots that have put that “stable” into question, especially if you didn't run things on Debian Buster. Mind you, playing it safe and just going for Debian is still recommended: Compared to the Python 2 world, where things largely didn't break for a decade, the Python 3 universe is still shaking out, and so the versions of dependencies do matter. It's actually fairly gruesome how badly pyparsing 2.4 will break DaCHS. But that's for another day.

    Despite this piece of fearmongering, it'd be great if you could upgrade your installations if you are running DaCHS, and it's pretty safe if you're on Debian buster anyway (and if you're running Debian in the first place, you should be running buster by now).

    Here are the more notable changes in this release:

    • DaCHS can now (relatively easily) write time series in the form of what Ada Nebot's Time Series Annotation note proposes. See the tutorial chapter on building time series for how to do that in practice. Seriously: If you have time series to publish, by all means try this out. The specification can still be fixed, and so this is the perfect time to find problems with the plan.
    • The 2.2 release contains support for the MOC ADQL functions mentioned in the last post on this blog. Of course, to make them work, you will still have to acquaint your database with the new functionality.
    • DaCHS has learned to use IVOA vocabularies as per the current draft for Vocabularies in the VO 2. The most visible effect for you probably is that DaCHS now warns if your subject keywords are not taken from the Unified Astronomy Thesaurus (UAT) – which they almost certainly are not, because the actual format of these keywords is a bit funky. On the other hand, if you employ the “plain” root page template (see the root template in our templating guide if you are not sure what I am talking about here), you will get nice, human-friendly labels for the computer-friendly terms you ought to put into subjects. In case you don't bother: Given I'm currently serving as chair of the semantics working group of the IVOA, the whole topic will certainly come up again soon, and at that point I will probably also talk about another semantics-related newcomer to DaCHS, the gavo_vocmatch ADQL UDF.
    • There is a new command dachs datapack for interacting with frictionless data packages. The idea is that you can say dachs datapack create myres/q myres.pack and obtain an archive of all that is necessary to re-create myres on another DaCHS installation, where you would say dachs datapack load myres.pack. Frankly, this isn't much different from just tarring up the resource directory at this point, except that any cruft that may have accumulated in the directory is skipped and there is a bit of structured metadata. But then interoperability always starts slowly. Note, by the way, that this certainly does not teach DaCHS to do anything sensible with third-party data packages; while I've not thought hard about this, as it seems a remote use case, I am pretty sure that even the “tabular data packages” that refine the rough general metadata quite a bit simply have nowhere near enough metadata to create a useful VO resource or TAP table.
    • As part of my never-ending struggle against bitrot (in case you've always wondered what “curation” means: that, essentially), I'm running dachs val -vc ALL in my own data center once every month. This used to traverse the file system to locate all RDs defined on a box and then make sure they are still ok and their definitions match the database schema. That behaviour has now changed a bit: It will only check published RDs now. I cannot lie: the main reason for the change is because on my production machine the file system traversal has taken longer and longer as data accumulated. But then beyond that there is much less to worry when unpublished gets a little bit mouldy. To get back the old behaviour of validating all RDs that are reachable by the server, use ALL_RECURSE instead of ALL.
    • DaCHS has traditionally assumed that you are running multiple services on one site, which is why its root page is rendered over a service that exposes metadata on local resources. If that doesn't quite work for how you use DaCHS – perhaps because you want to have your own custom renderers and data functions on your root page, perhaps because you only have one browser-based service and that should be the root page right away –, you can now override what is shown when people access the root URI of your DaCHS installation by setting the [web]root config item to the path of the resource you want as root (e.g., myres/q/s/fixed when the root page should be made by the fixed renderer on the service s within the RD myres/q).
    • Scripting in DaCHS is a powerful way to execute python or SQL code when certain things happen. That seems an odd thing to want until you need it; then you need it badly. Since DaCHS 2.2, scripts executed before or after the creation of a table, before its deletion, or after its meta data has been updated, can sit on tables (where they have always belonged). Before, they could only be on makes (where they can still sit, but of course they are then only executed if the table is operated through that particular make) and RDs (from where they could be copied). That latter location is now forbidden in order to free up RD scripts for later sanitation. Use STREAM and FEED instead if you really used something like that (and I'd bet you don't).
    • Minor behavioural changes:
      1. Due to a bug, you could write things like <schema foo="bar">my_schema<schema>, i.e., have attributes on attributes written in element form. That is now flagged as an error. Since that attribute was fed to the embedding element, you might need to add it there.
      2. If you have custom flot plots in one of your templates (and you don't if you don't know what I'm talking about), you now have to set style to Points or Lines where you had usingIndex 0 or 1 before.
      3. The sidebar template no longer has links to a privacy policy (that few bothered to fill out). See extra sidebar items in the tutorial on how to get them back or add something else.

    The most important change comes last: The default logo DaCHS shows unless you override it is no longer the GAVO logo. That's, really, been inappropriate from the start. It's now the DaCHS logo, the thing that's in this posts's article image. Which isn't quite as tasteful as the GAVO one, true. But I trust we'll all get used to it.

  • Crazy Shapes in TAP

    OpenNGC shapes

    A complex shape from OpenNGC: MOCs need not be convex, or simply connected, or anything.

    So far when you did spherical geometry in ADQL, you had points, circles, and polygons as data types, and you could test for intersection and containment as operations. This feature set is a bit unsatisfying because there are no (algebraic) groups in this picture: When you join or intersect two circles, the result only is a circle if one contains the other. With non-intersecting polygons, you will again not have a (simply connected) spherical polygon in the end.

    Enter MOCs (which I've mentioned a few times before on this blog): these are essentially arbitrary shapes on the sky, in practice represented through lists of pixels, cleverly done so they can be sufficiently precise and rather compact at the same time. While MOCs are powerful and surprisingly simple in practice, ADQL doesn't know about them so far, which limits quite a bit what you can do with them. Well, DaCHS would serve them since about 1.3 if you managed to push them into the database, but there were no operations you could do on them.

    Thanks to work done by credativ (who were really nice to work with), funded with some money we had left from our previous e-inf-astro project (BMBF FKZ 05A17VH2) on the pgsphere database extension, this has now changed. At least on the GAVO data center, MOCs are now essentially first-class citizens that you can create, join, and intersect within ADQL, and you can retrieve the results. All operators of DaCHS services are just a few updates away from being able to offer the same.

    So, what can you do? To follow what's below, get a sufficiently new TOPCAT (4.7 will do) and open its TAP client on http://dc.g-vo.org/tap (a.k.a. GAVO DC TAP).

    Basic MOC Operations in TAP

    First, let's make sure you can plot MOCs; run

    SELECT name, deepest_shape
    FROM openngc.shapes
    

    Then do Graphics/Sky Plot, and in the window that pops up then, Layers/Add Area Control. Then select your new table in the Position tab, and finally choose deepest_shape as area (yeah, this could become a bit more automatic and probably will over time). You will then see the footprints of a few NGC objects (OpenNGC's author Mattia Verga hasn't done all yet; he certainly welcomes help on OpenNGC's version control repo), and you can move around in the plot, yielding perhaps something like Fig. 1.

    Now let's color these shapes by object class. If you look, openngc.data has an obj_type column – let's group on it:

    SELECT
      obj_type,
      shape,
      AREA(shape) AS ar
    FROM (
      SELECT obj_type, SUM(deepest_shape) AS shape
      FROM openngc.shapes
      NATURAL JOIN openngc.data
      GROUP BY obj_type) AS q
    

    (the extra subquery is a workaround necessary because the area function wants a geometry or a column reference, and ADQL doesn't allow aggregate functions – like sum – as either of these).

    In the result you will see that so far, contours for about 40 square degrees of star clusters with nebulae have been put in, but only 0.003 square degrees of stellar associations. And you can now plot by the areas covered by the various sorts of objects; in Fig. 2, I've used Subsets/Classify by Column in TOPCAT's Row Subsets to have colours indicate the different object types – a great workaround when one deals with categorial variables in TOPCAT.

    MOCs and JOINs

    Another table that already has MOCs in them is rr.stc_spatial, which has the coverage of VO resources (and is the deeper reason I've been pushing improved MOC support in pgsphere – background); this isn't available for all resources yet , but at least there are about 16000 in already. For instance, here's how to get the coverage of resources talking about planetary nebulae:

    SELECT ivoid, res_title, coverage
    FROM rr.subject_uat
      NATURAL JOIN rr.stc_spatial
      NATURAL JOIN rr.resource
    WHERE uat_concept='planetary-nebulae'
      AND AREA(coverage)<20
    

    (the rr.subject_uat table is a local extension to RegTAP that will be the subject of some future blog post; you could also use rr.res_subject, but because people still use wildly different keyword schemes – if any –, that wouldn't be as much fun). When plotted, that's the left side of Fig. 3. If you do that yourself, you will notice that the resolution here is about one degree, which is a special property of the sort of MOCs I am proposing for the Registry: They are of order 6. Resolution in MOC goes up with order, doubling with every step. Thus MOCs of order 7 have a resolution of about half a degree, MOCs of order 5 a resolution of about two degrees.

    One possible next step is fetch the intersection of each of these coverages with, say, the DFBS (cf. the post on Byurakan spectra). That would look like this:

    SELECT
      ivoid,
      res_title,
      gavo_mocintersect(coverage, dfbscoverage) as ovrlp
    FROM (
      SELECT ivoid, res_title, coverage
      FROM rr.subject_uat
      NATURAL JOIN rr.stc_spatial
      NATURAL JOIN rr.resource
      WHERE uat_concept='planetary-nebulae'
      AND AREA(coverage)<20) AS others
    CROSS JOIN (
      SELECT coverage AS dfbscoverage
      FROM rr.stc_spatial
      WHERE ivoid='ivo://org.gavo.dc/dfbsspec/q/spectra') AS dfbs
    

    (the DFBS' identifier I got with a quick query on WIRR). This uses the gavo_mocintersect user defined function (UDF), which takes two MOCs and returns a MOC of their common pixels. Which is another important part why MOCs are so cool: together with union and intersection, they form groups. It should not come as a surprise that there is also a gavo_mocunion UDF. The sum aggregate function we've used in our grouping above is (conceptually) built on that.

    Planetary Nebula footprint and plate matches

    Fig. 3: Left: The common footprint of VO resources declaring a subject of planetary-nebula (and declaring a footprint). Right bottom: Heidelberg plates intersecting this, and, in blue, level-6 intersections. Above this, an enlarged detail from this plot.

    You can also convert polygons and circles to MOCs using the (still DaCHS-only) MOC constructor. For instance, you could compute the coverage of all resources dealing with planetary nebulae, filtering against obviously over-eager ones by limiting the total area, and then match that against the coverages of images in, say, the Königstuhl plate achives HDAP. Watch this:

    SELECT
      im.*,
      gavo_mocintersect(MOC(6, im.coverage), pn_coverage) as ovrlp
    FROM (
      SELECT SUM(coverage) AS pn_coverage
      FROM rr.subject_uat
      NATURAL JOIN rr.stc_spatial
      WHERE uat_concept='planetary-nebulae'
      AND AREA(coverage)<20) AS c
    JOIN lsw.plates AS im
    ON 1=INTERSECTS(pn_coverage, MOC(6, coverage))
    

    – so, the MOC(order, geo) function should give you a MOC for other geometries. There are limits to this right now because of limitations of the underlying MOC library; in particular, non-convex polygons are not supported right now, and there are precision issue. We hope this will be rectified soon-ish when we base pgsphere's MOC operations on the CDS HEALPix library. Anyway, the result of this is plotted on the right of Fig. 3.

    Open Ends

    In case you have MOCs from the outside, you can also construct MOCs from literals, which happen to be the ASCII MOCs from the standard. This could look like this:

    SELECT TOP 1
      MOC('4/30-33 38 52 7/324-934') AS ar
    FROM tap_schema.tables
    

    For now, you cannot combine MOCs in CONTAINS and INTERSECTS expressions directly; this is mainly because in such an operation, the machine as to decide on the order of the MOC the other geometries are converted to (and computing the predicates between geometry and MOC directly is really painful). This means that if you have a local table with MOCs in a column cmoc that you want to compare against a polygon-valued column coverage in a remote table like this:

    SELECT db.* FROM
      lsw.plates AS db
      JOIN tap_upload.t6
    ON 1=CONTAINS(coverage, cmoc) -- fails!
    

    you will receive a rather scary message of the type “operator does not exist: spoly <@ smoc”. To fix it (until we've worked out how to reasonably let the computer do that), explicitly convert the polygon:

    SELECT db.* FROM
      lsw.plates AS db
      JOIN tap_upload.t6
    ON 1=CONTAINS(MOC(7, coverage), cmoc)
    

    (be stingy when choosing the order here – MOCs that already exist are fast, but making them at high order is expensive).

    Having said all that: what I've written here is bleeding-edge, and it is not standardised yet. I'd wager, though, that we will see MOCs in ADQL relatively soon, and that what we will see will not be too far from this experiment. Well: Some rough edges, I'd hope, will still be smoothed out.

    Getting This on Your Own DaCHS Installation

    If you are running a DaCHS installation, you can contribute to takeup (and if not, you can stop reading here). To do that, you need to upgrade to DaCHS's latest beta (anything newer than 2.1.4 will do) to have the ADQL extension, and, even more importantly, you need to install the postgresql-postgres package from our release repository (that's version 1.1.4 or newer; in a few weeks, getting it from Debian testing would work as well).

    You will probably not get that automatically, because if you followed our normal installation instructions, you will have a package called postgresql-11-pgsphere installed (apologies for this chaos; as ususal, every single step made sense). The upshot is that with our release repo added, sudo apt install postgresql-pgsphere should give you the new code.

    That's not quite enough, though, because you also need to acquaint the database with the new functions. This can only be done with database administrator privileges, which DaCHS by design does not possess. What DaCHS can do is figure out the commands to do that when it is called as dachs upgrade -e. Have a look at the output, and if you are satisfied it is about what to expect, just pipe it into psql as a superuser; in the default installation, dachsroot would be sufficiently privileged. That is:

    dachs upgrade -e | psql gavo   # as dachsroot
    

    If running:

    select top 1 gavo_mocunion(moc('1/3'), moc('2/9'))
    from tap_schema.tables
    

    through your TAP endpoint returns '1/3 2/9', then all is fine. For entertainment, you might also make sure that gavo_mocintersect(moc('1/3'), moc('2/13')) is 2/13 as expected, and that if you intersect with 2/3 you get back an empty string.

    So – let's bring MOCs to ADQL!

« Page 2 / 5 »