• ADASS and Interop

    ADASS group photo

    ADASS XXIX is a big conference with lots of attendees. I've taken the liberty of scaling the photo so you really won't recognise me (though I am in the photo). Note that, regrettably, the interop will be a lot smaller.

    The people who create the Virtual Observatory standards, organised in the IVOA, meet twice a year: once in spring for a five-day meeting (this year it happened in Paris), and once in autumn for a three-day meeting back to back with ADASS, the venerable (this year it's the 29th installment) meeting of people dealing with astronomy and computers.

    We're now on day three of ADASS, and for me, so far this has been more of an endless hackathon, with discussing and hacking on things like mirrors for DFBS, ADQL 2.1, the evolution of IVOA vocabularies (more on this soon somewhere around here), a vocabulary of object types, getting LAMOST 5 published properly in the VO, the measurements data model, convincing more registries to push out space-time coverage for their resources (I'm showing a poster on that), and a lot more.

    So, getting to actually listen to talks during ADASS is almost something of a luxury, and a mind-widening one at that – I've just listened to a talk about effectively doubling the precision of VLBI geodesy (in this case, measuring the location of radio telescopes to a few millimeters) by a piece of clever software, and before that I could learn a bit about how complex it is to figure out how much interference something emitting radio waves will cause in some other place on earth (like, well, a radio telescope). In case you're curious: A bit more than a year from now, short papers on the topics will appear in the proceedings of ADASS XXIX, which in turn you'll find in the ADASS proceedings collections (or on arXiv before that).

    Given the experience of the last few days, I doubt I'll do anything like the live blog from Paris linked above. I still can't resist mentioning that at ADASS, I have a poster that's little more than an ad blitz for STC in the registry.

    Update (2019-10-13): Well, one week later I'm sitting in the closing session of the Interop, and I've even already given my summary of Semantics activities during the interop. Other topics I've talked about at this interop include interoperable authentication (I'm really interested in this because I'd like to enable persistent TAP uploads, where your uploaded tables are still there for you when you come back), a minor update to SimpleDALRegExt (which is overall rather technical and you probably don't want to look at), on the takeup of new Registry tech (which might come over as somewhat sad, but considering that you have to pull along many people to have changes in “the” Registry, it's not so bad at all), and on, as Mark Taylor called it, operational identification of server software (which I consider entertaining in its somewhat erratic narrative).

    And now, after 7 days of essentially nonstop discussion and brainstorming, I'm longing to slump into a chair on the train back to Heidelberg and just enjoy the landscape rolling by.

  • GAVO at AG-Tagung Stuttgart

    towel with astro photo

    Our puzzler prize this year: a photo of the seahorse in the LMC, taken during Hubble's 100000th orbit around the earth, on a fluffy towel.

    It's time again for the meeting of the Astronomische Gesellschaft (as in 2017 in Göttingen; last year we had the IAU general assembly instead). We're there with a booth (right next to the exhibition on 100 years of IAU) and a splinter meeting, at which I'll have a sales pitch for cross-server uploads.

    And, of course, there's a puzzler again: you could win a beautiful towel if you solve a little VO-related problem. This year's puzzler is about where in the sky you'll see “nebulae” (in the classic sense defined by NGC) batched together most closely. If you've been following this blog for a while, it shouldn't be too hard, but to participate you'd have to find someone in Stuttgart to hand in your solution.

    If you are in Stuttgart: As usual, we'll be giving hints during the coffee breaks on Tuesday and Wednesday. So, be sure to visit our booth.

  • DaCHS is Bustered

    DaCHS is developed on Debian, and Debian is the recommended deployment platform. Hence, a new major release of Debian (where major means for them: We may break stuff) is always a big thing for me. And so it was with the release that came in July, codenamed “buster”. Both on the “big thing” and on the “break” counts. This posting gives DaCHS deployers some background for their buster upgrades. Astronomers not running Debian themselves won't risk missing anything if they skip this post.

    So, after I upgraded, the first thing I noticed was that DaCHS would no longer even start because astropy (which it needs, in particular, because that's where pyfits sits these days) was gone. Simple explanation: Upstream astropy doesn't support python2 any more, and so Debian buster only has python3-astropy.

    Moving DaCHS to python3, unfortunately, isn't that easy; a major dependency, nevow (essentially, a web framework), isn't ported yet, and porting it is a major thing. Believe me, I've tried. The nasty thing, in particular, is that twisted, which still lies below nevow, hands up lots of byte strings. And in python3, b"a"!="a". You wouldn't believe how many interesting bugs that simple truth introduces when you've got a library that handed out “just strings” in python2 and now hands out byte strings in python3. Yikes.
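    To make the b"a"!="a" remark concrete, here is a minimal sketch of the kind of surprise meant. The header dictionary is a made-up stand-in for what a bytes-based network layer might hand up; it is not taken from twisted or nevow.

```python
# In Python 3, bytes and str never compare equal, even for pure-ASCII content.
raw = b"a"      # what a bytes-based network layer would hand up
text = "a"      # what str-based application code expects

same_without_decoding = (raw == text)
same_after_decoding = (raw.decode("ascii") == text)

# Dictionary lookups are affected as well: the two keys are distinct.
headers = {b"content-type": "text/plain"}   # hypothetical header mapping
found_with_str_key = "content-type" in headers
found_with_bytes_key = b"content-type" in headers

print(same_without_decoding, same_after_decoding)
print(found_with_str_key, found_with_bytes_key)
```

    Nothing here raises an error, which is the nasty part: the comparison and the lookup just quietly come out false until you decode explicitly.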

    Update (2019-08-28): After quite a bit of experimentation, I finally gave up on providing a python2 version of astropy through release, because for a complicated set of reasons (including numpy declaring a conflict with existing astropys in buster) it is impossible to provide a package that works in buster and doesn't break stretch. So, for buster only you'll have to have a second (or, if running beta, third) gavo line in your sources.list (or equivalent):

    deb http://vo.ari.uni-heidelberg.de/debian buster-foreports main
    

    The instructions at our APT repository have been updated, so you won't have to bookmark this particular page.

    But that wasn't the end of it. Buster comes with Postgres 11, which I look forward to in particular because it supports parallel query execution. That could help us quite a bit, given our large catalogs that quite often we want to run sequential scans on. But of course this means upgrading postgres. And attempting to do that on my development machine immediately hit a wall. What's nice is that the q3c and pgsphere extensions that we've had to push out ourselves so far are now part of Debian main. What's rather fatal is that our pgsphere extensions dealing with HEALPixes and MOCs aren't part of the buster pgsphere package (the reasons for that are tedious and arcane and have to do with OpenSSL and the GPL).

    Also, the pgsphere package coming with buster is called postgres-pgsphere, which is rather unfortunate as it's missing the version indication. So: If you find it on your system, remove it right away. It will conflict with the one true pgsphere package (postgresql-11-pgsphere). That one you'll get from us, and it has the HEALPix stuff built in. TL;DR: run apt install postgresql-q3c postgresql-11-pgsphere before following the postgres update recipe linked above.

    There's a bit more to upgrading the database this time. Because of fairly low-level cleanup in Postgres itself, you're risking index corruption on string indices. Realistically, for almost anything you'll have, it's unlikely that you're affected (it's essentially about non-ASCII in strings), but then it's better to be safe than sorry, and hence you should say:

    reindex database gavo
    

    first thing after you've upgraded to Postgres 11 (which you should really do once the box is on buster). Only if you have very large tables might it be worth restricting the index regeneration to indices that could actually need it; see the postgres link above for how to do that.

    One last thing on Postgres upgrades: I've not quite tried to work out why, but probably depending on your /etc/hosts DaCHS on buster is much more likely to connect to your database using IPv6 than it was before. Many older Postgres configurations won't let you in then. If that happens to you, just edit /etc/postgresql/11/main/pg_hba.conf and add a line:

    host    all         all         ::1/32          md5
    

    (or something less permissive if you prefer).
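    To get a feeling for what “less permissive” could mean here, the following sketch uses Python's standard ipaddress module to compare the mask from the line above with the single-address mask ::1/128. It only does address arithmetic; how Postgres itself treats either entry is not something this snippet checks.

```python
import ipaddress

loopback = ipaddress.ip_address("::1")

# strict=False accepts the network as written in the pg_hba.conf line,
# with host bits set; Python normalises it to ::/32.
wide = ipaddress.ip_network("::1/32", strict=False)

# A /128 mask covers exactly one address, the IPv6 loopback.
narrow = ipaddress.ip_network("::1/128")

print(wide, loopback in wide, wide.num_addresses == 2**96)
print(narrow, loopback in narrow, narrow.num_addresses)
```

    Both networks contain the loopback address, but the /32 one spans 2**96 addresses while the /128 one spans a single address.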

    The next buster-related shock was when TOPCAT's TAP uploads stopped working while my regression tests didn't find anything wrong. After a bit of cursing I eventually figured out that that's not actually buster's fault but twisted's, which in a commit from May 2018 broke chunked uploads (essentially, that's when you're not saying up front how large your upload will be). I've filed a bug report on twisted, but we can't really wait until any sort of fix will be ready and have a broken TOPCAT-DaCHS relationship until then, so for now we're also shipping a fixed twisted package. If you're running DaCHS without our repository enabled, you will have to patch the twisted code itself. The bug report tells what to do (no warranties, though, because I'm not entirely sure why they changed it in the first place; it's a very small change, though).
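    In case “chunked” doesn't ring a bell: with HTTP/1.1 chunked transfer coding, the sender prefixes each piece of the body with its length in hexadecimal and ends with a zero-length chunk, so no overall Content-Length is needed. The following is just an illustrative sketch of that framing; encode_chunked is a made-up helper and has nothing to do with how twisted, TOPCAT, or DaCHS implement it.

```python
def encode_chunked(pieces):
    """Frame an iterable of bytes objects as an HTTP/1.1 chunked body."""
    out = bytearray()
    for piece in pieces:
        if not piece:
            continue  # a zero-length chunk would end the body prematurely
        out += b"%x\r\n" % len(piece)   # chunk size in hexadecimal
        out += piece + b"\r\n"          # chunk data, followed by CRLF
    out += b"0\r\n\r\n"                 # last-chunk marker, no trailers
    return bytes(out)


body = encode_chunked([b"hello ", b"world"])
print(body)
```

    The receiver only learns the total size once the terminating chunk has arrived, which is exactly the situation a server has to cope with for this kind of upload.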

    [Update (2019-08-14) scratch the part with the fixed twisted packages. They're too much trouble on stretch systems. You can keep using them on buster boxes if you want, though. The most recent stable release monkeypatches the problem out of presumably broken twisteds, and so will the next beta.]

    I hope you're not totally discouraged now, because upgrade you should (though perhaps not right before going on vacation) – distribution upgrades are unavoidable if you want to run services for decades, and that's definitely a goal within the VO. See the Debian release notes for Debian's take on dist upgrades, which arguably is a bit more alarmist than it would need to be; a lean, server-only system typically is really simple to upgrade.

    Given the relatively large number of Debian packages we override in buster, I'll be particularly grateful if you complain early about breakage you observe (ideally use the dachs-support mailing list, but see Support for alternatives), and as usual you are encouraged to try the upgrade first on a development system if you have one. Which you should.

  • From Byurakan to L2: Short Spectra

    A snapshot from the DFBS tutorial: Carbon Stars in different spectral bands.

    On June 30, a small project we've done together with the Armenian Virtual Observatory has ended. Its objective was to publish the spectra from the First Byurakan Survey (the DFBS) in a VO-compliant way. The data comes from one of the big surveys with Schmidt telescopes that form a sizable part of the observational heritage from the second part of the 20th century (you're still using a few of them daily if you tell Aladin to show a DSS plane).

    In this case, spectra from objects on the entire northern sky off the Milky Way down to about 18th mag were obtained. In a previous cooperation between Armenian and Italian astronomers a good decade ago, the plates were digitised and calibrated, and spectra were extracted. However, they resided behind a web interface so far, which made them somewhat clumsy to work with.

    Now, they're in the VO, and to give you a few ideas for what kind of things you can do with this kind of data, within the project we've also written the tutorial “Outlier Analysis in Low-Resolution Spectra”.

    Have a glance at the tutorial – you see, while the Byurakan survey certainly is a valuable resource by itself, I happen to believe at this point it's particularly valuable because with the next Gaia data release (planned for next year), a deluxe version of it will come: Gaia's RP/BP spectra will be all-sky, properly calibrated, and quite a bit deeper, but still low-resolution. So, if you're just waiting for such a data collection, you can train your methods right now on the DFBS.

  • ADQL Traps #1: NULL

    NULL is a difficult concept. Not only in SQL

    I recently got embarrassed by ADQL NULLs, i.e., the magic value indicating that a value in a given column is missing. And since that's a common source of errors when writing ADQL queries, I'll take this as a cue for a blog post.

    The concrete background is fairly technical and registry-ish; suffice it to say that some data providers who implemented interfaces conforming to some standard didn't properly say so in their registry records. Back in RegTAP 1.0 (that's the standard that says how a client like TOPCAT talks to the VO Registry), I decided to work around that by fudging the pattern for how to discover those interfaces so they'd still be found.

    In RegTAP 1.1, which is now under review by the VO community, I wanted to do away with that workaround. But would that break anything? This question translates to “are there vs:ParamHTTP interfaces that don't have a role attribute of std”. Whatever “ParamHTTP” and “role attribute” actually mean, just appreciate that it looks like it might translate into SQL like:

    select * from rr.interface
    where
      intf_type='vs:paramhttp'
      and not intf_role='std'
    

    I ran that query, rejoiced because it didn't return anything, removed the workaround from the standard, and then was shot down when I read Mark's mail (politely) saying I'm wrong and there are services still requiring the workaround. As usual: If a query returns what you expect, be doubly careful.

    What went wrong? Well, NULL semantics. You see, in SQL NULL is never equal to anything, not even itself (it's like NaN in IEEE floats in that: try n = float('nan');print(n==n) in Python and look again if you're cool about it). It's also not unequal. Don't take my word for it. Try:

    select * from tap_schema.schemas where NULL=NULL
    

    and:

    select * from tap_schema.schemas where NULL!=NULL
    

    – you'll get empty results in both cases.
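    If you don't have a TAP client at hand, you can watch the same behaviour locally. This is a sketch that uses SQLite (through Python's sqlite3 module) as a stand-in for a TAP service; the assumption is that its NULL comparison semantics match what is described here, and count_rows is just a helper made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def count_rows(condition):
    """Count how many rows of a one-row table survive the WHERE condition."""
    sql = "SELECT count(*) FROM (SELECT 1) WHERE " + condition
    return conn.execute(sql).fetchone()[0]


rows_equal = count_rows("NULL = NULL")      # comparison yields NULL: row dropped
rows_unequal = count_rows("NULL != NULL")   # also NULL: row dropped again
rows_is_null = count_rows("NULL IS NULL")   # IS NULL is the test that works

# The comparison itself evaluates to NULL, which Python sees as None.
comparison_value = conn.execute("SELECT NULL = NULL").fetchone()[0]

# The IEEE NaN analogy mentioned in the text.
n = float("nan")
nan_equals_itself = (n == n)

print(rows_equal, rows_unequal, rows_is_null)
print(comparison_value, nan_equals_itself)
```

    The WHERE clause only keeps rows for which the condition is true, and a condition that evaluates to NULL is not true.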

    What does that mean for science queries? Well, whenever there's NULLs in columns (and the only safe assumption for now is that they may hide in there; we should probably add non-null as a column property in the tap schema and in VODataService some day), you need to be careful in particular with inverted logic.

    Here's an example: Suppose you want to investigate NGC objects brighter than 10 mag in B in one bin and everything else in another. The ones brighter are simple:

    select count(*) from openngc.data where mag_b<10
    

    (try it on the TAP server at http://dc.g-vo.org/tap, it's 383 in the current release). It becomes difficult for “the rest”. If you write:

    select count(*) from openngc.data where not mag_b<10
    

    or, equivalently:

    select count(*) from openngc.data where mag_b>=10
    

    you'll get (for the current release) 10887. However, the whole catalogue has 13954 entries, so there's 13954-10887-383=2684 rows missing. Your “rest” has missed everything for which mag_b isn't given. Sure enough:

    select count(*) from openngc.data where mag_b is null
    

    (and this is the only good way to compare against null) gives 2684.

    The right way to say “anything for which mag_b is not smaller than 10” thus is:

    select count(*) from openngc.data
    where
      not mag_b<10
      or mag_b is null
    

    Moral: Unless you're sure there are no missing values (i.e., NULLs) in a column you're looking at, think about what these mean to your research (or other) question: Should these rows just vanish? Then you usually don't need to do anything and the SQL semantics magically do the right thing (which is why things are defined as they are). If, however, the corresponding rows would mean something to your question, you need to be explicit, and you must have some condition involving IS NULL or IS NOT NULL.
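    The bookkeeping from the openngc example can be replayed on a toy table. The sketch below again uses SQLite as a local stand-in for a TAP service; the table toy and its six rows are invented for illustration and have nothing to do with the actual catalogue contents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy (name TEXT, mag_b REAL)")
conn.executemany(
    "INSERT INTO toy VALUES (?, ?)",
    [
        ("obj1", 8.5),
        ("obj2", 9.9),
        ("obj3", 10.0),
        ("obj4", 12.3),
        ("obj5", None),   # None is stored as NULL: magnitude not given
        ("obj6", None),
    ],
)


def count(where="1=1"):
    """Count the rows of the toy table matching a WHERE condition."""
    return conn.execute("SELECT count(*) FROM toy WHERE " + where).fetchone()[0]


total = count()
bright = count("mag_b < 10")
naive_rest = count("NOT mag_b < 10")
missing = count("mag_b IS NULL")
full_rest = count("NOT mag_b < 10 OR mag_b IS NULL")

print(total, bright, naive_rest, missing, full_rest)
```

    In this toy case the naive “rest” is two rows short of the complement of the bright bin, and those two are exactly the rows with a missing magnitude; only the version with the explicit IS NULL makes the two bins add up to the full table.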

    The trouble, of course, is that just knowing this still isn't enough. You need to remember it at the right moment. Or you'll share my fate of suffering some public embarrassment.
