• Updates to GAVO's Tutorials

Over the years, GAVO has produced a number of VO tutorials, i.e., texts that introduce some technique related to using the Virtual Observatory, preferably within some halfway plausible scenario. In effect, they are software documentation, and like software itself, software documentation suffers from bit rot. To work against that, the tutorials have to be revised occasionally.

    My two student assistants Sonja Gabriel and Chuanming Mao have recently done some of that revising. Let me use this opportunity to show off some of these freshly polished tutorials.

    A classic one (that has, if I may say so myself, aged rather well), is Adding catalog data to object lists using the VO. This is a thinly disguised introduction to TAP uploads, arguably the most powerful of all the VO tech to date. If you have come to this place without ever having done a TAP upload, you owe it to yourself to at least skim the tutorial and quickly follow along the few steps to do positional crossmatches with just about any astronomical catalog and with just about any level of sophistication.

    part of a screenshot: a histogram, a sky photo with overplotted points

Another classic – it has its roots in the original Italian VO Days[1] – is TOPCAT and Aladin working together. It uses SDSS data of some galaxy cluster to get you to send around data and positions between different programs using SAMP. If you are reading VO blogs, it is not unlikely this kind of thing will make you yawn. But at VO Days, it's little things like this that usually most immediately appeal to students and researchers alike.

part of a screenshot: a color-magnitude diagram with a very narrow main sequence, and a proper motion plot

From a tech point of view, Explore the Pleiades with TOPCAT and Aladin also mainly looks at SAMP (perhaps even somewhat less convincingly), but it is such a striking demo of what an amazing instrument Gaia is, and such a nice introduction to TOPCAT's VO interface and subsetting facility, that it is definitely worth a look, in particular as a showcase of getting instant results with the VO.

    circular cloud of red crosses and blue circles in a celestial coordinate system

    An entirely different topic (well: it also employs SAMP for a moment) is covered by Data Discovery Using the Virtual Observatory Registry. This is trying to motivate looking for data collections in the VO Registry (in the form of our Browser interface to it). This tutorial has grown quite a bit during the review and now includes two sections joining data from different resources for various purposes. One section illustrates how systematics of quasar redshifts might be looked into using different sources, the other investigates the Tully-Fisher relationship in different spectral bands.

    A TOPCAT-plotted histogram with a sharp peak around 39.5 AU and a much wider one around 44.

The tutorial on Asteroids in the Solar System was entirely overhauled. It was (and still is) mainly intended to be used in schools, and thus it originally just built on things that ran in a web browser. As is typical of things in web browsers, they have long since vanished. Hence, a rather fundamental update was necessary anyway. While we were looking for interesting things to do – the plot above, by the way, is the distribution of semi-major axes in the Kuiper belt –, we ended up even including a brief bit on ADQL.

Because of its school focus, we offer this particular text in German as well as in English. If you are an astronomy teacher with particularly motivated pupils, we would like to hear from you…

An Aladin window showing two aligned photos of the ring nebula in Lyra

The last revised tutorial I would like to mention also has a somewhat special (main) target audience: Astrometric Calibration using Aladin. Admittedly, automatic, or “blind” calibration has become really great, and I think getting their images located on the sky is not much of a problem even for amateurs any more, thanks in part to services like astrometry.net. But then – sometimes there is nothing like a good, old manual, ummm, “plate” solution. Aladin and the VO make that a lot less tedious than it used to be.

    Of course, I cannot have a post on tutorials without mentioning the VO Text Treasures, a web page that shows the educational material currently registered in the VO Registry. This little page also accounts for bit rot: You can sort by the time last inspected there, and thanks to Sonja's and Chuanming's efforts, our tutorials look very good in that representation at the moment.

    In case you have some material suitable for WIRR yourself: Please register it, too. Send me a mail and I will lend you a hand (or, if you are a VO pro, directly read the pertinent standard).

    [1]That's block courses on VO matters lasting a day or two. If you are in Germany, you can book us for your very own one!
  • HEALPix Maps: In General and in Gaia

    blue and reddish pixels drawing a bar on the sky.

    A map of average Gaia colours in HEALPixes 2/83 and 2/86 (Orion south-east). This post tells you how to (relatively) quickly produce such maps.

    This year's puzzler for the AG Tagung turned out to be a valuable source of interesting ADQL queries. I have already written about finding dusty spots on the sky, and in the puzzler solution, I had promised some words on creating dust maps, or, more generally, HEALPix maps of any sort.

    Making HEALPix maps with Gaia source_ids

The basic technique is explained in Mark Taylor's classic ADASS poster from 2016. On GAVO's TAP service (access URL http://dc.g-vo.org/tap), you will also find an example for that (in TOPCAT's TAP window, check the Service-provided section under the Examples button for it). However, once you have Gaia source_ids, there is something a lot faster and arguably not much less convenient. Let me quote the footnote on source_id from my DR3 lite table:

    For the contents of Gaia DR3, the source ID consists of a 64-bit integer, least significant bit = 1 and most significant bit = 64, comprising:

    • a HEALPix index number (sky pixel) in bits 36 - 63; by definition the smallest HEALPix index number is zero.
    • […]

    This means that the HEALpix index level 12 of a given source is contained in the most significant bits. HEALpix index of 12 and lower levels can thus be retrieved as follows:

    • [...]
• HEALpix [at] level n = source_id / (2^35 ⋅ 4^(12 − n)).

That is: Once you have a Gaia source_id, you can compute HEALPix indexes on levels 12 or less by a simple integer division! I give you that the more-than-35-bit numbers you have to divide by do look a bit scary – but you can always come back here for cutting and pasting:

HEALPix level | integer-divide source_id by
12 | 34359738368
11 | 137438953472
10 | 549755813888
9 | 2199023255552
8 | 8796093022208
7 | 35184372088832
6 | 140737488355328
5 | 562949953421312
4 | 2251799813685248
3 | 9007199254740992
2 | 36028797018963968
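
If you would rather compute these divisors than copy them, the footnote's formula is a two-liner in plain Python:

# divisor for extracting the level-n HEALPix from a Gaia source_id:
# 2**35 * 4**(12 - n)
for level in range(12, 1, -1):
    print(level, 2**35 * 4**(12 - level))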

    If you know – and that is very valuable knowledge far beyond this particular application – that you can simply jump between HEALPix indexes of different levels by multiplying with or integer-dividing by four, the general formula in the footnote actually becomes rather memorisable. Let me illustrate that with an example in Python. HEALPix number 3145 on level 6 is:

>>> 3145//4  # ...lies within this HEALPix on level 5...
786
>>> 3145*4, (3145+1)*4  # ...and covers these on level 7 (end exclusive)...
(12580, 12584)
    

    Simple but ingenious.

    You can immediately exploit this to make HEALPix maps like the one in the puzzler. This piece of ADQL does the job within a few seconds on the GAVO DC TAP service[1]:

    SELECT source_id/8796093022208 AS pix,
      AVG(phot_bp_mean_mag-phot_rp_mean_mag) AS avgcol
    FROM gaia.edr3lite
    WHERE distance(ra, dec, 246.7, -24.5)<2
GROUP BY pix
    

Using the table above, you see that the horrendous 8796093022208 is the code for HEALPix level 8. When you remember (and you should) that HEALPix level 6 corresponds to a linear dimension of about 1 degree and each level is a factor of two in linear dimension, you see that the map ought to have a resolution of about a quarter of a degree.
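
In case you have healpy installed, you can check that rule of thumb quickly; healpy's nside2resol returns (roughly) the linear pixel size, and HEALPix level n corresponds to nside 2^n:

>>> import healpy
>>> [round(healpy.nside2resol(2**level, arcmin=True)/60, 2)
...     for level in (6, 7, 8)]
[0.92, 0.46, 0.23]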

    HEALPix to Screen Pixel

How do you plot this? Well, in TOPCAT, do Graphics → Sky Plot, and then in the plot window Layers → Add HEALPix control (there are icons for both of these, too). You then have to manually configure the plot for the table you just retrieved: set the Level to 8, the Index to pix, and the Value to avgcol – we're working on making the annotation a bit richer so that TOPCAT has a chance to figure this out by itself.

    With a bit of extra configuration, you get the following map of average colours (really: dust concentration):

    Plot: Black and reddish pixels showing a bit of structure

This is not totally ideal, as at the border of the cone, certain HEALPixes are only partially covered, which makes the statistics unnecessarily hard.
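
Incidentally, in case you would rather script such maps than click through TOPCAT: here is a sketch of how I would go about it with pyvo, healpy, and matplotlib (ad-hoc code rather than anything canonical, so adapt to taste):

import healpy
import numpy
import pyvo
from matplotlib import pyplot

# run the aggregating query from above
svc = pyvo.dal.TAPService("http://dc.g-vo.org/tap")
result = svc.run_sync("""
    SELECT source_id/8796093022208 AS pix,
      AVG(phot_bp_mean_mag-phot_rp_mean_mag) AS avgcol
    FROM gaia.edr3lite
    WHERE distance(ra, dec, 246.7, -24.5)<2
    GROUP BY pix""").to_table()

# fill a full level-8 map, leaving uncovered pixels UNSEEN
hpmap = numpy.full(healpy.nside2npix(2**8), healpy.UNSEEN)
hpmap[result["pix"]] = result["avgcol"]

# Gaia source_ids encode HEALPixes in the nested scheme, hence nest=True
healpy.gnomview(hpmap, nest=True, rot=(246.7, -24.5), xsize=400)
pyplot.show()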

    Positional Constraints using source_ids

Due to Gaia's brilliant numbering scheme, we can do analysis by HEALPix, too, circumventing (among other things) this problem. Say you are interested in the vicinity of M42 and would like to investigate a patch of about 8 degrees. By our rule of thumb, 8 degrees is three levels up from the one-degree level 6, i.e., level 3. To find the corresponding HEALPix index, on DaCHS servers with their gavo_simbadpoint UDF you could say:

    SELECT TOP 1 ivo_healpix_index(3, gavo_simbadpoint('M42'))
    FROM tap_schema.tables
    

Huh, you ask, what does tap_schema.tables have to do with this? Well, nothing, really. It's just that ADQL's syntax requires selecting from a table, even if what we select is completely independent of any table, as is, for instance, the index of M42's level-3 HEALPix. The hack above picks a table guaranteed to exist on all TAP services, and the TOP 1 makes sure we only compute the value once. In case you ever feel the need to abuse a TAP service as a calculator: keep this trick in mind.
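
And in case you want that calculator from Python: pyvo plays along nicely (a quick sketch):

>>> import pyvo
>>> svc = pyvo.dal.TAPService("http://dc.g-vo.org/tap")
>>> print(svc.run_sync("SELECT TOP 1 ivo_healpix_index(3,"
...     " gavo_simbadpoint('M42')) FROM tap_schema.tables").to_table()[0][0])
334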

    The result, 334, you could also have found more graphically, as follows:

    1. Start Aladin
2. Check Overlay → HEALPix grid
    3. Enter M42 in Command
    4. Zoom out until you see HEALPix indexes of level 3 in the grid.

    An advantage you have with this method: You see that M42 happens to lie on a border of HEALPixes; perhaps you should include all of 334, 335, 356, and 357 if you were really interested in the Orion Nebula's vicinity.

    We, on the other hand, are just interested in instructive examples, and hence let's just repeat our colour mapping with all Gaia objects from HEALPix 3/334. How do you select these? Well, by source_id's construction, you know their source_ids will be between 334⋅9007199254740992 and (334 + 1)⋅9007199254740992 − 1:

    SELECT source_id/8796093022208 AS pix,
      AVG(phot_bp_mean_mag-phot_rp_mean_mag) AS avgcol
    FROM gaia.edr3lite
    WHERE source_id BETWEEN 334*9007199254740992 AND 335*9007199254740992-1
GROUP BY pix
    

This is computationally cheap (though Postgres, not being a column store, still has to do quite a bit of I/O; note how much faster this query is when you run it again and all the tuples are already in memory). Even going to HEALPix level 2 would in general still be within our sync time limit. The opening figure was produced with the constraint:

    source_id BETWEEN 83*36028797018963968 AND 84*36028797018963968-1
    OR source_id BETWEEN 86*36028797018963968 AND 87*36028797018963968-1
    

    – and with a sync query.
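
If you need such source_id windows more often, you may want to script their generation. Here is a small helper of my own making (no library function, just the footnote's formula turned around):

def source_id_range(level, pix):
    """Return the inclusive source_id interval of Gaia objects
    within HEALPix pix at level (for level <= 12).
    """
    divisor = 2**35 * 4**(12 - level)
    return pix*divisor, (pix + 1)*divisor - 1

# for instance, all of HEALPix 3/334:
lower, upper = source_id_range(3, 334)
constraint = f"source_id BETWEEN {lower} AND {upper}"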

    Aggregating over a Non-HEALPix

    One last point: The constraints we have just been using are, in effect, positional constraints. You can also use them as quick and in some sense rather unbiased sampling tools.

For instance, if you would like to see how the reddening in one of the “dense” spots in the opening picture behaves with distance, you could first pick a point – α = 98, δ = 4, say –, then convert that to a level 7 HEALPix as above (that's 88974) and then write:

SELECT ROUND(r_med_photogeo/200)*200 AS distbin, COUNT(*) AS n,
        AVG(phot_bp_mean_mag-phot_rp_mean_mag) AS avgcol
    FROM gaia.dr3lite
    JOIN gedr3dist.main USING (source_id)
WHERE source_id BETWEEN 88974*35184372088832 AND 88975*35184372088832-1
    GROUP BY distbin
    

This is creating 200 pc bins in distance based on the estimates in the gedr3dist.main table (note that this adds subtle correlations, because these estimates already contain Gaia colour information). Since quite a few of these bins will be very sparsely populated, I'm also fetching the number of objects contributing. And then I plot the whole thing, using the conventional √n ⁄ n as a rough estimate for the relative error:
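
If you would like to reproduce that plot without TOPCAT, here is roughly how I would script it (a pyvo/matplotlib sketch, assuming both are installed):

import numpy
import pyvo
from matplotlib import pyplot

query = """
SELECT ROUND(r_med_photogeo/200)*200 AS distbin, COUNT(*) AS n,
    AVG(phot_bp_mean_mag-phot_rp_mean_mag) AS avgcol
FROM gaia.dr3lite
JOIN gedr3dist.main USING (source_id)
WHERE source_id BETWEEN 88974*35184372088832 AND 88975*35184372088832-1
GROUP BY distbin"""
res = pyvo.dal.TAPService("http://dc.g-vo.org/tap").run_sync(query).to_table()

# sqrt(n)/n as the rough relative error of the mean colour
pyplot.errorbar(res["distbin"], res["avgcol"],
    yerr=res["avgcol"]/numpy.sqrt(res["n"]), fmt="o")
pyplot.xlabel("distance bin [pc]")
pyplot.ylabel("average BP-RP [mag]")
pyplot.show()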

    Plot: A line that first slowly declines, then rises quite a bit, then flattens out and becomes crazy as errors start to dominate.

This plot immediately shows that colour systematics are not exclusively due to dust, as in that case things would only get redder all the time. The blueward trend up to 700 pc is reasonably well explained by the brighter, bluer upper main sequence becoming more dominant in the population sampled as red dwarfs become too faint for Gaia.

The strong reddening setting in after that is rather certainly due to the Orion complex, though I would perhaps not have expected it to reach out to 2 kpc (the conventional distance to M42 is about 0.5 kpc); without having properly thought about it, I'll chalk it up to “the Orion arm”. And after that, it's again what I'd call Malmquist-blueing, until the whole thing dissolves into noise.

    In conclusion: Did you know you can group by both healpix and distbin at the same time? I am sure there are interesting structures to be found in what you will get from such a query…

[1]You may be tempted to write source_id/(POWER(2, 35)*POWER(4, 4)) here for clarity. Resist that temptation. POWER returns floating point numbers. If you have one float in a division, not even a ROUND will get you back into the integer division realm, and the whole trick implodes. No, you will need the integer literals for now.
  • Computing Residuals of an Astrometric Calibration

    Two plots, left a fairly good correlation, right a cloudy wave

The kind of plot you can make following the recipe given here: left, a comparison of the photometry; right, the positional residuals, not taking into account the SIP plate solution, when comparing the HDAP plate B3261a against Gaia DR3. Note that the cut-off at 4 arcsec comes from the match radius used when obtaining the calibrator stars.

    I recently had to assess the quality of the astrometric calibration of a photographic plate. What I am going to show you in this post will of course work just as well for CCD frames, and if these have a sufficiently large field of view, this may be an issue for them as well. However, the sort of data that needs this assessment most typically are scans of plates, as these tend to have a “wobble”, systematic offsets in the scan direction resulting from imperfections in the mechanics.

    Prerequisites: An astronomical frame with a calibration in ICRS (or some frame not very far from it), called my-image.fits in the following, SExtractor (in Debian and derivatives: apt install source-extractor – long live Debian Astro; since it's called source-extractor in Debian, that's what I'll use here, too), and of course TOPCAT.

Step 1: Extract Sources. Source extraction is of course a high science, and if you know better than me, by all means do it the way you think is appropriate. Meanwhile, the following might very well be sufficient for your purposes.

    Create a working directory and enter it. Then, to create a file telling source-extractor what columns you would like to see, write the following to a file default.param:

    ALPHA_SKY
    DELTA_SKY
    X_IMAGE
    Y_IMAGE
    MAG_ISO
    FLUX_AUTO
    ELONGATION
    

    Next, give a few parameters to source-extractor; depending on the sort of image you have, you may want to play around with DETECT_MINAREA (how many pixels need to show a signal to register as a source) and DETECT_THRESH (how many sigmas a pixel has to be above the background to register as a candidate for belonging to a source). Meanwhile, write the following into a file default.control:

    CATALOG_TYPE     FITS_1.0
    CATALOG_NAME     img.axy
    PARAMETERS_NAME  default.param
    FILTER           N
    DETECT_MINAREA   30
    DETECT_THRESH    4
    SEEING_FWHM      1.2
    

– but if the following call gives you a few hundred sources, these values ought to work for the present purpose.

    Then run:

    source-extractor -c default.control my-image.fits
    

    This will give you a catalogue of extracted objects in the file img.axy.

Step 2: Fix source-extractor's output. Load that img.axy into TOPCAT. Regrettably, source-extractor does not add any useful metadata to the columns of its output table. To add the absolute bare minimum, in TOPCAT go to Views → Column Info. In that window, check UCD in the Display menu, and then put pos.eq.ra and pos.eq.dec into the UCD fields of the ALPHA_SKY and DELTA_SKY columns, respectively; double click a field to change it in TOPCAT.
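
If you expect to do this more than once, you can script the annotation; here is a sketch using astropy (you would then load the resulting img-annotated.vot into TOPCAT instead of img.axy):

from astropy.io.votable import from_table
from astropy.table import Table

# read source-extractor's FITS output and turn it into a VOTable
vot = from_table(Table.read("img.axy", format="fits"))
fields = {f.name: f for f in vot.get_first_table().fields}
fields["ALPHA_SKY"].ucd = "pos.eq.ra"
fields["DELTA_SKY"].ucd = "pos.eq.dec"
vot.to_xml("img-annotated.vot")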

To see if you have done the annotation right, in TOPCAT's main window, click Graphics → Sky Plot. If the objects show up, you have just provided enough annotation to let TOPCAT figure out the position for each row.

Step 3: Get calibrators. We will now try to add counterparts from Gaia DR3 to the extracted sources. To do that, click VO → Table Access Protocol (TAP) Query, and in the window popping up double click the entry for the GAVO DC TAP.

    In the Find box, type dr3lite to look for this site's version of the Gaia DR3 source catalogue. Click on gaia.dr3lite to select that table, and then select the Columns pane. This should show some of the Gaia DR3 columns.

Now Examples → Upload Join will generate a query that will cross-match your extracted sources with the Gaia sources. You should edit it a bit, only selecting the columns you will actually need, removing the TOP 1000 (at least on large images with more than 1000 sources), and reducing the match radius a bit when the calibration is not actually completely off and your epoch is sufficiently close to J2000.

    Hint: you can control-click in the Columns pane and then use the Cols button to insert all the column names in one go[1]. For me, the resulting query would be:

    SELECT
       source_id, ra, dec, phot_bp_mean_mag,
       tc.*
       FROM gaia.dr3lite AS db
       JOIN TAP_UPLOAD.t1 AS tc
       ON 1=CONTAINS(POINT('ICRS', db.ra, db.dec),
                     CIRCLE('ICRS', tc.ALPHA_SKY, tc.DELTA_SKY, 4./3600.))
    

This should result in about as many matches as your extraction had – a few more is ok, because you will have some spurious matches, a few less is ok, too, as there are always some outliers and artefacts, but you should clearly not pull an order of magnitude more or fewer objects here than you put in; fiddle with the match radius as necessary.
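
Incidentally, nothing forces you to go through TOPCAT for this step; the same upload join can be done programmatically, for instance with pyvo (a sketch, using the query from above):

import pyvo
from astropy.table import Table

extracted = Table.read("img.axy", format="fits")
svc = pyvo.dal.TAPService("http://dc.g-vo.org/tap")
calibrators = svc.run_sync("""
    SELECT source_id, ra, dec, phot_bp_mean_mag, tc.*
    FROM gaia.dr3lite AS db
    JOIN TAP_UPLOAD.t1 AS tc
    ON 1=CONTAINS(POINT('ICRS', db.ra, db.dec),
                  CIRCLE('ICRS', tc.ALPHA_SKY, tc.DELTA_SKY, 4./3600.))""",
    uploads={"t1": extracted}).to_table()
print(len(calibrators), "matches")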

See if there is a rough correlation between the Gaia calibrators and your extracted sources by plotting phot_bp_mean_mag against MAG_ISO. Absent more information, MAG_ISO, source-extractor's guess for the magnitude of the extracted object, will be just some crazy number, but it should have some discernible correlation with the actual magnitude. Do not expect too much here, in particular with old plates, for which good photometry is a science of its own.

    In my example, this looked like this:

    Plot: a rough correlation in red with a green tail

The green points certainly are spurious matches; this observation did not reach beyond 14th magnitude or so, and there are many faint stars on the sky, so a few of them will show up in just about any cross match. See the opening picture for an example with a better correlation.

Step 4: Do the correlation plot. Do Graphics → Plane Plot and then plot ra-alpha_sky or dec-delta_sky against X_IMAGE or Y_IMAGE. You could get something like this:

    Plot: A single wavy thing

    This rather certainly reflects some optical distortion; source-extractor regrettably does not take into account SIP corrections yet, so it is likely that a large part of this would be taken care of by the polynomials of the plate solution (the github issue I am linking to tells you how to be sure).

    But it can also look like this:

    Plot: Multiple wobbles

This certainly is not the result of a lens or anything optical at all. It's the scanner's gears that you are looking at here. With an amplitude of perhaps three arcseconds, this is rather excessive; but something like this you will rather likely see even on good scanners – though it may essentially be invisible, as it is for the Heidelberg scanner we used for HDAP:

    Plot: A vertical cloud with no discernible structure.
[1]I'm using the BP magnitude in the query below as most historical plates tend to be “blue sensitive” (in some sense). Hence, BP magnitudes should be a bit closer to what source-extractor has extracted.
  • DaCHS is now at Version 2.7

    Logo-ish 2.7 with a multi-array plot

Last Friday, I released Version 2.7 of GAVO's Virtual Observatory server package DaCHS. As is customary, I will give a brief overview of the more noteworthy changes in this blog post. This is probably only of interest to people running DaCHS-based data centres. What I discuss here is both a bit more verbose and a bit less extensive than what you find in the Changes file (when installed from package, you will find it in /usr/share/doc/python3-gavo/changelog.gz).

The highlight of this release, from my point of view, are simple, numpy-like vector operations in ADQL. Regular readers of this blog will already have seen an example of their use. This is altogether a prototype, which is why what specification there is lives only on the IVOA wiki. It is thus likely that some details of the vector math will change before they make it into any sort of standard (I am hoping for ADQL 2.2). This should not keep you from trying it out and telling your users about it.

    In that same vein, the FITS binary table grammar now copes with vectors, which makes it easier to populate tables that make these operations useful, and for the sort of large tables where the array magic has particularly much promise, it is now a lot simpler to feed array-valued columns with C boosters.

Other ADQL work includes the addition of proper, standards-compliant epoch propagation (i.e., “application of proper motion and radial velocity”) in the form of the ivo_epoch_prop and ivo_epoch_prop_pos user defined functions. Regrettably, this will not immediately work for you, as it builds on a feature in pgsphere that upstream has not merged yet; comments on that PR will certainly help make that happen. Of course, if you want, you can just build the pgsphere branch containing the new feature yourself. To make up for this complication, DaCHS will no longer advertise UDFs that will not work given the database extensions present – which will help me be a bit more liberal in letting in UDFs wrapping functionality not in Postgres' default distribution in the future.

    If you run datalink services and have multiple items with the same semantics, you may be interested in using local_semantics in Datalink. The use case here is that clients like TOPCAT will remain on, say, light curves in a red filter when the user jumps between records rather than randomly switching between red and blue ones when both have #coderived semantics (Mark's proposal). If you have data of this kind: you can now pass a localSemantics parameter to the makeLink and makeLinkFromFile methods of datalink descriptors; what string you use is up to you, as long as it's the same between similar rows for different datasets.

    I tend to forget that surprisingly many people actually do something with the ADQL form you get on DaCHS' web interface rather than use a TAP client. Well, a DaCHS operator complained about really sub-standard table headings in the HTML tables coming out of this service. Looking again, I had to admit he was right. So, TAP columns now have more meaningful table headings; in particular, if you write expressions, up to a certain length these expressions will be used as table headings. At least in this respect the ADQL form now has an advantage over using a proper client.

In case you have a processor doing astrometric calibration with astrometry.net (you probably don't, because it would have been very hard to make that work without a lot of hacks so far) – have another look at the documentation, because I have had various reasons to change api.AnetHeaderProcessor's API in quite a number of ways. It's now a lot easier to use with astrometry.net and source-extractor as distributed by Debian, but I'd still not have broken the API so badly if I had suspected anyone but me had significant code against this.

    I should also warn you that DaCHS now uses astropy to format sexagesimal times and coordinates. This is probably welcome news to those who ever encountered one of DaCHS' 05:59:60 outputs (which happened due to the way it did its rounding). Still, if you have regression tests testing for strings like that, you will need to update them.

From the many minor fixes I should probably mention that DaCHS is now ready for Postgres 15 (which will probably be the Postgres version in the next Debian stable). This used to be broken on new installations because Postgres 15 no longer lets normal users write to the public schema. DaCHS needs a database role that can do this, though, because it defines public functions. Since version 2.7, it does the necessary setup to make this possible. If you make your public schema non-world-writable manually – Postgres upgrades will not do that for you, and I would say there is no strong reason to do so for databases backing DaCHS –, do not forget to GRANT ALL ON SCHEMA public TO gavoadmin.

    With this – don't wait, upgrade. If you have GAVO's repository enabled, apt update && apt upgrade would probably do the trick, though of course I recommend having a look at our upgrading guide for robustness and good housekeeping.

  • Another Virtual Interop

A part of a presentation slide, containing the sentence “to select single/multiple rows from plot use Handles layer”

    One thing you could learn at this interop: How to identify the source row of a line in the TOPCAT's XYArray plot. See the end of this post for where this comes from.

    It's Interop time again! That is, most of the people involved in developing the Virtual Observatory (or for it) will report on what they have been up to since the last Interop, and what they are planning for the near-ish future. It is again an online meeting, so if interested, you could still register and then attend a couple of our sessions.

You will see me as a chair (but, for the first time since I became chair there, not as a speaker) in Semantics, and I'll have talks in Registry (obligatorily) and DAL 1, though regular readers of this blog will have a few déjà vus.

I plan to update this post as the meeting progresses – so, perhaps check back a few times until Thursday.

Update 2022-10-18, 15:00 UTC: I was expecting the VO in the Cloud Plenary with quite a bit of anxiety, because “in the cloud” these days tends to mean “stuff things into proprietary walled gardens”. The first input talk turned out to be quite a bit less scary: Data providers want to have links to commercial cloud providers in addition to http download links. That's reasonable given users may want to optimise accesses for large data sets, and seeing that most respondents pointed to Datalink as the way to do that (as I did) was nice. The devil is in the details, though: Making good concepts that let clients figure out what are, in a sense, “equivalent” ways to obtain the data is probably hard. The one thing I'm sure about is that I don't want concepts like #aws-metadata in datalink/core.

And the rest of the session was more of a “how VO standards are or may be useful to us” than the “dump the old open rubbish and move on to walled gardens” I was worrying about. So… excellent!

Update 2022-10-18, 21:10: Sitting in the DAL 1 Session, I am seriously tempted to become a gardener while listening to Tom's talk on Firewalls against ADQL. I have to thank U Heidelberg for hosting our services without horrible “Web Application Firewalls” or trying to hack into https connections to “sanitise” requests. At STScI, it seems the density of snake oil “security appliances” is so high that at least somewhat advanced network usage like TAP and ADQL becomes really shaky.

Can we just generally disarm and perhaps, if SQL injection really is a problem in individual cases, just hire programmers on permanent contracts (meaning: they'll acquire sufficient experience) and/or reviewers for the software we run facing the net? It's not like SQL injection is just bad luck. It's a bug in every single case, and a sort of bug that's relatively simple to avoid – simpler in any case than detecting SQL injection attempts with a reasonable false-positive rate.

    Update 2022-10-20, 5:00 UTC: Yesterday, I had reasons both for rejoicing and for wishing for a brown bag. The rejoicing part was (for instance) in the solar system session, where Steve Joy reported on getting PDS Planetary Plasma Interactions (PPI) data into the VO – that's a good thing no matter what, especially given that I have a very soft spot for solar system data anyway. As the main author of DaCHS, however, I was particularly happy to see PPI are using it to talk to the VO. DaCHS thus is now running in Los Angeles, too. Hollywood, practically.

    The brown bag moment came in the Registry session; while my talks I think went fine – one of them basically being the oral version of a post from this blog –, Tom's talk on pyvo.registry made me cringe because he pointed out a bad interoperability sin on my side. The problem was not that my code unconcernedly uses COALESCE. From private mails I had understood, perhaps somewhat over-optimistically, that RegTAP operators had greenlighted that after my DAL post from last December, and it's a really simple extension anyway. I give you, though, that I should have ensured that COALESCE really had arrived on the servers before pushing for merging the new regsearch code into pyVO.

No, what's really embarrassing is the UNION business. You see, the regsearch keyword constraint looks for the words in multiple places, and so it does something conceptually like WHERE keyword matches table1.description OR keyword matches table2.subject. Such cross-table ORs are generally extremely hard to plan for the database server, and thus when I re-wrote query generation for the RegTAP keyword search I just put in UNION – queries are really two orders of magnitude faster on my server this way.

However, UNION has not been part of ADQL 2.0, and although I've lobbied for the set operators for about a decade now, they are not formally part of ADQL yet. They will be part of ADQL 2.1, but even then they will not be mandatory. Hence, I should not blindly have employed UNION in code supposed to be interoperable, even less so because I can actually programmatically figure out whether a service supports UNION (from the TAP capabilities) and hence could have put in a fallback for where it's unavailable. Aw, dang.

    Update 2022-10-20, 20:00 UTC Just two sessions to go – Radio and Closing, though that little rest will be a challenge, with the closing session ending at 1 am my time.

Thus, in the midnight hour, for the Semantics working group I will report on our session, which had quite a bit of rather deep plumbing this time. For instance, for the update to our standard on unit syntax, Norman raised the question whether “%” ought to be a legal unit, and if so, if there's any way to keep ppm, ppb, and ppt out (؉ or ‰, on the other hand, are easy to keep out: We're really stubbornly insisting on pure ASCII). This may border on bikeshedding, but it has very concrete consequences on clients (such as astropy's unit parser) and services (where, for instance, VizieR has to cope with submissions that have columns given in percent). Before the session, it looked like we'd just let in percent, and that only grudgingly. Now… it's likely we will have to be more liberal.

    Great news in the session was that there is now a prototype of a Rosetta Stone for facility names in Paris, that is, a service that lets you map between all the different names your typical observation facility has (for instance, the part of my institute that is up on the mountain could be known as Königstuhl Observatory, Landessternwarte Königstuhl, LSW, Zentrum für Astronomie Heidelberg, and much more). If you have never tried linking all these various names up, you will be surprised how hard that problem is. See Baptiste's slides for how they are tackling it and how they are applying hardcore Semantics tech – in particular, SPARQL – to do it. I liked it a lot.

Another talk I would like to call out is Steve Crawford's from the session of the Data Curation and Preservation IG. His recommendation to go with CC0 for, well, licensing, is something I can only support, exactly because it is not a licence at all, which relieves you of the troublesome problem of assigning copyright to someone. That triviality is only the first of several legal problems we have had since we put the IVOA documents under CC-BY. But since nobody is ever going to court about any of this, the legal trouble is perhaps not terribly worrying. What is nasty about CC-BY is that whatever is licensed CC-BY is (generally) incompatible with the GPL and many other software licenses, which means you will get in trouble if you try to package it with something destined for Debian. And Steve makes some excellent points why CC0 is just fine for science data.

    Finally, if you liked the posts on array plotting in TOPCAT and usage in ADQL, you should definitely have a look at Mark's talk in this morning's Apps session, where he in particular shows how you can go from a line in the array plot back to the row that contains the array.

    And with that I've told you where the opening slide fragment came from. Good night!
