• At the 2026 Strasbourg Interop

    It's Interop time again!

    A lecture all with raised seating and a blackboard, two persons behind a wide lectern preparing for a talk.  On the wall, a slide with the title “state of the IVOA” is shown.

    Semiannually, the VO community meets to discuss what we've done since the last Interop and what needs to be done in the future. This week it does so again, this time in Strasbourg (programme).

    The posts tagged with Interop will give you an impression of how these meetings feel, and I'd like to do some close-to-real-time blogging about this one again; just come back here occansionally until Friday if you are curious.

    Right now, I am sitting in the opening session, remembering how, half a year ago, I was hectically trying to keep everything together at the Görlitz Interop when I was the local organising committee. Oh, how much more professional everything is here in Strasbourg's manufacture des tabacs: Good sound, zoom room working on the first attempt, no sun rays blotting out the projection screen, eduroam internet. It's really a nice lecture hall here, where we had to cobble together something rather improvised in Görlitz. Infrastructure matters.

    Memories aside, the first talk of an Interop traditionally is the State of the IVOA delivered by the chair of the Exec, which quite as traditionally sports slides from the member organisations. I had to smile and couldn't help being flattered when JJ took up my nerd theme and quipped about “what the nerds like“ or so on GAVO's slide.

    Going on, I can't resist a piece of trivia: The claim of the report on the activities of the Committee for Science Priorities was “no acronyms“ (and I will give you that for outsiders, the density of odd words between ADQL and UWS is a bit scary). Well: It was 22. A colleague counted. But then by Interop standards, that's still a pretty impressive achievement.

    Oh, and the State of the TCG closed a whopping seven minutes before then end of the session, and now a discussion on how to do VO propaganda has ensued. People make a point that there's few things more useful for that than hands-on courses. Which has been a cue for me because since the last Interop, my pet project DocRegExt is exactly designed for that and became an official standard (“recommendation“) just in the last semester. Ha!

    Monday 15:00 – The Local Host Session

    The first “business“ session of this Interop has talks advertising the achivements of the “local“ VO enthusiasts, where at first “local” means French.

    Ada Nebot's talk on OV [sic!] France is a bit humbling for me. For instance, they have a mailing list for technical discussions with more than 100 subscribers – wow. In Germany, at GAVO, we never made it beyond a dozen for our equivalent. Perhaps I should have worked a bit harder on hauling in money after all?

    But then of course France profits from a far-sighted personnel planning: There actually have been permanent positions for data curation and publication over here since several decades. Let's see how this pans out back in Germany – this year, we will fill the first positions that are at least planned to become permanent for the new data centre at the DZA in Görlitz.

    Carolin Bot then relates stories about the CDS, which I'd chalk down as the most important data centre in the VO, partly because of the Simbad database. This, I just learned from Carolin, collects object data from a whopping 15'000 articles per year.

    I get queasy when I consider that there are close to 100 new scientific articles in Astronomy alone every working day that Simbad processes (which means that there's a lot more that don't talk about objects). I can't resist mentioning that we really need to fix our publication system by either getting rid of performance-fantasising metrics altogether (which would be my preferred outcome) or at least use something else than publications. Still, great work, Simbad. Thanks, and thanks a lot for your TAP service, which is an incredibly powerful tool. If you, dear reader, do not know what I'm talking about, by all means check out our VO course (which features it).

    Talking about metrics abuse: Carolin also reports the CDS is serving 5 million queries per day. I'd certainly not want to use this as a proxy for CDS' usefulness – one smart TAP query or a catalogue crossmatch could replace a million requests each while providing a better service –, but it means that CDS' servers have to withstand 50 requests per second on average. Even if modern computers are amazingly fast, that is a certain challenge, in particular considerung that some of these requests can cause many seconds of computation.

    Hours, actually, if you don't pay attention to efficiency. Fortunately for CDS, there are people there who actually look at efficiency and realise there's a difference between code that takes half a second on the one hand and code that takes 50 ms on the other hand – something I myself rarely indulge in.

    And then there was a great slide in Andy Götz' concluding talk on the European Open Science Cloud EOSC (the “local” is Europe in that case), where he makes it clear that the EOSC is not a cloud, not (only) European (because “open” only makes sense if you don't close out the rest of the world), and regrettably it's not always open, either. If there is a useful definition of the EOSC beyond “a funding scheme of the EU“, however, I still could not figure out.

    But then I freely confess to being very skeptical about discipline-spanning data publication in the first place on grounds that there are not many problems that the different disciplines actually share; I couldn't name much beyond AAI (i.e., authentication, which I'd rather not have at all in the first place) and PIDs (persistent identifiers; and these are a lot less useful on top of non-permanent infrastructure than you would think). Let me stress that I'm saying this as someone who's been soliciting contributions to my Stories on Cross-Discipline Data Discovery for a long time. You'd be surprised how little enthusiasm for this kind of thing is out there.

    Monday 17:00 – Apps I

    I'm sitting in the first session of the Appliations Working Group (“Apps”), which in VO circles is affectionally known as Show & Tell.

    Against this cliche, the first talk (sorry, no link: it would go to Google docs) is about HATS, a fairly cool new format for dealing with large catalogues without having to run TAP and ADQL. It is a bit of a cross between HiPS and Parquet. By Apps standards, though, the talk it was fairly technical and had few colourful pictures. You could argue that is a quality, and I could not deny that.

    Things became a bit more baroque in the next talk: Pierre Fernique had the chair turn the light down before starting his slides – and will, I think, now show how you can interact with a data cube of 600 GB (a) at all, (b) over the network, and (c) on a very moderate machine. This already works from the comfort of your home (or office) with the most recent Aladin beta (v12.675). Try the HIPS3D subtree in the discovery tree; lightcone is fun, but being able to zoom through the spectral cubes from CALIFA to me is more impressive:

    A screenshot of the Aladin client with HiPS3D → cds → CALIFA open and a black/white image and a spectrum displayed.

    FX Pineau next reported on HATS progress (among other things), namely that the CDS now produces the HATS files I talked about above on the fly. Hu. Should DaCHS know how to do that, too?

    Beyond that, in that talk you can see a few instances of what I was referring to above when I said CDS folks do consider efficiency. I like it if even today software people still consider the number of disk seeks required to do what their programs try to do. Yes, I know that with SSDs they're not nearly as expensive any more as they used to be, but, you know, my mass data is also still served from spinning disks.

    Tuesday 10:00 – Sciene Platforms

    At this Interop, there is a plenary session on science platforms. I have already ranted about this return of the data silos during the last interop. In Tess' talk, there was a slide that nicely summarises my concerns:

    A slide with two columns, telling stories why there are five different US astronomy science platforms, fornax, roman nexus, rubin, sciserver, Astro Datalb.

    So: everyone spends a lot of effort on building complex systems of their own that can (in the most extreme case) process just a single sort of data (theirs), requiring different credentials, different code, that cannot interoperate, that are pretty much silos that, when they go down, will take all the software and workflows written for them with them. Most of them, I think, also depend on AWS, and what happens when Amazon changes their rules and/or pricing is anybody's guess.

    Tuesday, 12:00 – DAL I

    In the first DAL session of this Interop, I'm a bit distracted because back in Heidelberg our computation centre has again cut off our servers. After they had a two day “power outage“ over pentecost and the still-unresolved November disaster, I again regret the day when the University forced us to move our servers to them.

    I still appreciated Pat's remark in his talk on OpenAPI for TAP 1.2 that if we could go back in time, we certainly wouldn't make our protocols' query parameter names case-insensitive. Absolutely. I'd widen that statement: Whenever you think that case-insensitivity is a good idea, you are probably wrong. My experience is that this will almost certainly going to come back at you later to no end of headache. Just have a look at the RegTAP spec and search for “case”. Each of these places cost me a bunch of hair.

    Case folding considered harmful. Let's not do it any more.

    I was re-inforcing that point in the first talk I was giving at this Interop, SCS-2.0 prototype implementation

    In there, I reported that purging case-folding from some parameters of SCS2 (the one it shares with case-folding protocols) was really painful – but also unavoidable. Other than that, I was delighted that there was a lively discussion afterwards; at least there's interest in the activity, albeit it seems more in the protocol itself rather than the management of a major version transition that is, really, why I am after SCS2.

    It would seem, anyway, that we will go on with SCS2. There's a time line in the SCS2 draft that covers something like five years. In that sense, this session may very well have been the point of no return for a long, long journey.

    A timeline with about 10 yellow milestones and a few bars marking activities.  The first year is 2026, the last year 2032.
  • Limits, Materialisation, and Anchor Texts: DaCHS 2.13 is out

    AI slop: Ten badgers on a grassy floor.

    With all the crazy Star Trek-sounding talk of “materialising obscore” below I could not resist and asked stabledifffusion.com for „Thirteen badgers materialising obscore“. Well, counting badgers is hard, and I wouldn't have been sure how to visualise obscore, either. Rest assured, though, that the remainder of this post is not AI slop and at least factually correct.

    It's been almost a year since the last release of our publication package, DaCHS, and so it's high time for DaCHS 2.13. I have put it into our repository last Friday, and here is the obligatory post on the major news coming with it.

    Perhaps the biggest headline (and one that I'd ask you to act upon if you run a DaCHS system) is support for the new features in the brand-new VODataService 1.3 Working Draft. That is:

    • Column statistics. This is following my Note on Advanced Column Statistics on the way to improved blind discovery in the VO. To have them in your DaCHS, all you have to do is upgrade and run dachs limits ALL – and then make sure you run dachs limits after a dachs imp you are satisfied with (or use the new -l flag discussed below). Please do it – one can do a lot of interesting discovery in the Registry (and perhaps quite a bit more) if this is taken up broadly.

    • Product type declaration. So far, when you wanted to discover, say, spectra, you would enumerate the SSAP services in the Registry, perhaps with some additional constraints (e.g., on coverage), and then query each of those.

      Linking data types and protocols was a reasonable shortcut in the early VO. It no longer is, for a whole host of reasons, among which Obscore (which can publish any sort of observational data) ranks pretty high up. So, in the future, we need to be explicit on what among the terms from http://www.ivoa.net/rdf/product-type will come out of a service.

      Where this is immediately useful is when you publish time series through SSAP (which is not uncommon). Then, just put:

      <meta name="productType">timeseries</meta>
      

      into the root of your RD (the time series template in 2.13 already does this). If you publish cubes through SIAP, you should similarly say:

      <meta name="productType">cube</meta>
      

      For other SSAP and SIAP services, you probably don't need to bother at this point.

      For obscore, DaCHS will do the declarations for you if you have run:

      dachs limits //obscore
      

      – which is a good thing to do anyway (see above).

    • Data source declaration. For most purposes, it is really important to know whether some piece of data you found is based on actual observations or whether it's data coming out of some sort of simulation.

      So far, the only protocol that let you say something like that was SSAP. But there's now all kinds of other non-observational data in the VO, and so VODataService 1.3 introduces the vocabulary http://www.ivoa.net/rdf/data-source to let you say where the data you publish comes from.

      The default is going to be observational for a long while. If that's what you have, don't bother. But if you publish results from simulations (more or less: starting from random numbers), put:

      <metaName="dataSource">theory</metaName>
      

      into your RD's root, and if it's data based on actual objects (simulated observations for a new instrument, say, or model spectra for concrete stars), make it:

      <metaName="dataSource">artificial</metaName>
      

    To make filling in the VODataService column statistics somewhat less of a hassle, I have added an -l flag to dachs imp. This makes it run (in effect) a dachs limits after the import. I'm not doing this on every import because that would slow down the development of an RD; obtaining the statistics may take quite some time, and for certain sorts of tables you may prefer to run dachs limits with your own options.

    You could argue I should have inverted the logic, where you'd rather pass a flag saying “don't do limits” during development. You could probably convince me. But until someone protests, just remember to add an -l flag to your last import command.

    There are a few more prototypes for (possibly) upcoming standards in DaCHS 2.13. For one, you can now write units in ADQL queries as per my proposal at the Görlitz ADASS. That is, you can annotate literals with units in curly braces (as in 10{pc}), and you can convert values with known units into other units using a new operator @. For instance, if you were fed up with the stupid angle unit we've been forced to accept since… well, about 2000 BC, you could put the interface to saner units into your queries like this:

    SELECT TOP 20
      ra@{rad}, dec@{rad}, pmra@{rad/hyr}, pmdec@{rad/hyr}
    FROM gaia.dr3lite
    

    This is not a big advantage if you write queries just for a single catalogue. It does make a difference when you write queries that ought to work across multiple tables and services.

    While you should not notice the per-mode limit declarations coming from an unpublished draft of TAPRegExt 1.1 (except that the async limits TOPCAT shows will now better match what DaCHS actually enforces), you could appreciate the support for StaticFile that comes out of DocRegExt 1.0. There, it is used to register single PDF files or perhaps ipython notebooks. When you register such things[1], you can now say something like:

    <publish render="edition" sets="ivo_managed">
      <meta>
        accessURL: \internallink{\rdId/static/myfile.txt}
        accessURL.resultType: text/plain
      </meta>
    </publish>
    

    The result of this will be that DaCHS produces a doc:StaticFile interface rather than vs:WebBrowser, and it will produce a resultType element saying that what you get back is plain text (in this case). If you have other applications for having static files like that in registry records, do let me know.

    My investigation into slow obscore queries I already reported on here led to two changes: For one, some types in the obscore table changed, and in consequence dachs val -vc ALL will complain when you pulled in the obscore columns into your own tables. Just try the val -vc and either re-import the affected resources at your leisure (it's only an aesthetic defect, things will continue to work) or change the column types as described in the blog post linked above.

    Probably more importantly, you can now materialise the obscore view (actually, in order to let you drop the contributing tables at will, it's not a materialised view but a table, but that's… immaterial here). You want to do that if you have many contributions to your obscore table, at least some queries against it become slow and you can't seem to figure out why. See Materialised Obscore in the tutorial to see what to do if you want to materialise your obscore table, too.

    Something perhaps worth exploring for you is that you can now publish entire RDs. I implemented this for a resource with lots of little “services” (actually, HiPSes) that share so many pieces of metadata that it just seemed wrong to have them all separate resource records (though I am in discussion with the HiPS people who are not particularly fond of having multiple HiPSes in one resource record), nsns. Beyond that, you could have, say, a cone search for extracted sources, an image service and a browser service for both in one RD and then say, in the RD section with top-level metadata:

    <publish sets="ivo_managed"/>
    

    – everything should then live nicely as separate capabilities within one resource record and that without any of the publish/@service tomfoolery you had to use so far to glue together VO and browser services.

    For local publications (i.e., browser services appearing on your front page), this will result in a link to the RD info (minor DaCHS secret: <your server URL>/browse/<rd-id> gives an overview over the tables and services defined in an RD). Whether that's useful enough for you in such a case I cannot predict. But you can mix all-RD publications in ivo_managed with conventional <publish sets="local"/> elements for browser services.

    Among the more minor changes, the default web form template now employs a WebSAMP connector, which means that the SAMP button on results of the form renderer is now greyed out until a SAMP hub becomes visible on your machine.

    If you use a display hint type=url, you can now control the anchor text on the a element in HTML output by setting a property anchorText on the corresponding column. Yes, that will then be constant for all the products. If you really need more control than that, you will have to define a formatter for a custom outputField.

    So far, the fullDLURL macro could only be used when you actually had a normal, filename-based DaCHS access reference. This was unfortunate because this kind of thing is particularly convenient for “virtual” data generated on the fly. Hence, you can now pass some python code in a second fullDLURL argument that must return the accref to use. Read a bit more on the context in Datalinks as Product URLs.

    There are many other minor changes and fixes that you hopefully will only notice because some annoying behaviour of DaCHS is now a little less annoying.

    If you spot problems or miss something, feel free to report that at our new repository at Codeberg. The main VCS for DaCHS still is https://gitlab-p4n.aip.de/gavo/dachs. But we will probably migrate to Codeberg by the 2.14 release to make reporting bugs and writing pull requests simpler.

    Perhaps we will receive some from you?

    [1]Using resType: document; I notice I should really add some material on registering educational material with DaCHS to the tutorial.
  • Porting a DaCHS SIAv1 service to SIAP2

    a distorted title page of the SIAP version 2 standard, centred on the date 2015-12-23

    Ten years after, let me talk about SIAP version 2.

    In December 2015, the IVOA made Simple Image Access Version 2.0 (hereafter: SIAv2) a Recommendation (that is: the standard you should be following). I am fairly sure that most people into computers would have understood that as “Don't do Simple Image Access version 1 (SIAv1) any more“. As of ten years ago.

    This is not how things worked out. Actually, to this day new SIAv1 services still come online. In the talk about major version transitions I gave in College Park last June, I remark that 20% of the registered SIAv1 services were younger than 30 months.

    There are many reasons why obsoleting SIAv1 has not worked (yet); very frankly, I had rather fiercely argued we don't want SIAv2 at all on grounds that Obscore is all you need to discover products of observations.

    But since it's there now I feel I should do something for its adoption, beginning with not pushing out any new SIAv1 services myself. So, when a data provider sent me an RD they built from a previous one and it would have published a new SIAv1 service, I thought this was the time to start updating my own services.

    The next step then is to encourage DaCHS adopters to help out, too, that is, to port over their RDs from doing SIAP version 1 to doing SIAP version 2[1]. That's why I am writing this blog post.

    Going From SIAv1 to SIAv2 in 11 Moderately Difficult Steps

    Since the output table schema (and quite a bit beyond that) changed between the two version, the port is not entirely trivial; if it were, we wouldn't have done a major version (i.e., breaking) change in the first place. But I'd argue it's quite doable when two conditions are met:

    • You have a DaCHS version 2.8 or later (if not, you should upgrade anyway).
    • You are not using siapCutoutCore right now; what this does is hard to replicate in SIAv2 (because positional constraints are now optional), and so if you want to keep the auto-cutout functionality, you probably are stuck on SIAv1.

    That said, here's my recipe:

    1. Change the mixin on the table that keeps the image metadata. So far, you probably had mixin="//siap#pgs". Drop this and add:

      <mixin have_bandpass_id="True">//siap2#pgs</mixin>
      

      to the table body instead. If you really have no bandpass you would like to mention, you can leave out the attribute definition.

    2. Change the obscore mixin in the table body if you did an obscore publication (skip this step if not). With SIAv2, write instead:

      <mixin preview="access_url || '?preview=True'"
        >//obscore#publishObscoreLike</mixin>
      

      It is really simple now because SIAv2 just re-uses the obscore schema.

      Keep your old mixin definition in a scratch pad (or the version control history at least), because it will help you when you fill out the parameters to //siap2#setMeta.

    3. Change any index statements for standard columns you may have; the column names are completely different between SIAv1 and SIAv2. Classic examples include:

      • bandpassId is bandpass_id (if available)
      • bandpassLo is em_min
      • bandpassHi is em_max
      • dateObs should become indexes on t_min and t_max.

      If your table is small enough that you managed without indexes so far, don't bother creating new ones.

    4. Check custom extension fields for whether they are now in core SIAv2. The classic case is exposure time, which was missing in SIAv1. Just drop your custom column definition(s).

    5. If there is datalink on the SIAP table, you will have to change its definition, too; the relevant column is now obs_publisher_did. If your datalink service has the id dl, the result of the operation would be this:

       <meta name="_associatedDatalinkService">
        <meta name="serviceId">dl</meta>
        <meta name="idColumn">obs_publisher_did</meta>
      </meta>
      

      This may lead to datalink failures in DaCHS < 2.13 (in that the datasets are no longer found). If this bites you, let me know.

    6. Fix the rowmaker for the SIAP table. For the computePGS and getBandFromFilter apply, just add a 2 to their procDef references, so that these become //siap2#computePGS and //siap2#getBandFromFilter (if applicable).

      The main work is going from //siap#setMeta to //siap2#setMeta, because their parameter sets are somewhat different, although they do map to each other to some degree.

      The way to do the migration is to go through SIAv2's setMeta's parameter list in the reference documentation and identify the old parameters, or take the values from your obscore definition. Once you are past this point, you have done the heavy lifiting.

      (For completeness, let me mention that you will probably get away with dropping pixflags and keeping the other parameters as they are, as there is some compatibility glue; but you'd miss setting up extra SIAv2 metadata, and that would be a shame).

    7. Experimentally run dachs imp. This will probably fail because there are references to old column names in, say, service definitions. Resolve these based on the names you used in setMeta (which largely double as the column names). When you made DaCHS accept your refurbished RD and have run the import, use dachs info to catch metadata items you have missed.

    8. If you used a shared core for both a service with the siap.xml renderer and the web form service, move that core into the web form service. Use //siap2#humanInput for the new positional constraint, and drop the #protoInput, if it is there, because it is no longer needed.

    9. The protocol service has to have allowed="siap2.xml", and its new core is:

      <dbCore queriedTable="main">
        <FEED source="//siap2#parameters"/>
      </dbCore>
      

      Replace "main" with whatever your table is called, and add any custom parameters you would like to have.

    10. In your regression tests (you have some, don't you?), change the renderers in the URIs (siap2.xml instead of siap.xml), and change POS and SIZE into POS="CIRCLE ..."; it is likely that you will also have to change column names in the assertions.

    11. In your publish element(s), change render="siap.xml" to render="siap2.xml"

    12. Run dachs pub q to tell the Registry that your access URL has changed.

    That's it.

    I would argue this is time well spent. Even if one day there will be a successor to SIAv2 (and I do hope there will be one), it is highly likely that its metadata schema will align very well with obscore's, and hence most of the work you just did will put you in a very good position to switch to DAP with just a few keystrokes.

    [1]If you have built SIA services with DaCHS 2.8 (2023) or later using dachs start, you will already have a SIAv2 service; see the discussion in the pertaining release notes.
  • Queries Against My Obscore Are Slow!

    Content Warning: This is fairly deep nerd stuff. If you are just a normal VO user, you probably don't want to know about this. You probably even don't want to know about it if you are running a smallish DaCHS site. But perhaps you'll enjoy it anyway.

    Last May, I finally tried to get to the bottom of why certain queries against my obscore table – and in particular some joins I care about – were unneccessarily slow. The immediate use case was that I wanted to join the proposed radio extension for obscore to the main obscore table like this:

    SELECT COUNT(*)
        FROM ivoa.obscore
        JOIN ivoa.obs_radio
        USING (obs_publisher_did)
    

    This looks harmless, in particular since there are almost always indexes on obs_publisher_did columns for operational purposes: DaCHS uses them to locate rows in, for instance, Datalink operation.

    It is not. Harmless, I mean. On the contrary.

    Match Your Types Before You UNION ALL

    The main reason why there is a trap is that ivoa.obscore in DaCHS is a view (i.e., some sort of virtual table defined by a SQL query). This is because typically, multiple data collections contribute, and they can change independently of each other. We do not want to have to rebuild a full obscore table (which has almost 150 million rows in the Heidelberg data centre right now) just because we fix the metadata of a handful of images somewhere.

    Hence, ivoa.obscore is built somewhat like this in DaCHS[1]:

    CREATE OR REPLACE VIEW ivoa.obscore AS
     SELECT 'image'::text AS dataproduct_type,
        NULL::text AS dataproduct_subtype,
        2::smallint AS calib_level,
        'BGDS'::text AS obs_collection,
        ...
     FROM bgds.data
    UNION ALL
      SELECT 'image'::text AS dataproduct_type,
        NULL::text AS dataproduct_subtype,
        3::smallint AS calib_level,
      ...
    [and 42 further subqueries that are union-ed together]
    

    It turns out that this architecture is dangerous in Postgres.

    Laurenz Albe has a writeup on the underlying problem, which he summarises in a cartoon as “Before I UNION ALL you, be sure that your types match”. In short, UNION ALL becomes a planner barrier when the types of the columns of the relations being merged do not exactly match. For this purpose, a bigint is completely different from an integer.

    Full disclosure: it's not like I figured out the applicability of Laurenz' analysis to the DaCHS troubles by myself. It actually took multiple applications of the cluestick by Tom Lane, Laurenz, and others on pgsql-general.

    Known Problem Is Not Solved Problem

    Hence, since May, I sort-of understood the problem. Fixing it, on the other hand, seemed rather overwhelming given the size of the view and sometimes multiple levels of view building. In consequence, I procrastinated actually doing something about it until some time last November when I realised that the computer could support the analysis of what types from which tables do not match.

    I therefore wrote analyze-obscore.py and added it to the DaCHS repo. It will (presumably) never be part of the DaCHS package, but you can simply run it from a clone of the repo – and should do so if you have an obscore view fed from multiple tables.

    The output then is something like:

    ==== access_estsize ====
    
      bgds.data                      accsize/1024
      danish.data                    accsize/1024
      dfbsspec.ssa                   accsize/1024
      plts.data                      accsize/1024
      emi.main                       access_estsize (bigint)
      rosat.images                   accsize/1024
      califadr3.cubes                10
      robott.data                    accsize/1024
      k2c9vst.timeseries             accsize/1024
      dasch.narrow_plates            access_estsize (bigint)
      onebigb.ssa                    accsize/1024
      [...]
    
    ==== access_format ====
    
      bgds.data                      mime (text)
      danish.data                    mime (text)
      dfbsspec.ssa                   mime (text)
      plts.data                      mime (text)
      emi.main                       access_format (text)
      rosat.images                   mime (text)
      califadr3.cubes                'application/x-votable+xml;content=datalink'
      [...]
    
    ==== calib_level ====
    
      bgds.data                      2
      danish.data                    2
      dfbsspec.ssa                   2
      plts.data                      1
      emi.main                       calib_level (smallint)
      rosat.images                   2
      califadr3.cubes                3
      [...]
    

    and so on. That is: for each table contributing to a column, it either shows the source column together with its type, a literal, or the full expression. Literals are not problematic: as it turns out, DaCHS has always cast them to the appropriate type, so as long as the other source columns match what obscore thinks the columns ought to be, you should be fine.

    Expressions are more difficult. The only way to be sure there is to ask Postgres, somewhat like this:

    select pg_typeof(accsize/1024) from bgds.data limit 1
    

    Changing Types En Masse

    In my case, I had lots of inconsistencies between columns coming from SSA and more directly from obscore-like tables. If you have spectra and other things in one obscore table created by DaCHS <2.12.2, so will you.

    This is because in my obscore implementation I followed the somewhat ill-advised types written down in (but in my reading not actually requried by) the obscore specification (p. 21). There is no conceivable scenario that would require more than 231 polarisation states (the pol_xel column, which is supposed to be “adql:BIGINT”), and I do not feel overly future-skeptic when I say that it will also be some time until we have images with a linear dimension of more than two billion. There is also no good reason to have an order-of-magnitude value like em_res_power to 16 significant digits (as implied by “adql:DOUBLE”)[2].

    I have cleaned this up in DaCHS 2.12.2. With this, the types of Obscore and the corresponding columns in SSA and SIAP are consistent within DaCHS' metadata declarations.

    However, the on-disk tables will keep their original types regardless of what DaCHS claims they are. You could fix this by re-importing the tables, but that would take quite a while, at least in my case. I have hence opted for targeted updates.

    The first step in that procedure is to figure out where Postgres' ideas of columns are now different from DaCHS' ideas given the recent metadata updates. For that, dachs val has had the -c (or --compare-db) flag for a long time. Running:

    dachs val -vc ALL
    

    gives you a list of all RDs that need work because the on-disk types (which actually determine the query plan) differ from DaCHS' expectations (which will fix the UNION ALL trouble). Once they match, you can feel entitled to a good query plan.

    Based on this, I have incrementally built a fixing script on my development system. As I'm pointing out towards the end of Publishing a Service in the DaCHS tutorial, the recommended way to run a DaCHS-based data centre is to have test snippets of almost all the resources on the production system on a <cough> development system (presumably: your laptop). That's what I do, and in this way I built this script:

    import subprocess
    
    from gavo import api
    
    with api.getWritableAdminConn() as conn:
            conn.execute("DROP VIEW IF EXISTS ivoa.obscore")
            conn.execute("DROP VIEW IF EXISTS dasch.plates")
    
            for table_name in ["emi.main", "dasch.narrow_plates"]:
                    conn.execute(f"ALTER TABLE {table_name} ALTER t_xel TYPE integer")
    
            for table_name in [
                            "emi.main", "dasch.narrow_plates", "ppakm31.cubes", "applause.main",]:
                    conn.execute(f"ALTER TABLE {table_name} ALTER s_xel1 TYPE integer")
                    conn.execute(f"ALTER TABLE {table_name} ALTER s_xel2 TYPE integer")
    
            for table_name in [
                            "emi.main",
                            "dasch.narrow_plates"]:
                    conn.execute(f"ALTER TABLE {table_name} ALTER pol_xel TYPE integer")
    
            for table_name in [
                            "emi.main",
                            "dasch.narrow_plates",
                            "califadr3.cubes"]:
                    conn.execute(f"ALTER TABLE {table_name} ALTER em_xel TYPE integer")
                    conn.execute(f"ALTER TABLE {table_name} ALTER em_res_power TYPE real")
    
            for table_name in [
                            "emi.main",
                            "dasch.narrow_plates",
                            "ppakm31.cubes",
                            "applause.main",
                            "califadr3.cubes"]:
                    conn.execute(f"ALTER TABLE {table_name} ALTER em_min TYPE real")
                    conn.execute(f"ALTER TABLE {table_name} ALTER em_max TYPE real")
    
            for table_name in [
                            "emi.main",
                            "dasch.narrow_plates",
                            "applause.main"]:
                    conn.execute(f"ALTER TABLE {table_name} ALTER s_resolution TYPE real")
                    conn.execute(f"ALTER TABLE {table_name} ALTER s_pixel_scale TYPE real")
                    conn.execute(f"ALTER TABLE {table_name} ALTER s_fov TYPE real")
    
    
    for rd_id in ["emi/q", "califa/q3", "rome/q", "dasch/q", "ppakm31/q"]:
            subprocess.call(["dachs", "imp", "-m", rd_id])
    
    subprocess.call(["dachs", "imp", "dasch/q", "make-view"])
    subprocess.call(["dachs", "imp", "//obscore"])
    

    As I said: which columns to fix I learned from dachs val -vc; the extra DaCHS operations were necessary because Postgres refused the type changes as long as the views were still defined.

    Success?

    This entire operation has made quite a few obscore queries a lot faster.

    Regrettably, the motivating query, viz.,:

    select count(*)
    from ivoa.obscore
    natural join ivoa.obs_radio
    

    is still slow. I have dug a bit into why Postgres does not find the seemingly obvious plan of just materialising the join with the tiny obs_radio table and contented myself with the note that has been in section 9.21 of the postgres documentation forever:

    Users accustomed to working with other SQL database management systems might be disappointed by the performance of the count aggregate when it is applied to the entire table. A query like:

    SELECT count(*) FROM sometable;
    

    will require effort proportional to the size of the table: PostgreSQL will need to scan either the entire table or the entirety of an index that includes all rows in the table.

    But at least a query like:

    select dataproduct_type, access_url, t_min, t_max
    from ivoa.obscore
    natural join ivoa.obs_radio
    where t_min between 56000 and 56005
    

    is fast, and until further trouble that's good enough for me.

    Followup (2026-03-03)

    Well, futher trouble came afoot, and with DaCHS 2.12.3 you can therefore materialise your obscore table. This is as simple as saying:

    materialiseObscore: True
    

    in the [ivoa] section of your /etc/gavo.rc and then saying:

    dachs imp //obscore
    dachs limits //obscore
    

    in a shell. For large obscore tables, this will take a while (about 30 minutes for the imp in my data centre). I don't intend to do that more than once a month on average, and while queries to ivoa.obscore will block in that time, I think it's worth it: Query plans and all become a lot more readable, and my count(*) query suddenly finishes in less than a second. That's a big win over the several minutes I had before.

    Well: At least I have learned quite a bit about UNION ALL, and also about gathering metadata from many RDs at a time. So, this whole investigation was not a total waste of time.

    And if you have to know: this is not actually a materialised view but rather a normal, full-fledged table. That is because you cannot drop tables that are part of a materialised view, whereas once their rows are in a table, postgres lets you drop them as you like. And dropping is important if you want to develop your data collections.

    [1]In case you wonder: the individual parts of this union are kept in a table ivoa._obscoresources that you can inspect and even manipulate for special effects. The management of that table is among there more complex things one can do in DaCHS RDs. If you are curious, dachs adm dump //obscore will show you all the magic.
    [2]I put these type names into quotation marks because they were never formally defined. What Obscore does there has been identified as an antipattern in the meantime; newer specifications of similar schemas only distinguish floating point, integral, and string types and leave the choice of lengths to the implementations. If I may say so myself, I like the considerations on types within section 8 of RegTAP.
  • Out But Not Down

    A business phone with many custom buttons on a moderately cluttered desk

    Well, at least Uni Heidelberg still lets in calls to the phone on my desk. For connections to our data centre's servers, even after five days: no signal.

    Yesterday morning my phone rang. It was a call from Italy, and it was a complaint that my registry service was terribly loaded and didn't respond in time. That struck me as fairly odd, because I had just used it a few minutes before and it felt particularly snappy.

    A few keystrokes showed that was because it was entirely unloaded. A few more keystrokes showed that was because the University lets all incoming connections starve. They did that for all hosts within the networks of the University of Heidelberg, in particular also for their own web server. No advance warning, nothing. I still have no explanation, only rumours that they may have lost their entire Kerberos^WActive Directory. Even if that were true, I can't really see why they would kill all data services in their network: that's hashed passwords in there, no?

    So, while we're up, to the rest of the world it seems we're terribly down. This is also the longest downtime we've ever had, longer even than during the diskocalypse of 2017.

    I also have no indication when they plan to restore network connectivity. Apologies, and also apologies that they don't even send an honest connection refused and hence your clients are going to hang until there is a timeout.

    Meanwhile, our registry service at reg.g-vo.org keeps working; this is a good opportunity to thank my colleagues in Paris and Potsdam for running backup services for that critical piece of infrastructure.

    Followup (2025-11-21)

    Going into the weekend, there is still no communication from the computation centre on a timeframe to get us back online. At least they sent around a mail to all employees urging them to change their passwords; I am thus inclined to believe that they lost the content of their user database, and given they use these passwords in all kinds of contexts, I could well imagine they were stored using what's called “Reversible Encryption” in Windowsese. If that's true, they are hosed, but that is no excuse for killing my services.

    Followup (2025-11-24)

    Still no news from the University and its “CISO” on when we might get back connectivity. I consider this beyond embarrassing and thus helped myself. While the minor services (telco.g-vo.org, www.g-vo.org, docs.g-vo.org and so on) are still unreachable and still will hang until a timeout (what an unneccessary additional annoyance!), dc.g-vo.org should be back, at least to some extent.

    To pull this off, I went to Hetzner and clicked myself a minimal machine (funnily enough, it's phyiscally located in Helsinki). I then configured the sidedoor Debian package to enable connect to root on that new server (this is a bit tricky; you have to manage the files in /etc/sidedoor manually, including key generation; I ended up pulling the known_hosts entry out of my own ~/.ssh/known_hosts).

    And then you just run your equivalent of:

    sidedoor -R "*:80:dc.zah.uni-heidelberg.de:80" -R "*:443:dc.zah.uni-heidelberg.de:443" root@uhd-kruecke
    

    Regrettably, it needs to be root because of the privileged ports involved.

    So, we should be back in the VO. Please let me know if you disagree.

    Followup (2025-11-24)

    Uh, it seems I was not quite clear in the last update. The main message simply is: You should see dc.g-vo.org and its services normally now.

    All the talk about sidedoor and ssh tunnels was just an illustration of how I fixed the network outage. I was so specific partly to help others in the same situation, partly so the computation centre can't say they didn't know what I was up to.

    Followup (2025-11-28)

    If you speak German, there is a fan page for this entire disaster on the aptly-named page urz.wtf.

    Followup (2025-12-03)

    Two weeks into the disaster, there is the first official communication from the responsible persons to the service providers they cut off. In their denial of large-scale breakage and hermetic murmur about secrecy, the feeble words frankly remind me of Brezhnev-era bulletins, except back then they did not use stock illustrations supposed to illustrate… confusion?

    A question and exclamation mark each in a blue circle, centered between German text.

    I have to say that I am fairly angry with a statement like:

    These ongoing measures [taking everyone offline] proved to be proportionate and effective. [Diese Schritte, deren Umsetzung noch andauert, haben sich als angemessen und effektiv erwiesen.]

    Proportionate!? Shutting off services that have absolutely nothing to do with whatever was compromised for two weeks?

    There is the apt German phrase of „Arroganz der Macht“ (“conceit of the powerful”). Seeing that URZ not only not deigned to give any reaction to the distress signals that not only I have sent them in these past two weeks but clearly completely and utterly ignores them: I can't deny that that is infurating.

    Good disaster management means being transparent and showing some humility, ideally apologising to those that had a hard time because of the accident you had (or, in this case more likely, caused). The URZ does the opposite, pointing in all other directions:

    The computation centre has established a task force and closely works with the responsible agencies [...police, domestic intelligence, “cyber security agency Baden-Württemberg”]. [Das Universitätsrechenzentrum hat einen Krisenstab eingerichtet und arbeitet derzeit sehr eng mit den zuständigen Landesbehörden, insbesondere mit dem Landeskriminalamt Baden-Württemberg unter der Sachleitung der Generalstaatsanwaltschaft Karlsruhe, dem Landesamt für Verfassungsschutz, der Cybersicherheitsagentur Baden-Württemberg sowie dem Landesdatenschutzbeauftragten und der Hochschulföderation bwInfoSec, zusammen.]

    Dear URZ: If you are running Active Directory with “reversible encryption“ (and no, I don't know whether that's what they did[1], but it certainly seems like it), you're juggling with chainsaws, and nobody can help you, least of all the domestic intelligence service.

    At least we are given some perspective:

    The services will now, after a diligent examination and after establishing extra protective measurements, step by step, prospectively in the middle of the coming week, i.e., Wednesday Dec 10 2025, again be available on the internet without VPN. This only applies to services complying with the necessary security standards. [Die Dienste werden jetzt, nach sorgfältiger Prüfung und nach der Etablierung von zusätzlichen Schutzmaßnahmen, Schritt für Schritt voraussichtlich bis Mitte der kommenden Woche, d.h. Mittwoch, den 10. Dezember 2025, wieder über Internet ohne VPN verfügbar sein. Dies gilt nur für Dienste, die die nötigen Sicherheitsstandards erfüllen.]

    That's a downtime of three weeks (well, would be if I hadn't established workarounds for the most important services), a large multiple of the combined downtimes I had due to all the mishaps in 15 years of running a data centre on a shoestring budget. It is hard to imagine an attack that causes worse damage.

    And I shudder to imagine what “necessary security standards” might be unleashed on us.

    Sorry for venting. But it's really not nice to be on the receiving end of an entirely botched crisis reaction.

    Followup (2025-12-19)

    Over the last few weeks, I have brought back all of GAVO's services, including the version control system with our sample RDs, the web page with the various educational materials, and our Debian repository, through our reverse ssh tunnels. It's a fragile mess, but I think by and large I have successfully mitigated the attack on our services by… who knows; there's still no reliable statement who has decided to take everyone offline.

    Of course, nothing happened with the block against our services on Dec 10th (see the university's announcement above). Even nine days later, I've still not been officially informed who I'd have to petition and beg to get my job done without a metric ton of kludges.

    I was almost content with the fragile mess when today I received the official press thingy of the University, Unispiegel Digital, more precisely the December issue. And that came with another statement that is so outlandishly bizarre that I just can't keep quiet. They say:

    Auf die Universität Heidelberg ist ein großangelegter Cyberangriff mit dem Ziel vorbereitet worden, die gesamten universitären IT-Dienste zum Erliegen zu bringen. Dieser Angriff konnte identifiziert und rechtzeitig abgewehrt werden.

    It seems there is no official translation, so let me paraphrase: “A large-scale cyber attack on Heidelberg University has been prepared with the goal of shutting down all the university's IT services. This attack could be identified and thwarted in time.”

    Call me negative an illoyal, but… what can I say? You see, all of GAVO's IT services would have been shut down for four week by now without my emergency hacks. How this could possibly pass as successful defence of an attack with the goal of shutting down these said services is completely and utterly beyond me.

    Dear university officials: Since I have been unable to otherwise contact the responsible persons, let me state again here that I understand that shit happens and you can lose all your employees' credentials. But then own up to it, don't clam up, and work with the victims of your failure to get operations back up. Completely counterfactual pie-in-the-sky declarations, on the other hand, will certainly not help to calm the waters.

    Followup (2026-02-05)

    Believe it or not, we'd still be offline without our mitigations. If you can read German, you may enjoy the main communication I've received about the whole disaster so far, an interview in the Unispiegel with the University's “CIO“ Heuveline that repeats the story about an “attack that is larger and more dangerous than everything detected so far“ and then goes on to speculate that this was ransomware (“such attacks mostly aim to encrypt data in order to paralyse and blackmail the institution”). Larger and more dangerous my ass. Ransomware!

    Also Heuveline says that “we still have to close 30 to 40 cases”, where a case is a service they've blown. Ha! That would make me and my various services about 10% of their remaining workload, which sounds extremely unlikely to me.

    Anyway, at least Heuveline concedes that it would be a mistake to “turn the University into a fortress”. Good to hear that after more than two months of having to hobble on with ssh tunnels.

    Which brings me to my main topic: For some reason I have been unable to fathom so far, the sidedoor tunnels through which I keep my infrastructure open sometimes get “clogged”: The look open, but no data moves.

    My icinga duly notices that, but until I read the mail, there is significant downtime. Hence, I have written a quick hack that tells the sidedoors to rebuild the tunnels immediately (by sending a USR1 signal) when it seems the box is unreachable. It looks like this, and I'll run it in the foreground of a screen session, hoping I will get some better insight of what might cause the clogging:

    #!/bin/sh
    WATCH_URL=https://dc.g-vo.org/.well-known/security.txt
    while true; do
            curl -s -S --connect-timeout 10 "$WATCH_URL" > /dev/null
            if [ $? -eq 28 ]; then
                    # it's a timeout
                    logger "Found dc.g-vo.org hanging, USR1-ing all sidedoors"
                    killall -USR1 sidedoor
                    date
            fi
            sleep 120
    done
    

    Yes, I am piling hack upon hack. But there is a perspective for one day moving my VO services out of Fortress Heidelberg and hence get rid of all the ugly hackery again. It's just such a bad shame that that seems to be necessary.

    [1]I don't know that because URZ, against all sane policies, still doesn't confess up and instead murmurs “further information cannot be transmitted while investigations are going on [Weitere Informationen können während der laufenden Ermittlungen derzeit nicht übermittelt werden].” I'm sorry, but if I had to write a book on what not to do if you've been compromised, I'd include exactly that sentence, including the awkward „übermittelt“.

Page 1 / 23 »