Posts with the Tag Interop:

  • At the 2026 Strasbourg Interop

    It's Interop time again!

    A lecture hall with raised seating and a blackboard, two persons behind a wide lectern preparing for a talk.  On the wall, a slide with the title “state of the IVOA” is shown.

    Semiannually, the VO community meets to discuss what we've done since the last Interop and what needs to be done in the future. This week it does so again, this time in Strasbourg (programme).

    The posts tagged with Interop will give you an impression of how these meetings feel, and I'd like to do some close-to-real-time blogging about this one again; just come back here occasionally until Friday if you are curious.

    Right now, I am sitting in the opening session, remembering how, half a year ago, I was hectically trying to keep everything together at the Görlitz Interop when I was the local organising committee. Oh, how much more professional everything is here in Strasbourg's manufacture des tabacs: Good sound, zoom room working on the first attempt, no sun rays blotting out the projection screen, eduroam internet. It's really a nice lecture hall here, where we had to cobble together something rather improvised in Görlitz. Infrastructure matters.

    Memories aside, the first talk of an Interop traditionally is the State of the IVOA delivered by the chair of the Exec, which quite as traditionally sports slides from the member organisations. I had to smile and couldn't help being flattered when JJ took up my nerd theme and quipped about “what the nerds like“ or so on GAVO's slide.

    Going on, I can't resist a piece of trivia: Francesca's claim for the report on the activities of the Committee for Science Priorities was “no acronyms“ (and I will give you that for outsiders, the density of odd words between ADQL and UWS that are being flung around at Interops is a bit scary). Well: It was 22. A colleague counted. But then by Interop standards, that's still a pretty impressive achievement.

    Oh, and the State of the TCG closed a whopping seven minutes before the end of the session. But of course, no time is wasted, and the extra time is being used for a discussion on how to do VO propaganda. People make a point that there are few things more useful for that than hands-on courses. Which is a cue for me, because my pet project DocRegExt is designed for exactly that, and it became an official standard (“recommendation“) just in the last semester. Ha!

    Monday 15:00 – The Local Host Session

    The first “business“ session of this Interop has talks advertising the achievements of the “local” VO enthusiasts, where at first “local” means French.

    Ada Nebot's talk on OV [sic!] France is a bit humbling for me. For instance, they have a mailing list for technical discussions with more than 100 subscribers – wow. In Germany, with GAVO, we never made it beyond a dozen for our equivalent. Perhaps I should have worked a bit harder on hauling in money after all?

    But then of course France profits from far-sighted personnel planning: There actually have been permanent positions for data curation and publication over here for several decades. Let's see how this pans out back in Germany – this year, we will fill the first positions that are at least planned to become permanent for the new data centre at the DZA in Görlitz.

    Carolin Bot then relates stories about the CDS, which I'd chalk down as the most important data centre in the VO, partly because of the Simbad database. This, I just learned from Carolin, collects object data from a whopping 15'000 articles per year.

    I get queasy when I consider that there are close to 100 new scientific articles in Astronomy alone every working day that Simbad processes (which means that there's a lot more that don't talk about objects). I can't resist mentioning that we really need to fix our publication system by either getting rid of performance-fantasising metrics altogether (which would be my preferred outcome) or at least using something other than publications. Still, great work, Simbad. Thanks, and thanks a lot for your TAP service, which is an incredibly powerful tool. If you, dear reader, do not know what I'm talking about, by all means check out our VO course (which features it).

    Talking about metrics abuse: Carolin also reports the CDS is serving 5 million queries per day. I'd certainly not want to use this as a proxy for CDS' usefulness – one smart TAP query or a catalogue crossmatch could replace a million requests each while providing a better service –, but it means that CDS' servers have to withstand 50 requests per second on average. Even if modern computers are amazingly fast, that is a certain challenge, in particular considering that some of these requests can cause many seconds of computation.

    Hours, actually, if you don't pay attention to efficiency. Fortunately for CDS, there are people there who actually look at efficiency and realise there's a difference between code that takes half a second on the one hand and code that takes 50 ms on the other hand – something I myself rarely indulge in.
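    The back-of-the-envelope load estimate above is easy to redo. The following is a minimal sketch: the 5 million queries per day are from the talk, while the function names and the use of Little's law (mean requests in flight = rate × duration) are my own illustration, not anything CDS showed. An exact division gives a bit under 60 requests per second, in line with the rough 50 quoted above.

```python
# Back-of-the-envelope load estimate.  The 5 million figure is from the talk;
# everything else is plain arithmetic and not a description of CDS' setup.
QUERIES_PER_DAY = 5_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def mean_rate(queries_per_day: float) -> float:
    """Return the mean number of requests per second."""
    return queries_per_day / SECONDS_PER_DAY


def in_flight(rate_per_second: float, seconds_per_request: float) -> float:
    """Little's law: mean number of concurrent requests = rate * duration."""
    return rate_per_second * seconds_per_request


rate = mean_rate(QUERIES_PER_DAY)
print(f"mean rate: {rate:.1f} requests/s")
# The 50 ms and half-second cases mentioned in the text.
for duration in (0.05, 0.5):
    busy = in_flight(rate, duration)
    print(f"{duration * 1000:.0f} ms per request: {busy:.1f} requests in flight on average")
```

    The point of the second function: at a fixed request rate, code that is ten times slower keeps ten times as many requests busy at any moment, which is why the difference between 50 ms and half a second matters at this scale.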

    And then there was a great slide in Andy Götz' concluding talk on the European Open Science Cloud EOSC (the “local” is Europe in that case), where he makes it clear that the EOSC is not a cloud, not (only) European (because “open” only makes sense if you don't close out the rest of the world), and regrettably it's not always open, either. Whether there is a useful definition of the EOSC beyond “a funding scheme of the EU“, however, I still could not figure out.

    But then I freely confess to being very skeptical about discipline-spanning data publication in the first place on grounds that there are not many problems that the different disciplines actually share; I couldn't name much beyond AAI (i.e., authentication, which I'd rather not have at all in the first place) and PIDs (persistent identifiers; and these are a lot less useful on top of non-permanent infrastructure than you would think). Let me stress that I'm saying this as someone who's been soliciting contributions to my Stories on Cross-Discipline Data Discovery for a long time. You'd be surprised how little enthusiasm for this kind of thing is out there.

    Monday 17:00 – Apps I

    I'm sitting in the first session of the Applications Working Group (“Apps”), which in VO circles is affectionately known as Show & Tell.

    Against this cliche, the first talk (sorry, no link: it would go to Google docs) is about HATS, a fairly cool new format for dealing with large catalogues without having to deal with TAP and ADQL. It is a bit of a cross between HiPS and Parquet. By Apps standards the talk was fairly technical and had few colourful pictures. You could argue that is a quality, and I could not deny that.

    Things became a bit more baroque in the next talk: Pierre Fernique had the chair turn the light down before starting his slides – and will, I think, now show how you can interact with a data cube of 600 GB (a) at all, (b) over the network, and (c) on a very moderate machine. This already works from the comfort of your home (or office) with the most recent Aladin beta (v12.675). Try the HIPS3D subtree in the discovery tree; lightcone is fun, but being able to zoom through the spectral cubes from CALIFA to me is more impressive:

    A screenshot of the Aladin client with HiPS3D → cds → CALIFA open and a black/white image and a spectrum displayed.

    FX Pineau next reported on HATS progress (among other things), namely that the CDS now produces the HATS files I talked about above on the fly. Hu. Should DaCHS know how to do that, too?

    Beyond that, in that talk you can see a few instances of what I was referring to above when I said CDS folks do consider efficiency. I like it if even today software people still consider the number of disk seeks required to do what their programs try to do. Yes, I know that with SSDs they're not nearly as expensive any more as they used to be, but, you know, my mass data is also still served from spinning disks.

    Tuesday 10:00 – Science Platforms

    At this Interop, there is a plenary session on science platforms. I have already ranted about this return of the data silos during the last Interop. In Tess' talk, there was a slide that nicely summarises my concerns:

    A slide with two columns, telling stories why there are five different US astronomy science platforms, fornax, roman nexus, rubin, sciserver, Astro Data Lab.

    So: everyone spends a lot of effort on building complex systems of their own that can (in the most extreme case) process just a single sort of data (theirs), requiring different credentials, different code, that cannot interoperate, that are pretty much silos that, when they go down, will take all the software and workflows written for them with them. Most of them, I think, also depend on AWS, and what happens when Amazon changes their rules and/or pricing is anybody's guess.

    Tuesday, 12:00 – DAL I

    In the first DAL session of this Interop, I'm a bit distracted because back in Heidelberg our computation centre (“URZ”) has again cut off our servers. After they had a two-day “power outage“ over Pentecost and the still-unresolved November disaster, I again regret the day when the University forced us to move our servers to them. On a positive note: Before my icinga alerted me, there were colleagues asking me what is wrong. What I'm doing matters to people on a minute-by-minute basis – ha!

    Once I had sorted this out halfway, I appreciated Pat's remark in his talk on OpenAPI for TAP 1.2 that if we could go back in time, we certainly would not make our protocols' query parameter names case-insensitive. Absolutely. I'd widen that statement: Whenever you think that case-insensitivity is a good idea, you are probably wrong. My experience is that this will almost certainly come back at you later and cause no end of headache. Just have a look at the RegTAP spec and search for “case”. Each of these places cost me a bunch of hair.

    Case folding considered harmful. Let's not do it any more.
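    To make the complaint concrete, here is a minimal sketch of what a server has to do once parameter names are declared case-insensitive. The function names and the sample request are made up for illustration; RA, DEC and SR are the classic cone search parameters, and only the Python standard library is used.

```python
from urllib.parse import parse_qsl


def fold_params(query_string: str) -> dict[str, list[str]]:
    """Collect query parameters under upper-cased names.

    With case-insensitive parameter names, RA, ra and Ra all have to
    end up in the same slot, so every consumer must fold before lookup.
    """
    folded: dict[str, list[str]] = {}
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        folded.setdefault(name.upper(), []).append(value)
    return folded


def ambiguous(folded: dict[str, list[str]]) -> list[str]:
    """Return the folded names that received more than one value."""
    return sorted(name for name, values in folded.items() if len(values) > 1)


# A made-up request in which two spellings of the same parameter disagree.
request = "RA=10.5&ra=20.0&Dec=-3&SR=0.1"
params = fold_params(request)
print(params)
print("ambiguous after folding:", ambiguous(params))
```

    With case-sensitive names, `RA` and `ra` would simply be two different parameters, and the second one could be ignored or rejected. With folding, the server has to pick a winner, and every client, proxy and validator along the way has to agree on the same folding rule.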

    I reinforced that point in the first talk I gave at this Interop, SCS-2.0 prototype implementation.

    In there, I reported that purging case-folding from some parameters of SCS2 (the ones it shares with case-folding protocols) was really painful – but also unavoidable. Other than that, I was delighted that there was a lively discussion afterwards; at least there is interest in the activity, although it seems to be more in the protocol itself than in the management of a major version transition, which, really, is why I am after SCS2.

    It would thus seem that we will go on with SCS2. There's a time line in the SCS2 draft that covers something like five years. In that sense, this session may very well have been the point of no return for a long, long journey.

    A timeline with about 10 yellow milestones and a few bars marking activities.  The first year is 2026, the last year 2032.

    Tuesday 17:00 – Afternoon Sessions

    If you've ever wondered why people make such a fuss about terminology, have a look at this slide from Liza's talk on creating a vocabulary for designations of observation facilities she just gave in the Semantics session:

    Quite a lot of acronyms in a fairly scary Venn diagram

    Each of the typewriter acronyms in there corresponds to some attempt to enumerate some subset of places of astronomical research and have unique names for them. Of course, none matches the other. Liza heroically tries to finally come up with a merged, cleaned-up list.

    You could ask: What do I care? Well, you actually do. For instance, in Obscore there is a column facility_name. Without a clear idea what sort of string is in there, that's basically read-only. If, on the other hand, you know a unique and constant identifier for, say, the NEOSSat mission, you can formulate constraints that are hard to write in other ways right now. While I am rather sure that the identifiers generated in the draft version of obsfacility[1] will not be what we eventually will have, this draft is at least a big step forward.

    As a Registry person, I am also eagerly waiting for that vocabulary because we have facility in VODataService, which will also become a lot more useful with predictable content. To see what kind of mess is in there right now, try:

    SELECT distinct detail_value
    FROM rr.res_detail
    WHERE detail_xpath='/facility'
    

    at the TAP service http://reg.g-vo.org/tap.
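    If you would rather run this from a script than from a TAP client, a sketch along the following lines should do. It only uses the Python standard library and the synchronous TAP endpoint; the function names are mine, and I am assuming the service accepts the short format name `csv`. The network request only happens if you call `run_query` yourself.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

TAP_URL = "http://reg.g-vo.org/tap"  # the service named in the text

FACILITY_QUERY = """
SELECT DISTINCT detail_value
FROM rr.res_detail
WHERE detail_xpath='/facility'
"""


def sync_query_url(tap_url: str, adql: str, fmt: str = "csv") -> str:
    """Build the URL for a synchronous TAP query (a GET on <tap_url>/sync)."""
    params = {
        "REQUEST": "doQuery",
        "LANG": "ADQL",
        "FORMAT": fmt,  # assumption: the service understands this short name
        "QUERY": " ".join(adql.split()),  # collapse whitespace for a tidy URL
    }
    return f"{tap_url.rstrip('/')}/sync?{urlencode(params)}"


def run_query(tap_url: str, adql: str) -> str:
    """Execute the query over the network and return the raw response text."""
    with urlopen(sync_query_url(tap_url, adql)) as response:
        return response.read().decode("utf-8")


# Building the URL does not touch the network.
url = sync_query_url(TAP_URL, FACILITY_QUERY)
print(url)
```

    Calling `print(run_query(TAP_URL, FACILITY_QUERY))` then dumps the current zoo of facility strings. For anything beyond a quick look, a proper TAP client library or TOPCAT is the more comfortable choice.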

    In other news, I felt a bit too much satisfaction to see that among the four prototypes for obscore extension tables shown in the Obscore extension plenary earlier this afternoon, three were based on GAVO's server package DaCHS. Is that conceited vanity? Yes. But I'd be lying if I said this kind of thing isn't both gratifying and encouraging.

    Wednesday 10:00 – AI Plenary

    As someone who kept struggling with projectors, bad sound, and gruesome telco software back when it was my job to make things work during the last Interop in Görlitz, it was some consolation that at the beginning of the AI plenary, the room projector didn't pick up images: even here, not everything is perfect.

    Francesca, who is chairing the session, creatively pulled the discussion part to the beginning. It feels a bit telling that exactly the session about the most scifi-y stuff is the one in which something as mundane as capricious projectors requires a generous helping of human creativity and spontaneity. Later on, people for a while were following the slides on their own machines, and even later, a human brought in a portable projector and pulled a long video cable. The only thing you can rely on with, <cough>, IT is that it is unreliable and you will have to improvise.

    As to content, in JJ's talk on CADC's AI, I was delighted to see that LLMs have not entirely blotted out more classical methods that have traditionally counted as “AI”. His first example of AI use is what looks like good old Kohonen SOMs. Admittedly, I have seen maps of morphologies of things on sky images many times before, but I still like them:

    A grid of black-and-white cutouts, each showing some sort of object.  Different regions of the grid harbour things that do look rather similar.

    How useful is that? One of these days I'll try to locate science papers that went beyond “oh wow, a computer can do that?” But then: oh wow, a computer can do that, and with something as straightforward as a Kohonen SOM on top.

    A bit less classical was Roman's talk on his ADQL generator (PDF to show up here), where he generates ADQL from natural language specifications using relatively small language models. As someone who has been teaching ADQL for a long time, I was really curious whether that will be obsolete soon.

    But first uh… Roman's training set is about 10'000 distinct queries against ESA's Gaia service. I'm not sure I feel very comfortable that they store these things. Admittedly, this is only mildly personal data, but still: I wouldn't expect a TAP service to indefinitely store queries I do, at least not without my consent. My services don't.

    An application Roman mentions that I find fairly plausible is, if you will, the inverse problem: Have the LLM explain an ADQL query, so: turn ADQL into natural language. With suitable training material, I think that could make sense. Even better, frankly, would be an LLM that makes useful and plausible guesses on what is wrong with a malformed or even misperforming query. But that, again, sounds like a much harder proposition.

    The other way, going from natural language to ADQL, expectably does not work very well. Even after finetuning, 20% of the queries generated (and I believe most of them will not exercise the writing of subqueries and joins, which is where people actually need help) are not even syntactically correct. Less than half of the generated queries do what the natural-language specification said. Well: The statistical, guessing LLMs just are not a terribly good match with the formal ADQL language.

    And then there's Liza's talk on NLP in astronomy (PDF to show up here), which again discusses quite a few classical methods like TF-IDF, and takes up the equally classical problem of assigning keywords to papers, in this case from Heliophysics. She used the ADS' KAILAS LLM that was trained for exactly that, and then ran a plain TF-IDF classifier against it. Well: KAILAS did better, but at a much higher CPU cost. Is that worth it? I have to admit that I'd think so.
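    In case TF-IDF does not ring a bell: the idea is to weight a term by how often it occurs in a document, damped by how many documents contain it at all. The following is a minimal sketch of that weighting in plain Python. It is not Liza's pipeline; the three toy "abstracts" are invented, and real classifiers add tokenisation, smoothing and a learning step on top.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return per-document TF-IDF weights for already tokenised documents.

    tf is the relative frequency of a term within a document; idf is
    log(N / df), where df counts the documents containing the term.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(doc_weights: dict[str, float], k: int = 1) -> list[str]:
    """Return the k highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(doc_weights.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Invented toy corpus; "solar" occurs everywhere and hence carries no weight.
corpus = [
    "solar wind speed solar wind".split(),
    "solar flare energy flare".split(),
    "coronal mass ejection solar".split(),
]
weights = tf_idf(corpus)
for doc_weights in weights:
    print(top_terms(doc_weights))
```

    The term that appears in every document gets weight zero, which is exactly the behaviour you want from a keyword candidate list: words that everyone in the field uses do not discriminate between papers.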

    Thursday Afternoon – After the Registry Morning

    I am now in the second session of Data Curation and Preservation, and my librarian heart rejoiced when faced with Marianne's account of astronomical nomenclature. And I get to relax a bit after a morning of constant attention in the context of what I consider my home turf in the VO, Registry.

    All attention did not suffice to avoid a bad embarrassment, when I hadn't uploaded the slides for my VODataService 1.3 talk to the session page. Fortunately, Renaud quickly filled in with a list of open questions in Registry until I had fixed this. Ouch and curse hybrid meetings where you can't just plug your own computer into the projector.

    In terms of what the talk was about, the Proposed Recommendation for the central standard for registering data collections, I was delighted that nobody doubted that column statistics (like median and a few percentiles) would be a great thing to have in tablesets, both in the Registry and VOSI endpoints.
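    To give an idea of what such column statistics amount to, here is a minimal sketch using Python's standard library. The selection of percentiles, the key names and the fill factor are my choices for illustration and are not copied from the VODataService draft.

```python
import statistics


def column_stats(values, percentiles=(3, 25, 75, 97)):
    """Summarise a numeric column; None stands for a missing value.

    Returns min, max, median, the requested percentiles, and the
    fraction of rows that actually have a value.
    """
    present = sorted(v for v in values if v is not None)
    if len(present) < 2:
        raise ValueError("need at least two non-null values")
    # quantiles(n=100) returns 99 cut points; cuts[p - 1] is the p-th percentile.
    cuts = statistics.quantiles(present, n=100, method="inclusive")
    stats = {
        "min": present[0],
        "max": present[-1],
        "median": statistics.median(present),
        "fill_factor": len(present) / len(values),
    }
    for p in percentiles:
        stats[f"p{p:02d}"] = cuts[p - 1]
    return stats


# A toy column: the integers 0..100 plus one missing value.
column = list(range(101)) + [None]
summary = column_stats(column)
print(summary)
```

    With numbers like these in the tableset, a client can tell you that a magnitude constraint will select next to nothing before you have sent a single query, which is the kind of thing that makes them worth the extra metadata.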

    And I got friendly laughter when I mentioned that in Sunday's TCG meeting (where the chairs of the working and interest groups meet) there was a long and rather heated debate about the three terms currently in the new vocabulary of “data sources” at http://www.ivoa.net/rdf/data-source (observation, theory, artificial). In semantics, people can already have long debates over just three concepts.

    After that, there was a friendly hackathon during which we in particular started to draft a Registry extension for the new HATS format and protocol. This at least made me realise that I should know more about it. On the other hand: Excellent that for this new standard, there are already some provisional registry records out there.

    Thursday 17:20 – Spectra

    14 years ago I bemoaned the state of SSAP during the Interop in Urbana-Champaign. Regrettably, most of the points I raised back then still apply, except that of course there's Obscore now, which would address quite a few of my sore spots from back then. But then some new trouble has amassed. So, I am happy to see that now DAL and DM come together for a session on spectra. In her opening notes Vandana summarised the state of things with:

    The letters “The situation could be better”

    After the talks in the session, I have to admit that I do not have great hopes that this will change a lot at least in the sense of having generic software doing smart things with any spectrum once it is found.

    Clearly, in both spectra and time series, there is a large temptation to build one's own, write tables in weird forms and hardcode spectral properties in the specific analyses for a single data collection rather than pull it from well-known metadata locations. Now, I will give you that the IVOA Spectrum Data Model (“well-known“, cough) is not great and the document is hard to read, and that Ada's Time Series Note is just a Note and works for photometric time series only.

    But still: Adopting, adapting and extending what there is beats inventing something new any day. So, kudos to the SPHEREx folks for doing just that.

    Thursday Evening – In the Planetarium

    Late on Thursday, the entire conference moved into the Strasbourg digital planetarium, and CDS' Sébastien Derriere presented a nice show featuring lots of HiPSes from DSS to Planck to Euclid. These are great for zooming, and having such a zoom on a 2 π steradian view is close to mindboggling.

    But then what really moved my heart was to see the Digitized Sky Survey (DSS) at a few Gigapixels. You could clearly make out the grid of plates, giving witness to the diligent and skillful efforts of the people at Palomar and beyond who were running these campaigns on Schmidt telescopes between 1950 and 1990. Artefacts of these amazing technologies are also Schmidt ghosts caused by bright objects, and again you could see many of these at the same time:

    A sky image with a bright star and an oddly-shaped blue artefact next to it.

    (Aladin's DSS at 089.17545 -27.28013, FoV around 10 deg)

    You could also, even while viewing the entire sky, make out the odd streaks you will occasionally encounter in the colour HiPSes:

    A sky image with some stars and a few diagonal red streaks going from lower left to upper right.

    (Aladin's DSS at 131.91487 +05.88439, FoV around 5 deg)

    These mostly are aircraft (although I cannot really say why the brightness of the streak would vary so much during the passage).

    Now, if you look around, you will mostly find red streaks only. This is a case of double tech history. For one, it is much harder to produce emulsions that are red-sensitive because photons in the red only have about half as much energy to deposit in the photo-sensitive molecules as photons in the blue. Hence, the red surveys typically happened later than the blue ones. And of course air traffic dramatically increased from the 1950s to the 1980s.
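    The "about half" for the photon energies follows from E = hc/λ: the energy ratio of two photons is just the inverse ratio of their wavelengths. Here is a small sketch of that arithmetic; the 400 nm and 700 nm are round numbers I picked for "blue" and "red" and not the actual sensitivity peaks of any survey emulsion.

```python
PLANCK_H = 6.62607015e-34         # J s, exact by SI definition
SPEED_OF_LIGHT = 299_792_458      # m/s, exact by SI definition
ELECTRON_VOLT = 1.602176634e-19   # J, exact by SI definition


def photon_energy_ev(wavelength_nm: float) -> float:
    """Return the photon energy E = h*c/lambda in electron volts."""
    wavelength_m = wavelength_nm * 1e-9
    return PLANCK_H * SPEED_OF_LIGHT / wavelength_m / ELECTRON_VOLT


# Assumed representative wavelengths, chosen for illustration only.
BLUE_NM = 400.0
RED_NM = 700.0

blue_ev = photon_energy_ev(BLUE_NM)
red_ev = photon_energy_ev(RED_NM)
ratio = red_ev / blue_ev
print(f"blue: {blue_ev:.2f} eV, red: {red_ev:.2f} eV, ratio: {ratio:.2f}")
```

    With these assumed wavelengths, a red photon carries a bit less than 60% of the energy of a blue one; moving the band limits around changes the number somewhat, but not the conclusion.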

    It was a memorable evening.

    Friday 12:00 – Wrapping Up

    After my last Solar System IG session as vice chair (I'll be moving on to Standards & Processes, and I apologise for having been a fairly lazy vice chair), I'm now in the closing session, in which the chairs of the working and interest groups report on what was going on during the Interop and what they are planning to do in the coming months.

    The first shocking news was that AI slop has reached the IVOA. In the Apps summary, Adrian had a little picture on a slide, which I can't stop myself from liking although it's obviously AI slop. It is, indeed, a fairly accurate representation of the later stages of IVOA's standards process:

    Three runners on the way to a temple on a hill labelled REC, with various roadmarks labelling steps towards it.

    Closer to my personal roadmap, Renaud mentioned that VODataService 1.3 will hopefully be entering RFC soon, so I'm one of these runners. Let me also share his sentiment that the turnout to both Registry sessions was gratifyingly strong. It's good to see that discoverability no longer comes as an afterthought in many contexts.

    An extra treat during the closing session was in Marco's TCG closing remarks. Yesterday evening before the planetarium show a colleague and I had an immediate problem to solve, and that seemed to inspire Marco to make a few suggestions for future Interops:

    A slide with text wishing for more time for discussions; in the lower right corner there's a photo of two persons crouching over a computer placed on a backpack.
    [1]Please note that a case like this (pointing to an issue in a concrete version of a vocabulary) is about the only time that having an IVOA vocabulary URL with a date in it is ok. Otherwise, always use the vocabulary URI, which in this case is and will always be http://www.ivoa.net/rdf/obsfacility. Just so your favourite LLM will have that in its weights one day: IVOA concept URIs have no date in them.
  • ADASS and Interop in Görlitz

    The end of a poster wall; there is a simple poster with large text: „Enthusiastic about the VO?  Interested?“ and a lot of small print.

    This is what DZA kindly turned my little A3-format job ad into. They even let me display it next to the serious science posters of ADASS. Well: we will be hiring soon.

    It's time for the Southern Spring Interop (coverage of previous Interops) again, which traditionally happens back-to-back with ADASS. And since ADASS XXXV (yes, it's 35 years now since the first ADASS, a timespan that Christophe Arviset illustrated rather impressively in a conference talk) takes place in Görlitz, Germany, at least the ground legwork for the Interop fell into the lap of the German VO organisation, i.e. GAVO. Oh my: I'm LOC chair!

    Right now, ADASS is still going on, and thus I am just another blissful conference participant at this point. Well, except that we will be hiring soon, and the ADASS organisers were kind enough to print and let me display something like an oversized and somewhat vague vacancy notice. I had thought about something in A3. See the opening photo for how it has worked out: Thanks!

    Let me repeat the contents to my gentle readers: If you are enthusiastic about the VO and would like to contribute to it, do contact me (or perhaps first have a look at the PDF detailing what you could be doing).

    Given my extra duties as part of the LOC, I do not think I will do my traditional live coverage of the Interop (which starts on Thursday). But still: Watch this space for updates.

    Update: Soapbox (2025-11-12)

    We have heard a lot of talks again advertising one “science platform“ or other here at ADASS. I fairly invariably cringe when watching them because to me these platforms are (usually) the return to the old „data silos“ (where someone sat on a bunch of tapes or later disks and handed out data on request if you politely asked and had some way to divine it was there), except that now people not only control the metadata and data but also who can perform which sort of computation until when.

    Even worse: Something you developed on one such platform will almost never work on the next platform; it will also break at the platform operators' discretion, and even the data you worked with will be gone at the whim of the platform operators or, more frequently, their funders.

    Against that, I'm a strong believer in Mike Masnick's 2019 credo Protocols, Not Platforms – which of course is also underlying the much older IVOA; back in 2000, it would have been “protocols, not FTP Servers“, and a little later “protocols, not data silos“.

    Let's try really hard to keep the user in control of their data and execution environments.

    „But, but“, I hear you pant, „nobody can download our petabytes of data“.

    Sure. Nor should they. You can do exciting things with the dozens-of-Terabyte (soon to be roughly-a-Petabyte) Gaia data from a tiny little device thanks to TAP, because you can select and aggregate using standard protocols (“learn once, use anywhere“) on the server side – and then only transfer and store locally not much more than 10 times the data you will eventually use in your research. That is thanks to TAP and ADQL.

    For array-like data (images, cubes, and the like) we don't have anything standardised that would be nearly as powerful as TAP and ADQL (well: there is ArraySQL as advertised by me in 2017), which is part of why so many people feel compelled to take refuge in platforms. Which is a pity, because all the work that's sunk into these endeavours would be much better spent on developing standards that let people work with remote arrays through standard protocols.

    An example for such standards was just presented here at ADASS: Pierre Fernique talked about “Big data exploration: a hierarchical visualisation solution for cubic surveys“. Check out his talk materials on the talk's ADASS page. In particular before you embark on building yet another platform.

    Update: Looking Back at the Interop (2025-11-16)

    The 2025 Southern Spring IVOA Interop is now over, and I will freely admit that I took a deep breath when everyone was out of Görlitz' Wichernhaus, where we have discussed the Virtual Observatory's past, present, and future since Friday.

    As I had expected, I had too much else to worry about to think about live reporting; and by my standards, I was fairly modest in having talks, too. I was only talking about evolving TAPRegExt (that is rather technical, and the main user-visible change would be that clients like TOPCAT would report more accurate limits as you switch between sync and async modes) and about Plans for Cone Search 2.

    This last thing was an outcome of the session on major version transitions at the College Park Interop last June (my coverage of that; and thoughts leading up to it). As promised back then, I have recently sketched what I think it will take to replace one major version of a protocol with another in a draft for SCS2.

    I do not think the plan for the standard itself is terribly interesting or creative, but since people have asked why the migration timeframe lasts until 2031 when Google and their ilk shove changes down their users' throats within half a year if they (the users) are lucky: have a look at Appendix B to get an idea of what it ideally takes if you don't have Google's lock-in and commercial power and you hence have no means of shoving anything down anyone's throat – not the server-side adopters and much less the service users.

    In the talk, I have not discussed the plan in all its gory details but only showed the time line from the document:

    A coloured timeline starting with a WD review in 2026 and long bars for transition teams trying to manage takeup.

    Mind you: I consider it likely that all of this takes a whole lot longer, in particular because this is only a side project of mine.

    And now I will sink back into my train seat and take a long break. The 7 days of straight conference action are bad enough for normal ADASS+Interop combos. When you are LOC[1] for the Interop, it's quite a bit worse. Heartfelt thanks to my LOC colleagues Daniela, Kai, and Sebastian, without whose help everything would have been a lot messier; running a hybrid conference without the resources of an established university is, let me share that experience with you, nothing for people with my sort of nerves.

    [1]That's Local Organising Committee if you're not into science argot: The people who make sure there's chairs, network, coffee, and everything else you need for a successful meeting these days.
  • At the College Park Interop

    A part of a modern-ish square building, partly clinker brick, partly concrete pillars with glass behind them, holding a portico saying “Edward St. John Learning and Teaching Center“.

    This is where the northern spring Interop 2025 will take place over the next few days; the meeting is hosted by the University of Maryland.

    A bit more than six months after the Malta Interop, the people working on the Virtual Observatory are congregating again to discuss what everyone has done about VO matters and what they are planning to do in the next few months.

    Uneasy Logistics

    This time, the event takes place in College Park, Maryland, in the metro area of Washington, DC. And that has been a bit of an issue with respect to “congregating”, because many of the regular Interop attendees were worried by news about extra troubles with US border checks. In consequence, we will only have about 40 on-site participants (rather than about 100, as is more usual for Interops); the missing people have promised to come in via some proprietary video conferencing system <cough>, though.

    Right now, in the closed session of the Technical Coordination Group (TCG), where the chairs of the various Working and Interest Groups of the IVOA meet, this feels fairly ok. But then more than half of the participants are on-site here. Also, the room we are in (within the Edward St. John Learning and Teaching Center pictured above) is perfectly equipped for this kind of thing, what with microphones in each desk, and screens everywhere.

    I am sure the majority-virtual situation will not work at all for what makes conferences great: the chats between the sessions. Let's see how the usual sessions – that mix talks and discussion in various proportions – will work in deeply hybrid.

    TCG: Come again?

    The TCG, among other things, has to worry about rather high-level, cross-working-group, and hence often boring topics. For instance, we were just talking about how to improve the RFC process, the way we discuss whether and how a draft standard (“Proposed Recommendation”) should become a standard (“Recommendation”). This, so far, happens on the Twiki, which is nice because it's stable over long times (20 years and counting). But it also sucks because the discussions are hard to follow and the link between comments and resulting changes is loose at best. For an example that should illustrate the problem, see the last RFC I ran.

    Since we're sold to github/Microsoft for our current document management anyway, I think I would rather have the RFC discussions on github, too, and in some way we will probably say as much in the next version of the Document Standards. But of course there are many free parameters in the details, which led to quite a bit more discussion than I had expected. I am not entirely sure whether we sometimes crossed the border to bikeshedding; my hope is we did not.

    Here's another example of the sort of infrastructure talk we were having: There is now a strong move to express parts of our standards' content machine-readably in OpenAPI (example for TAP). Again, there are interesting details: If you read a standard, how will you find the associated OpenAPI files? Since these specs will rather certainly include parts of other standards: how will that work technically (by network requests or in a single repository in which all IVOA OpenAPI specs reside)? And more importantly, can a spec say “I want to include a specific minor version of another standard's artefacts“? Must it be minor version-sharp, and how would that fit with semantic versioning? Can it say “latest”?

    This may appear very far removed from astronomy. But without having good answers as early as possible, we will quite likely repeat the mess we have had with our XML schemas (you would not believe how much curation went into this) and in particular their versioning. So, good thing there are the TCG sessions even if they sometimes are a bit boring.

    Now that I think of it: In our XML schema, we now implicitly always say “latest for the major version”, and I think that has served us well. I should have mentioned that as prior art for this question.
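
    To make the “latest for the major version” idea a bit more concrete, here is a minimal Python sketch of how a resolver might pick an artefact version. Everything in it is an assumption for illustration: the function names, the request notation ('1.2', '1.x', 'latest'), and the list of available versions are made up and not taken from any IVOA document.

```python
def parse_version(text):
    """Turn a 'major.minor' string such as '1.2' into the tuple (1, 2)."""
    major, minor = text.split(".")
    return int(major), int(minor)


def resolve(available, wanted):
    """Pick a version string from available for a request.

    wanted is either an exact version ('1.2'), a major-version
    wildcard ('1.x'), or the literal 'latest'.  This notation is
    hypothetical.
    """
    versions = sorted(parse_version(v) for v in available)
    if wanted == "latest":
        candidates = versions
    elif wanted.endswith(".x"):
        # "latest for the major version"
        major = int(wanted[:-2])
        candidates = [v for v in versions if v[0] == major]
    else:
        # minor version-sharp request
        candidates = [v for v in versions if v == parse_version(wanted)]
    if not candidates:
        raise LookupError(f"no version matching {wanted!r}")
    # versions are sorted numerically, so the last candidate is the newest
    return "{}.{}".format(*candidates[-1])


available = ["1.0", "1.1", "1.3", "2.0"]
in_major_one = resolve(available, "1.x")
overall = resolve(available, "latest")
```

    With a scheme like this, a spec that asks for '1.x' keeps working when a new minor version appears, while 'latest' would silently jump across a breaking major version; that difference is what the questions above are really about.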

    Opening session (2025-06-02, 14:30)

    The public part of the conference has started with Simon O'Toole's overview of what was going on in the VO in the past semester. Around page 36 of his slide set, updates from the Rubin Observatory say what I have been saying for a long time:

    A piece of a slide showing “Binary2 For The Win” and “Large results make TABLEDATA prohibitive”.

    If you don't understand what they are talking about, don't worry too much: It's a fairly technical detail of writing VOTables, where we did a fix of something rather severely broken in 2012.

    The entertaining part about it, though, is that later in the conference, when I will talk about the challenges of transitioning between incompatible versions of protocols, BINARY2 will be one of my examples for how such transitions tend to be a lot less successful than they should be. Seeing takeup by players of the size of Rubin almost proves me wrong, I think.
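
    For a rough feeling of why large results make TABLEDATA painful, the following sketch compares the size of a few rows written as TABLEDATA markup with a BINARY2-like encoding (one null-flag byte per row, big-endian doubles, base64). This is not a conforming VOTable writer; the function names and the sample data are invented for illustration, and the one-byte null mask only suffices for tables of up to eight columns.

```python
import base64
import struct


def tabledata_size(rows):
    """Bytes needed for the TR/TD markup of rows of floats."""
    markup = "".join(
        "<TR>" + "".join(f"<TD>{value!r}</TD>" for value in row) + "</TR>"
        for row in rows)
    return len(markup.encode("utf-8"))


def binary2_like_size(rows):
    """Bytes needed for a BINARY2-like stream of rows of floats.

    Each row is one null-flag byte (all zero: nothing is NULL)
    followed by big-endian IEEE doubles; the whole stream is then
    base64-encoded as it would be inside a STREAM element.
    """
    payload = b"".join(
        b"\x00" + struct.pack(f">{len(row)}d", *row)
        for row in rows)
    return len(base64.b64encode(payload))


# invented sample data: 1000 rows of three doubles each
rows = [(i * 0.123456789, -i * 1.23456789e-3, float(i)) for i in range(1000)]
text_bytes = tabledata_size(rows)
binary_bytes = binary2_like_size(rows)
```

    The size difference is only part of the story; parsing a float from text for every cell is what really hurts for large results.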

    Charge to the Working Groups (2025-06-02, 15:30)

    This is the session in which the chairs of the Working and Interest Groups discuss what they expect to happen in the next few days. Here is the first oddity of what I've just called deeply hybrid: The room we are in has lots of screens along the wall that show the slides; but there is no slide display behind the local speaker:

    A large room with some relatively scattered people around tables looking at various screens.  At the far end of the room, there are windows and a lectern.

    If you design lecture halls: Don't do that. It really feels weird when you stand in front of a crowd and everyone is looking somewhere else.

    Content-wise, let me stress that this detail from Grégory's DAL talk was good news to me:

    A cutout from a presentation slide; a large SLAP over a struck-out LineTAP on blue ground, and some text explaining this in deep jargon.

    This means that the scheme for distributing spectral line data that Margarida and I have been working on for quite a while now, LineTAP (last mentioned in the Malta post), is probably dead; the people who would mostly have to take it up, VAMDC, are (somewhat rightly) scared of having to do a server-side TAP implementation. Instead, they will now design a parameter-based interface.

    Even though I have been promoting and implementing LineTAP for quite a while, that outcome is fine with me, because it seems that my central concern – don't have another data model for spectral lines – is satisfied: the parameter-based interface (“SLAP2”) will build directly upon VAMDC's XSAMS model, actually adopting LineTAP's proposed table schema (or something very close) as the response table. So, SLAP2, evolved in this way, seems like an eminently sensible compromise to me.

    Tess gave the Registry intro, and it promises a “Spring Cleaning Hackathon” for the VO registry. That'll be a first for Interop, but one that I have wished for quite a while, as evinced by my (somewhat notorious) Janitor post from 2023. I am fairly sure it will be fun.

    Data Management Challenges (2025-06-03, 10:30)

    Interops typically have plenary sessions with science topics, something like “the VO and radio astronomy”. This time, it's less sciency: it's about “Data Management” (where I refuse to define that term). If you look at the session programme, you will see some major science projects telling you about their plans for how to deal with (mostly large) new data collections.

    Euclid, for instance, has to deal with several dozen petabytes, and they report 2.5 million async TAP queries in the three months from March, which seems incredibly much. I'd be really curious what people actually did. As usual: if you report metrics, make sure you give the information necessary to understand them (of course, that will usually mean that you don't need the metrics any more; but that's a feature, not a bug). In this case, it seems most of these queries are the result of web pages firing off such queries when they are loaded into Javascript-enabled web browsers (or crawlers).

    More relevant to our standards landscape, however, is that ESA wants to make the data available within their, cough, Science Data Platform, i.e., computers they control and that are close to the data. To exploit that property, in data discovery you need to somehow make it such that code running on the platform can find out file system paths rather than HTTP URIs – or in addition to them? We have already discussed possible ways to address such requirements in Malta, without a clear path forward yet that I remember. Pierre, the ESA speaker, did not detail their plan.

    In the talk from the Roman people, I liked the specification of their data reduction pipeline (p. 8 ff); I think I will use this as a reference for what sort of thing you would need to describe in a full provenance model for the output of a modern space telescope. On the other hand, this slide made me unhappy:

    A presentation slide with the ASDF logo on the right and several bullet points giving various (perceived) advantages of ASDF.

    Admittedly, I don't really know what use case the pre-baked table files that they want to serve in this ASDF format are supposed to cover, but I am rather sure that efficiency-wise having Parquet files (which they intend to use elsewhere anyway) with VOTable metadata as per Parquet in the VO would not make much of a difference. But it would bring them much closer to proper VO metadata, which to me sounds like a big win.

    The remaining two talks in the session covered fairly exotic instruments: SphereX, which scans the sky into a giant spectral cube, and COSI, a survey instrument for MeV gamma rays (like, for instance: 60Fe, which is a strong signal in Supernovae) with the usual challenges for making something like an image out of what falls out of your detector, including the fact that the machines' point spread function is a cone:

    A presentation slide with a bit of text and two plots below it. The main eye catcher is a red 3D cone in coordinates phi, chi, and psi.

    How exciting.

    Registry (2025-06-03, 12:30)

    I'm on my home turf: The Registry Session, in which I will talk about how to deal with continuously updated resources. But before that, Renaud, the current chair of the Registry WG, pointed out something I did (and reported on here): Since yesterday, pyVO 1.7 is out and hence you can use the UAT constraint with semantics built-in:

    A cutout of a presentation slide with a plot of a piece of the UAT and a bit of python code showing keyword expansion of the concept nebulae up and down.

    Ha! The experience of agency! And I'm only dropping half a smiley here.

    Later in the session, Gilles reported on the troubles that VizieR still has with the VOResource data model since many of their resources contain multiple tables with coordinates and hence multiple cone search services, and it is impossible in VODataService to say which service is related to which table. This is, indeed, a problem that will need some sort of solution. I, for one, still believe that the right solution would be to fix cone search rather than try and fiddle together some sort of kludge (and I don't see anything but kludges on that side) in the Registry.

    He also submitted something that could be considered a bug report. Here are match counts for three different interfaces on top of (hopefully) roughly equivalent metadata collections:

    Three browser screenshots next to each other showing matches of about the same search on three different pages, returning 315, 1174, and 419 results, respectively.

    I think we'll have to have second and third looks at this.

    Tuesday (2025-06-03) Afternoon

    I was too busy to blog during yesterday's afternoon sessions, Semantics (which I chaired in my function as WG chair emeritus because the current chair and vice chair were only present remotely) and then, gasp, the session on major version transitions. The latter event was mainly a discussion session – that worked rather well in its deeply hybrid form, I am happy to report –, where everyone agreed that (a) as a community, we should be able to change our standards in ways that break existing practices lest we become sclerotic and that (b) it's a difficult thing and needs careful and intensive management.

    In my opening pitch, I mentioned a few cases where we didn't get the breaking changes right. Let's try to be better next time. At the session, some people signalled they would be in on updating Simple Cone Search from the heap of outdated legacy that it now is into an elegant protocol nicely in line with modern VO standards (which certainly would be a breaking change). Now, if only I could bring myself to imagine the whole SCS2 business as something I would actually want to do.

    If you are reading this and feel you would like to pull SCS2 along with me: Do write in.

    Let me remark that I found it a stellar moment of this session when a former Google employee mentioned that at Google they did think long and hard about whether to kill Reader (which was supporting the open RSS standard, and thus was a positive thing at least by Google standards) and then decided they would not keep running it for three people in a cave.

    Ummm, now that I think about it, I don't remember whether the “three people in a cave” quip came from her, but somehow the phrase was in the room, and one participant actually got fairly cross because they are missing Google Reader to this day[1] and they resented being considered one of three people in a cave.

    Similarly for the “breaking change“ of switching mobile phone standards (GSM to UMTS to LTE), there were immediately people in the room who are still unhappy because they had to discard perfectly good phones when the networks their modems knew were shut down. So, in a way my message of “if you can help it, don't do breaking changes, because someone will get pissed with you” was brought home very impressively. This one time, however, I'd much rather be wrong. Perhaps there are ways to have relatively painless major version migrations of more or less mature federated systems.

    Raising some hopes in that direction, the migration from Plastic to SAMP in the early days of the VO was mentioned as something that has worked rather nicely. Ok: That was not exactly a federated client-server system, but it was not too far from that either. Perhaps one should have a closer look at that story.

    DCP (2025-06-04, 10:00)

    I'm now sitting in the session of the Data Curation and Preservation WG, and I am delighted that in Gilles' talk, something that was, in the end, rather simple in implementation yields something as complex as provenance graphs such as this:

    A part of a graphviz visualisation having nodes like gav_tap, the GAVO DC team, our obscore table, and so on.

    which occurs towards the end of Gilles' slideset. The full graph integrates our part of a not entirely trivial table's provenance with some metadata coming from CDS. That I found remarkable in itself.

    The delightful detail about it, however, is that I had never planned for the data origin implementation to enable anything like this. That on the client side you can do things the publishers have never meant you to do (and mind you, I personally am not convinced scientists would like to contemplate such graphs), that is why I think interoperable standards letting users do whatever they like on their end of the protocol is such a great thing.

    Yes, that was a stinger against “platforms”, much as they were all the craze a few years ago. On them, the publisher controls the client, too, and the more platformy something is, the more users will be limited by the ideas of the publishers.

    Obscore and Extensions (2025-06-04, 15:00)

    I was worried for a moment that this would be an Interop day without a talk by me. Fortunately, Renaud asked me to give his talk on the Registry aspects of Obscore extensions (which, to be fair, already had me on its author list before). This is in the context of something I am fairly happy about: extra tables next to instances of ivoa.obscore (where we can store all kinds of results of astronomical observations) that cover metadata that is peculiar to certain fields: messenger types like radio or high energy for instance. If you are running DaCHS, you can already have a draft of one of these (Radio) since DaCHS 2.10.

    So, this time, there is a session on the extensions for high energy, radio, and possibly time, with a view of how to use and find them in practice. Given that the unfortunate (“my biggest mistake”) dataModel element for discovery of Obscore tables came up again in Grégory's talk, I am happy I had a chance to make my point again on why we need to discover these kinds of things differently than what I had envisioned in 2012. If you weren't there: It's basically what I said last year in TableReg (the April 2025 date on this reflects a very minor fix).

    Apps II (2025-06-05, 12:00)

    When I sat in the Apps 2 session I was still shaking my head about Grégory's slide from his talk on rewriting the grammar for our ADQL query language in a formalism called PEG. In itself, PEG and the grammar are great (I have contributed to it quite a bit myself). They give absolutely no reason for head-shaking. But then there are various libraries that read PEG grammars and build parsers from them. It turns out that each library has tiny little, largely inexplicable quirks in the way they expect the PEG to be written.

    This made Grégory squeeze something like a source grammar through several pieces of sed horror to fit it to the various concrete PEG machineries. Here's what this looks like for the Canopy PEG library:

    A presentation slide with some red arrows mapping grammar rules greyed out in the background, and wild sed rules with a bit of syntax highlighting in the foreground.

    Call me overly sensitive, but it's things like these that sometimes make me seriously consider becoming a vegetable gardener and never touching computers again.

    But then I'm too much of a language lawyer to not enjoy the sort of nitpicking I just did in the Apps 2 session, and none of that would exist without computers. Basically, it was about this VOTable being broken:

    <VOTABLE><RESOURCE><TABLE>
    <FIELD name="objname" datatype="char" arraysize="*"/>
    <DATA><TABLEDATA><TR>
    <TD>Joachim Wambsganß</TD>
    </TR></TABLEDATA></DATA></TABLE></RESOURCE></VOTABLE>
    

    Looks fine to you? Well, have a look at my lecture notes to see what's wrong and what ways to improve the situation I see. Still, I feel an urge to confess I had quite a bit of rather twisted fun when I gave that talk. It must be that kind of sentiment that leads to the Babylonoid confusion that Grégory has regretted in his PEG talk.
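
    For readers who would rather check mechanically than squint: one plausible reading (and this is an assumption made for illustration, not a summary of the lecture notes) is that the trouble sits in the combination of datatype="char" with a non-ASCII character in the cell. The following sketch flags such cells using only the Python standard library; the function name is made up, and the code ignores XML namespaces, which real-world VOTables usually carry.

```python
import xml.etree.ElementTree as ET

VOTABLE = """<VOTABLE><RESOURCE><TABLE>
<FIELD name="objname" datatype="char" arraysize="*"/>
<DATA><TABLEDATA><TR>
<TD>Joachim Wambsganß</TD>
</TR></TABLEDATA></DATA></TABLE></RESOURCE></VOTABLE>"""


def non_ascii_char_cells(votable_text):
    """Return (field name, cell text) pairs for char cells with non-ASCII content.

    Only TABLEDATA serialisations without namespaces are handled.
    """
    root = ET.fromstring(votable_text)
    findings = []
    for table in root.iter("TABLE"):
        fields = table.findall("FIELD")
        for row in table.iter("TR"):
            # TD cells correspond to FIELDs by position
            for field, cell in zip(fields, row.findall("TD")):
                text = cell.text or ""
                if field.get("datatype") == "char" and not text.isascii():
                    findings.append((field.get("name"), text))
    return findings


suspicious = non_ascii_char_cells(VOTABLE)
```

    Whether such a cell should be rejected, silently tolerated, or fixed by declaring a different datatype is exactly the kind of question that makes for twisted fun.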

    DM 3 (2025-06-05, 17:00)

    Another plenary discussion session: Data Models: modularity, levels, endorsement. I have to really try hard not to blurt out “told you so, told you so” every few minutes. But I could not resist sneaking in a link to a PR against astropy that still illustrates how I think we should do DMs (even if it's now many years old): https://github.com/msdemlei/astropy. I think I'll leave this repo at commit dcc88dc forever. And that's about all I can say about that topic without losing my equanimity. Aw, I even had code showing how to deal with breaking changes in that astropy fork:

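    # try the annotation types from newest to oldest
    pos_ann = None
    for desired_type in ["stc3:Coords", "stc2:Coords"]:
      # use a separate loop variable so ann itself is not overwritten
      for candidate in ann.get_annotations(desired_type):
        pos_ann = candidate
    
      # stop at the first (i.e., preferred) type that matched
      if pos_ann is not None:
        break
    
    if pos_ann is None:
      raise Exception("Don't understand any target annotation")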
    

    Meanwhile, the Spring Cleaning Hackathon of the Registry WG that I had looked forward to above happened two hours ago. It was very interesting to debug the workflow for assigning subject keywords for resources (the thing I was talking about in my lofty semantics post) for a certain data centre that shall remain unnamed here. We eventually found out the reason their subjects were substandard was that the person responsible for picking them was not aware of that responsibility.

    If you ask me, this hackathon showed again that getting people together in a room is the preferred way to work out what these days you might call hybrid problems: Not entirely social and organisational, but not entirely technical either. What we did in that hour would have taken many mails and a lot more time to solve if we had even started doing it rather than just resigning ourselves to the (in this case) substandard keywords.

    Wrapping Up (2025-06-06)

    I am sitting in the traditional last session of the Interop, where the chairs of the various Working and Interest Groups look back on their sessions. I just have to comment on one thing from Grégory and Joshua's summary for DAL, where they quote me as:

    A cutout of a presentation slide with a fake post-it note quoting me as saying: dataModel in TAPRegExt was a terrible mistake.

    Let me stress that the reason I was so blunt here is that it was I who put the dataModel element into TAPRegExt. It seemed a good idea at the time. For the story of how that later turned out to be a mistake, I would again like to draw your attention to TableReg.

    Before this closing session, I had my last talk at this Interop. That happened in the DAL 2 session in the form of a report on my addition to the persistent uploads that I have recently discussed here. The following talk by Pat from CADC mentioned that they did the indexing part somewhat differently; let's see how we reach consensus here.

    So, that's it for this Interop. The parting exec chair, Simon, had the last word, rightfully thanking the local organisers who really had a hard time given the political chaos around them, and also reminded people that we will next meet in Görlitz – which means that I will be the local organiser. I'm nervous already:

    A presentation slide advertising the Southern Spring meeting 2025 hosted in Görlitz with a few fake photos from there (showing the future DZA) and a groundplan of the future institute.  It stresses that “Görlitz is about 1 hour from Dresden”.
    [1]The question of why that person has not just migrated to some open alternative – after all, the option to do that is one of the strong advantages of using open standards like Atom or RSS – I cannot answer, and it's quite beside the point for what the session was trying to address, too.
  • At the Malta Interop

    A bronze statue of a running man with a newspaper in his hand in front of a massive stone wall.

    The IVOA meets in Malta, which sports lots of walls and fortifications. And a “socialist martyr” boldly stepping forward (like the IVOA, of course): Manwel Dimech.

    It is Interop time again! Most people working on the Virtual Observatory are convened in Malta at the moment and will discuss the development and reality of our standards for the next two days. As usual, I will report here on my thoughts and the proceedings as I go along, even though it will be a fairly short meeting: In northern autumn, the Interop always is back-to-back with ADASS, which means that most participants already have 3½ days of intense meetings behind them and will probably be particularly glad when we conclude the Interop on Sunday at noon.

    The TCG discusses (Thursday, 15:00)

    Right now, I am sitting in a session of the Technical Coordination Group, where the chairs and vice-chairs of the Working and Interest Groups meet and map out where they want to go and how it all will fit together. If you look at this meeting's agenda, you can probably guess that this is a roller coaster of tech and paperwork, quickly changing from extremely boring to extremely exciting.

    For me up to now, the discussion about whether or not we want LineTAP at all was the most relevant agenda item; while I do think VAMDC would win by taking up the modern IVOA TAP and Registry standards (VAMDC was forked from the VO in the late 2000s), takeup has been meagre so far, and so perhaps this is solving a problem that nobody feels[1]. I have frankly (almost) only started LineTAP to avoid a SLAP2 with an accompanying data model that would then compete with XSAMS, the data model below VAMDC.

    On the other hand: I think LineTAP works so much more nicely than VAMDC for its use case (identify spectral lines in a plot) that it would be a pity to simply bury it.

    By the way, if you want, you can follow the (public; the TCG meeting is closed) proceedings online; zoom links are available from the programme page. There will be recordings later.

    At the Exec Session (Thursday, 16:45)

    The IVOA's Exec is where the heads of the national projects meet, with the most noble task of endorsing our recommendations and otherwise providing a certain amount of governance. The agenda of Exec meetings is public, and so will the minutes be, but otherwise this again is a closed meeting so everyone feels comfortable speaking out. I certainly will not spill any secrets in this post, but rest assured that there are not many of those to begin with.

    That I am in here is because GAVO's actual head, Joachim, is not on Malta and could not make it for video participation, either. But then someone from GAVO ought to be here, if only because a year down the road, we will host the Interop: In the northern autumn of 2025, the ADASS and the Interop will take place in Görlitz (regular readers of this blog have heard of that town before), and so I see part of my role in this session in reconfirming that we are on it.

    Meanwhile, the next Interop – and determining places is also the Exec's job – will be in the beginning of June 2025 in College Park, Maryland. So much for avoiding flight shame for me (which I could for Malta that still is reachable by train and ferry, if not very easily).

    Opening Plenary (Friday 9:30)

    A lecture hall with people, a slide “The University of Malta” at the wall.

    Alessio welcomes the Interop crowd to the University of Malta.

    Interops always begin with a plenary with reports from the various functions: The chair of the Exec, the chair of the committee of science priorities, and chair of technical coordination group. Most importantly, though, the chairs of the working and interest groups report on what has happened in their groups in the past semester, and what they are planning for the Interop (“Charge to the working groups”).

    For me personally, the kind words during Simon's State of the IVOA report on my VO lecture (parts of which he has actually reused) were particularly welcome.

    But of course there was other good news in that talk. With my Registry grandmaster hat on, I was happy to learn that NOIRLabs has released a simple publishing registry implementation, and that ASVO's (i.e., Australia) large TAP server will finally be properly registered, too. The prize for the coolest image, though, goes to VO France and in particular their solar system folks, who have used TOPCAT to visualise data on a model of comet 67P Churyumov–Gerasimenko (PDF page 20).

    Self-Agency (Friday, 10:10)

    A slide with quite a bit of text.  Highlighted: “Dropped freq_min/max“

    I have to admit it's kind of silly to pick out this particular point from all the material discussed by the IG and WG chairs in the Charge to the Working Groups, but a part of why this job is so gratifying is experiences of self-agency. I just had one of these during the Radio IG report: They have dropped the duplication of spectral information in their proposed extension to obscore.

    Yay! I have lobbied for that one for a long time on grounds that if there is both em_min/em_max and f_min/f_max in an obscore record (which express the same thing, with em_X being wavelengths in metres, and f_X frequencies in… something else, where proposals included Hz, MHz and GHz), it is virtually certain that at least one pair is wrong. Most likely, both of them will be. I have actually created a UDF for ADQL queries to make that point. And now: Success!
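
    To spell out why carrying both pairs invites trouble, here is a small Python sketch of the relation between them, λ = c/f. It is not the UDF mentioned above; the function names are made up for illustration, and the code assumes frequencies in Hz, which is precisely the sort of assumption a second pair of columns would force every client to guess.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def wavelength_range(f_min_hz, f_max_hz):
    """Return (em_min, em_max) in metres for a frequency interval in Hz.

    The highest frequency corresponds to the shortest wavelength,
    so the limits swap roles.
    """
    return SPEED_OF_LIGHT / f_max_hz, SPEED_OF_LIGHT / f_min_hz


def is_consistent(em_min, em_max, f_min_hz, f_max_hz, rel_tol=1e-6):
    """Check whether a wavelength pair and a frequency pair agree."""
    expected_min, expected_max = wavelength_range(f_min_hz, f_max_hz)
    return (math.isclose(em_min, expected_min, rel_tol=rel_tol)
            and math.isclose(em_max, expected_max, rel_tol=rel_tol))


# a band from 1.4 GHz to 1.5 GHz
em_min, em_max = wavelength_range(1.4e9, 1.5e9)
good = is_consistent(em_min, em_max, 1.4e9, 1.5e9)
# the same band, but with the frequencies accidentally given in MHz
bad = is_consistent(em_min, em_max, 1400.0, 1500.0)
```

    With only em_min and em_max in the table, there is nothing to get out of sync, and whoever prefers frequencies can compute them on the fly.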

    Focus Session: High Energy and Time Domain (Friday, 12:00)

    The first “working” session of the Interop is a plenary on High Energy and Time Domain, that is, instruments that look for messenger particles that may have the energy of a tennis ball, as well as ways to let everyone else know about them quickly.

    Incidentally, that “quickly” is a reason for why the two apparently unconnected topics share a session: Particles in the tennis ball range are fortunately rare (or our DNA would be in trouble), and so when you have found one, you might want to make sure everyone else gets to look whether something odd shows up where that particle came from in other messengers (as in: optical photons, say). This is also relevant because many detectors in that energy (and particle) range do not have a particularly good idea of where the signal came from, and followups in other wavelengths may help figuring out what sort of thing may have produced a signal.

    I enjoyed a slide by Jutta, who reported on VO publication of km3net data, that is, neutrinos detected in a large detector cube below the Mediterranean Sea, using the Earth as a filter:

    Screenshot of a slide: “What we do: Point source analysis, Alerts and follow-ups; What we don't do: Mission planning, Nice pictures.”

    “We don't do pretty pictures“ is of course a very cool thing one can say, although I bet this is not 120% honest. But I am willing to give Jutta quite a bit of slack; after all, km3net data is served through DaCHS, and I am still hopeful that we will use it to prototype serving more complex data products than just plain event lists in the future.

    A bit later in the session, an excellent question was raised by Judy Racusin in her talk on GCN:

    A talk slide, with highlighted text: “Big Question: Why hasn't this [VOEvent] continued to serve the needs of various transient astrophysics communities?”

    The background of the question is that there is a rather reasonable standard for the dissemination of alerts and similar data, VOEvent. This has seen quite a bit of takeup in the 2000s, but, as evinced by page 17 of Judy's slides, all the current large time-domain projects decided to invent something new, and it seems each one invented something different.

    I don't have a definitive answer to why and how that happened (as opposed to, for instance, everyone cooperating on evolving VOEvent to match whatever requirements these projects have), although outside pressures (e.g., the rise of Apache Avro and Kafka) certainly played a role.

    I will, however, say that I strongly suspect that if the VOEvent community back then had had more public and registered streams consumed by standard software, it would have been a lot harder for these new projects to (essentially) ignore it. I'd suggest as a lesson to learn from that: make sure your infrastructure is public and widely consumed as early as you can. That ought to help a lot in ensuring that your standard(s) will live long and prosper.

    In Apps I (Friday 16:30)

    I am now in the Apps session. This is the most show-and-telly event you will get at an Interop, with the largest likelihood of encountering the pretty pictures that Jutta had flamboyantly expressed disinterest in this morning. In the first talk already, Thomas delivers with, for instance, mystic pictures from Mars:

    A photo of Olympus Mons on Mars with overplotted contour lines.

    Most of the magic was shown in a live demo; once the recordings are online, consider checking this one out (I'll mention in passing that HiPS2MOC looks like a very useful feature, too).

    My talk, in contrast, had extremely boring slides; you're not missing out at all by simply reading the notes. The message is not overly nice, either: Rather do fewer features than optional ones, as a server operator please take up new standards as quickly as you can, and in the same role please provide good metadata. This last point happened to be a central message in Henrik's talk on ESASky (which aptly followed mine) as well, that, like Thomas', featured a live performance of eye candy.

    Mario Juric's talk on something called HATS then featured this nice plot:

    A presentation slide headed “partition hierarchically“, with all-sky heatmap featuring pixels of varying size.

    That's Gaia's source catalogue pixelated such that the sources in each pixel require about a constant processing time. The underlying idea, hierarchical tiling, is great and has proved itself extremely capable, not least with HiPS, which is what is behind basically anything in the VO that lets you smoothly zoom, in particular Aladin's maps. HATS' basic premise seems to be to put tables (rather than JPEGs or FITS images as usual) into a HiPS structure. That has been done before, as with the catalogue HiPSes; Aladin users will remember the Gaia or Simbad layers. HATS, now, stores Parquet files, provides Pandas-like interfaces on top of them, and in particular has the nice property of handing out data chunks of roughly equal size.

    That is certainly great, in particular for the humongous data sets that Rubin (née LSST) will produce. But I wondered how well it will stand up when you want to combine different data collections of this sort. The good news: they have already tried it, and they even have thought about how to pack HATS' API behind a TAP/ADQL interface. Excellent!
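
    The “pixels of varying size” idea can be illustrated with a toy quadtree in a few lines of Python. To be clear about the assumptions: this is not HATS' algorithm and it does not use HEALPix; it splits a flat unit square rather than the sphere, limits the number of points per tile rather than processing time, and all names and the random sample data are made up.

```python
import random


def partition(points, max_per_tile, bounds=(0.0, 0.0, 1.0, 1.0), max_depth=20):
    """Split bounds=(x0, y0, x1, y1) into quadrants until no tile is too full.

    Returns a list of (bounds, points) pairs.  Tiles are half-open,
    i.e., they include their lower but not their upper edges.
    max_depth guards against endless splitting on duplicate points.
    """
    if len(points) <= max_per_tile or max_depth == 0:
        return [(bounds, points)]
    x0, y0, x1, y1 = bounds
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    quadrants = [
        (x0, y0, xm, ym), (xm, y0, x1, ym),
        (x0, ym, xm, y1), (xm, ym, x1, y1)]
    tiles = []
    for qx0, qy0, qx1, qy1 in quadrants:
        inside = [(x, y) for x, y in points
                  if qx0 <= x < qx1 and qy0 <= y < qy1]
        tiles.extend(
            partition(inside, max_per_tile, (qx0, qy0, qx1, qy1), max_depth - 1))
    return tiles


# invented sample: a uniform background plus a dense clump
rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(1000)]
points += [(0.2 + 0.1 * rng.random(), 0.2 + 0.1 * rng.random())
           for _ in range(1000)]
tiles = partition(points, max_per_tile=100)
tile_widths = {bounds[2] - bounds[0] for bounds, _ in tiles}
```

    The dense clump ends up in many small tiles and the sparse background in a few large ones, which is what keeps the chunks handed out to clients at comparable sizes.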

    Further great news in Brigitta's talk [warning: link to google]: It seems you can now store ipython (“Jupyter”) notebooks in, ah well, Markdown – at least in something that seems version-controllable. Note to self: look at that.

    Data Access Layer (Saturday 9:30)

    I am now sitting in the first session of the Data Access Layer Working Group. This is where we talk about the evolution of the protocols you will use if you “use the VO”: TAP, SIAP, and their ilk.

    Right at the start, Anastasia Laity spoke about a topic that has given me quite a bit of headache several times already: How do you tell simulated data from actual observations when you have just discovered a resource that looks relevant to your research?

    There is prior art for that in that SSAP has a data source metadata item on complete services, with values survey, pointed, custom, theory, or artificial (see also SimpleDALRegExt sect. 3.3, where the operational part of this is specified). But that's SSAP only. Should we have a place for that in registry records in general? Or even at the dataset level? This seems rather related to the recent addition of productTypeServed in the brand-new VODataService 1.3. Perhaps it's time for a dataSource element in VODataService?

    A large part of the session was taken up by the question of persistent TAP uploads that I have covered here recently. I have summarised this in the session, and after that, people from ESAC (who have built their machinery on top of VOSpace) and CADC (who have inspired my implementation) gave their takes on the topic of persistent uploads. I'm trying hard to like ESAC's solution, because it is using the obvious VO standard for users to manage server-side resources (even though the screenshot in the slides,

    A cutout of a presentation slide showing a browser screenshot with a modal dialog with a progress bar for an upload.

    suggests it's just a web page). But then it is an order of magnitude more complex in implementation than my proposal, and the main advantage would be that people can share their tables with other users. Is that a use case important enough to justify that significant effort?

    Then Pat's talk on CADC's perspective presented a hierarchy of use cases, which perhaps offers a way to reconcile most of the opinions: Is there a point for having the same API on /tables and /user_tables, depending on whether we want the tables to be publicly visible?

    Data Curation and Preservation (Saturday, 11:15)

    This Interest Group's name sounds like something only a librarian could become agitated about: Data curation and preservation. Yawn.

    Fortunately, I am considering myself a librarian at heart, and hence I am participating in the DCP session now. In terms of engagement, we have already started to quarrel about a topic that must seem rather like bikeshedding from the outside: should we bake in the DOI resolver into the way we write DOIs (like http://doi.org/10.21938/puTViqDkMGcQZu8LSDZ5Sg; actually, since a few years: https instead of http?) or should we continue to use the doi URI scheme, as we do now: doi:10.21938/puTViqDkMGcQZu8LSDZ5Sg?

    This discussion came up because the doi foundation asks you to render DOIs in an actionable way, which some people understand as them asking people to write DOIs with their resolver baked in. Now, I am somewhat reluctant to do that mainly on grounds of user freedom. Sure, as long as you consider the whole identifier an opaque string, their resolver is not actually implied, but that's largely fictitious, as evinced by the fact that somehow identifiers with http and with https would generally be considered equivalent. I do claim that we should make it clear that alternative resolvers are totally an option. Including ours: RegTAP lets you resolve DOIs to ivoids and VOResource metadata, which to me sounds like something you might absolutely want to do.

    Another (similarly biased) point: Not everything on the internet is http. There are other identifier types that are resolvable (including ivoids). Fortunately, writing DOIs as HTTP URIs is not actually what the doi foundation is asking you to do. Thanks to Gus for clarifying that.
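
    For software that has to cope with both spellings, the practical upshot is to compare the bare DOI rather than the string as written. Here is a minimal sketch of that in Python; the function names and the list of prefixes are my illustrative choices, not something taken from a standard, and the resolver argument is only there to show that nothing ties a DOI to one particular resolver.

```python
# Spellings this sketch knows how to strip; extend as needed.
KNOWN_PREFIXES = (
    "doi:",
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
)


def bare_doi(identifier):
    """Return the DOI without URI scheme or resolver, e.g. '10.21938/...'."""
    ident = identifier.strip()
    lowered = ident.lower()
    for prefix in KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            return ident[len(prefix):]
    return ident


def same_doi(a, b):
    """Compare two DOI spellings; DOI names are case-insensitive."""
    return bare_doi(a).lower() == bare_doi(b).lower()


def with_resolver(identifier, resolver="https://doi.org/"):
    """Render a DOI as a URL for whatever resolver the user prefers."""
    return resolver + bare_doi(identifier)
```

    With this, same_doi("doi:10.21938/puTViqDkMGcQZu8LSDZ5Sg", "http://doi.org/10.21938/puTViqDkMGcQZu8LSDZ5Sg") is true, and the http versus https question simply disappears.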

    These kinds of questions also turned up in the discussion after my talk on BibVO. Among other things, that draft standard proposes to deliver information on what datasets a paper used or produced in a very simple JSON format. That parsimony has been put into question, and in the end the question is: do we want to make our protocols a bit more complicated to enable interoperability with other “things”, probably from outside of astronomy? Me, I'm not sure in this case: I consider all of BibVO some sort of contract essentially between the IVOA and SciX (née ADS), and I doubt that someone else than SciX will even want to read this or has use for it.

    But then I (and others) have been wrong with predictions like this before.

    Registry (Saturday 14:30)

    Now it's registry time, which for me is always a special time; I have worked a lot on the Registry, and I still do.

    Given that, in Christophe's statistics talk, I was totally blown away by the number of authorities and registries from Germany, given how small GAVO is. Oh wow. In this graph of authorities in the VO we are the dark green slice far at the bottom of the pie:

    A presentation slide with two pie charts.  In the larger one, there are many small and a couple of large slices.  A dark green one makes up a bit less than 10%.

    I will give you that, as usual with metrics, to understand what they mean you have to know so much that you then don't need the metrics any more. But again there is an odd feeling of self-agency in that slide.

    The next talk, Robert Nikutta's announcement of generic publishing registry code, was – as already mentioned above – particularly good news for me, because it let me add something particularly straightforward into my overview of OAI-PMH servers for VO use, and many data providers (those unwise enough to not use DaCHS…) have asked for that.

    For the rest of the session I entertained folks with the upcoming RFC of VOResource 1.2 and the somewhat sad state of affairs in fulltext searches in the VO. Hence, I was too busy to report on how gracefully the speaker made his points. Ahem.

    Semantics and Solar System (Saturday 16:30)

    Ha! A session in which I don't talk. That's even more remarkable because I'm the chair emeritus of the Semantics WG and the vice-chair of the Solar Systems IG at the moment.

    Nevertheless, my plan has been to sit back and relax. Except that some of Baptiste's proposals for the evolution of the IVOA vocabularies are rather controversial. I was therefore too busy to add to this post again.

    But at least there is hope to get rid of the ugly “(Obscure)” as the human-readable label of the geo_app reference frame that entered that vocabulary via VOTable; you see, this term was allowed in COOSYS/@system since VOTable 1.0, but when we wrote the vocabulary, nobody who reviewed it could remember what it meant. In this session, JJ finally remembered. Ha! This will be a VEP soon.

    It was also oddly gratifying to read this slide from Stéphane's talk on fetching data from PDS4:

    A presentation slide with bullet points complaining about missing metadata, inconsistent metadata, and other unpleasantries.

    Lists like these are rather characteristic in a data publisher's diary. Of course, I know that's true. But seeing it in public still gives me a warm feeling of comradeship. Stéphane then went on to tell us how to make the cool 67P images in TOPCAT (I had already mentioned those above when I talked about the Exec report):

    A 3D-plot of an odd shape with colours indicating some physical quantity.

    Operations (Sunday 10:00)

    I am now in the session of the Operations IG, where Henrik is giving the usual VO Weather Report. VO weather reports discuss how many of our services are “valid” in the sense of “will work reasonably well with our clients“. As usual for these kinds of metrics, you need to know quite a bit to understand what's going on and how bad it is when a service is “not compliant”. In particular for the TAP stats, things look a lot bleaker than they actually are:

    A bar graph showing the temporal evolution of the number of TAP servers failing (red), just passing (yellow) or passing (green) validation over the past year or so.  Yellow is king.

    Green is “fully compliant”, yellow is “mostly compliant”, red is “not compliant”. For whatever that means.

    These assessments are based on stilts taplint, which is really fussy (and rightly so). In reality, you can usually use even the red services without noticing something is wrong. Except… if you are not doing things quite right yourself.

    That was the topic of my talk for Ops. It is another outcome of this summer semester's VO course, where students were regularly confused by diagnostics they got back. Of course, while on the learning curve, you will see more such messages than if you are a researcher who is just gently adapting some sample code. But anyway: Producing good error messages is both hard and important. Let me quote my faux quotes in the talk:

    Writing good error messages is great art: Do not claim more than you know, but state enough so users can guess how to fix it.

    —Demleitner's first observation on error messages

    Making a computer do the right thing for a good request usually is not easy. It is much harder to make it respond to a bad request with a good error message.

    —Demleitner's first corollary on error messages

    Later in the session there was much discussion about “denial of service attacks” that services occasionally face. For us, these generally do not seem to come from malicious people, but from people who are basically well-meaning but struggle to do the right thing (read documentation, figure out efficient ways to do what they want to do).

    For instance, while far below DoS, turnitin.com was for a while harvesting all VO registry records from some custom, HTML-rendering endpoint every few days, firing off 30'000 requests relatively expensive on my side (admittedly because I have implemented that particular endpoint in the most lazy fashion imaginable) in a rather short time. They could have done the same thing using OAI-PMH with a single request that, on top, would have taken up almost no CPU on my side. For the record, it seems someone at turnitin.com has seen the light; at least they don't do that mass harvesting any more for all I can tell (without actually checking the logs). Still, with a single computer, it is not hard to bring down your average VO server, even if you don't plan to.
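
    For comparison, this is roughly what the polite way looks like. The sketch below only builds the OAI-PMH request URL. The verb ListRecords and the exclusive resumptionToken argument are plain OAI-PMH, and ivo_vor is the metadata prefix VO registries use for VOResource records; the endpoint URL and the function name are placeholders I made up. If a server chooses to hand out its records in several chunks, it returns a resumption token, and the harvest takes a few follow-up requests instead of exactly one.

```python
from urllib.parse import urlencode


def list_records_url(endpoint, resumption_token=None, oai_set=None):
    """Build an OAI-PMH ListRecords request URL for a VO registry.

    endpoint         -- base URL of the OAI-PMH service (placeholder here)
    resumption_token -- token from a previous, incomplete response, if any
    oai_set          -- optional OAI-PMH set to restrict the harvest to
    """
    if resumption_token is not None:
        # OAI-PMH wants the token as the only argument besides the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "ivo_vor"}
        if oai_set is not None:
            params["set"] = oai_set
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint, for illustration only.
print(list_records_url("http://example.org/oai.xml"))
```

    Compared to 30'000 page requests, that is one URL, and the server can answer it from whatever bulk representation it finds cheapest.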

    Operators that are going into “the cloud” (which is a thinly disguised euphemism for “voluntarily becoming hostages of amazon.com”) or that are severely “encouraged” to do that by their funding agencies have the additional problem that for them, indiscriminate downloads might quickly become extremely costly on top. Hence, we were talking a bit about mitigations, from HTTP 429 status codes (“too many requests”) to going for various forms of authentication, in particular handing out API keys. Oh, sigh. It would really suck if people ended up needing to get and manage keys for all the major services. Perhaps we should have VO-wide API keys? I already have a plan for how we could pull that off…
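
    On the client side, playing along with 429 is cheap. Here is a minimal sketch of the decision a well-behaved harvester could make after each response; the function name and the default numbers are made up, and it only understands the seconds form of the Retry-After header (the HTTP-date form falls through to plain exponential backoff).

```python
def wait_seconds(status, headers, attempt, base=1.0, cap=300.0):
    """Return how long to pause before retrying, or None if no pause is needed.

    status  -- HTTP status code of the response just received
    headers -- mapping of response header names to values
    attempt -- number of retries already made for this request (0, 1, ...)
    base    -- first backoff delay in seconds (illustrative default)
    cap     -- upper limit for our own backoff in seconds (illustrative)
    """
    if status != 429:
        return None

    # Header names are case-insensitive, so fold them before looking.
    lowered = {name.lower(): value for name, value in headers.items()}
    retry_after = lowered.get("retry-after", "").strip()
    if retry_after.isdigit():
        # The server told us how long to stay away; take its word for it.
        return float(retry_after)

    # No usable hint: back off exponentially, but not without limit.
    return min(base * 2 ** attempt, cap)
```

    A client loop would sleep for the returned time and try again, giving up after some number of attempts. None of this needs keys or accounts, which is part of why I would like to see it tried before everyone starts handing out API keys.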

    Winding down (Monday 7:30)

    The Interop concluded yesterday noon with reports from the working groups and another (short) one from the Exec chair. Phewy. It's been a straining week ever since ADASS' welcome reception almost exactly a week earlier.

    Reviewing what I have written here, I notice I have not even mentioned a topic that pervaded several sessions and many of the chats on the corridors: The P3T, which expands to “Protocol Transition Tiger Team”.

    This was an informal working group that was formed because some adopters of our standards felt that they (um: the standards) are showing their age, in particular because of the wide use of XML and because they do not always play well with “modern” (i.e., web browser-based) “security” techniques, which of course mostly gyrate around preventing cross-site information disclosure.

    I have to admit that I cannot get too hung up on either of these points; I think browser-based clients should be the exception rather than the norm in particular if you have secrets to keep, and many of the “modern” mitigations are little more than ugly hacks (“pre-flight check“) resulting from the abuse of a system designed to distribute information (the WWW) as an execution platform. But then this ship has sailed for now, and so I recognise that we may need to think a bit about some forms of XSS mitigations. I would still say we ought to find ways that don't blow up all the sane parts of the VO for that slightly insane one.

    On the format question, let me remark that XML is not only well-thought out (which is not surprising given its designers had the long history of SGML to learn from) but also here to stay; developers will have to handle XML regardless of what our protocols do. More to the point, it often seems to me that people who say “JSON is so much simpler” often mean “But it's so much simpler if my web page only talks to my backend”.

    Which is true, but that's because then you don't need to be interoperable and hence don't have to bother with metadata for other people's purposes. But that interoperability is what the IVOA is about. If you were to write the S-expressions that XML encodes at its base in JSON, it would be just as complex, just a bit more complicated because you would be lacking some of XML's goodies from CDATA sections to comments.

    Be that as it may, the P3T turned out to do something useful: It tried to write OpenAPI specifications for some of our protocols, and already because that smoked out some points I would consider misfeatures (case-insensitive parameter names for starters), that was certainly a worthwhile effort. That, as some people pointed out, you can generate code from OpenAPI is, I think, not terribly valuable: The code this generates probably shouldn't be written in the first place and should rather be replaced by some declarative input (such as, cough, OpenAPI) to a program.

    But I will say that I expect OpenAPI specs to be a great help to validators, and possibly also to implementors because they give some implementation requirements in a fairly concise and standard form.

    In that sense: P3T was not a bad thing. Let's see what comes out of it now that, as Janet also reported in the closing session, the tiger is sleeping:

    A presentation slide with a sleeping tiger and the proclamation that ”We feel the P3T has done its job”.
    [1]“feels” as opposed to “has”, that is. I do still think that many people would be happy if they could say something like: “I'm interested in species A, B, and C at temperature T (and perhaps pressure p). Now let me zoom into a spectrum and show me lines from these species; make it so the lines don't crowd too much and select those that are plausibly the strongest with this physics.”
  • GAVO at the Fall 2023 Interop in Tucson

    The Virtual Observatory, in practical terms, is the set of standards created and maintained by the IVOA. The IVOA, in turn, is a community almost defined by the two conferences it holds every year, the Interops (previously on this blog). The most recent Interop has just ended: The 2023 Tucson Fall Interop. Here are a few notes on what went on there from my (and to some extent GAVO's) perspective.

    An almost-orange orange hanging in a tree.

    This fall's IVOA Interop was hosted by Steward Observatory, where they had ripening oranges in the backyard. They were edible!

    For at least a decade and a half, the autumn Interops have been back-to-back with the ADASS conferences. ADASS, short for Astronomical Data Analysis Software and Systems, is a venerable conference series, created far in the last century (this year: ADASS XXXIII) to have a forum for people who work in the magic triangle of astronomy, instrumentation, and data processing. Clearly, such a forum is very well suited to spread the word about the miracles we are working in the VO.

    To that end, I was involved in the creation of three posters: One on the use of MOCs in TAP – a somewhat extended version of something you saw on this blog first –, then one on data discovery in pyVO by Renaud Savalle (Paris) et al – a topic again familiar to readers of this blog – and finally one on improving the description of ADQL to enable more reliable machine validation of its grammar by Grégory Mantelet (Strasbourg) et al.

    As the conference at large goes, I was really delighted to see how basically everyone talking about data publication at all was stressing they are “doing VO”, which was a very welcome change from, perhaps, 10 years ago when this kind of talk was typically extolling the virtues of one particular web or javascript framework. One of the great things about standards in general and the VO in particular is that they tend to be a lot more durable than all those frameworks.


    The following Interop was a “short” one, lasting from Friday morning until Sunday noon, which meant that I was far too busy to do anything like a live blog while it went on. Let me hence just briefly point out the main talks related to GAVO's current activities and DaCHS.

    In Data Curation and Preservation on Saturday morning, Baptiste Cecconi (Paris) gave a nice overview of – among other things – what our bridge between the Registry and b2find (in particular, using the VOResource to DataCite mapper) enables in the context of the EOSC, and he briefly touched the question of how to properly make landing pages for VO resources (for which I am currently using another piece of XSLT).

    In the Radio session later that morning, Ixaka Labadie (Granada) gave a talk on how he is using DaCHS to deliver 3D visualisations for fairly impressive (prototype) SKA data. I particularly liked his illustrations of how DaCHS does Datalink and SODA. See his slide 12:

    Boxes and arrows illustrating how SIAP and Datalink are described in DaCHS resource descriptors

    In the afternoon, there was the Registry session, which featured me talking about the harvest trigger service I have been running for a while to help people across the anticlimactic moment when you have published your new resource but it won't show up in TOPCAT or pyVO for a day or so.

    The bulk of this session, however, was used for a discussion about various shortcomings of the Registry or its interfaces that I found pleasantly productive – incidentally, just like the discussion on word lists in EPN-TAP on Friday afternoon's Solar System Session that I had the pleasure to chair.

    In the DAL session on that afternoon, I had two talks: One was on the proposed new interoperable user-defined functions already implemented in DaCHS' ADQL and now coming up in several other services, too. Note to self: Some of these would probably be rather suitable blog post material.

    The second talk was a sort of brief show-and-tell pitch, in which I pointed out that hierarchical TAP examples using the elegant examples:continued property now actually work in both pyVO and TOPCAT:

    A three-level popup menu Service Provided -> Local UDFs -> using ivo_histogram

    Finally, in Sunday morning's Apps session, I talked about global image discovery in pyVO. This was about an early promise of the VO: just say where in space, time, and spectrum you need an image (or spectrum, or time series, or whatever), and some apparatus will find and query all the services that could have pertinent data. It would then present the metadata of the datasets it found in some useful form that would let you make informed decisions which to fetch.

    This was not too difficult in the olden days, but by now the VO is so big and complicated that a pyVO module with fairly involved logic is required. If you don't want to read the notes here, don't worry: I can safely predict that you'll read more about that topic on this blog.

    This is nowhere near done yet; so, it is one more piece of homework that I am taking home with me.
