Posts with the Tag Interop:

  • GAVO at the Fall 2023 Interop in Tucson

    The Virtual Observatory, in practical terms, is the set of standards created and maintained by the IVOA. The IVOA, in turn, is a community almost defined by the two conferences it holds every year, the Interops (previously on this blog). The most recent Interop has just ended: The 2023 Tucson Fall Interop. Here are a few notes on what went on there from my (and to some extent GAVO's) perspective.

    A almost-orange orange haging in a tree.

    This fall's IVOA Interop was hosted by Steward Observatory, where they had ripening oranges in the backyard. They were edible!

    For at least a decade and a half, the autumn Interops have been back-to-back with the ADASS conferences. ADASS, short for Astronomical Data Analysis Software and Systems, is a venerable conference series, created far in the last century (this year: ADASS XXXIII) to have a forum for people who work in the magic triangle of astronomy, instrumentation, and data processing. Clearly, such a forum is very well suited to spread the word about the miracles we are working in the VO.

    To that end, I was involved in the creation of three posters: One on the use of MOCs in TAP – a somewhat extended version of something you saw on this blog first –, then one on data discovery in pyVO by Renaud Savalle (Paris) et al – a topic again familiar to readers of this blog – and finally one on improving the description of ADQL to enable more reliable machine validation of its grammar by Grégory Mantelet (Strasbourg) et al.

    As the conference at large goes, I was really delighted to see how basically everyone talking about data publication at all was stressing they are “doing VO”, which was a very welcome change from, perhaps, 10 years ago when this kind of talk was typcially extolling the virtues of one particular web or javascript framework. One of the great thing about standards in general and the VO in particular is that they tend to be a lot more durable than all those frameworks.


    The following Interop was a “short” one, lasting from Friday morning until Sunday noon, which meant that I was far too busy to do anything like a live blog while it went on. Let me hence just briefly point out the main talks related to GAVO's current activities and DaCHS.

    In Data Curation and Preservation on Saturday morning, Baptiste Cecconi (Paris) gave a nice overview of – among other things – what our bridge between the Registry and b2find (in particular, using the VOResource to DataCite mapper) enables in the context of the EOSC, and he briefly touched the question of how to properly make landing pages for VO resources (for which I am currently using another piece of XSLT).

    In the Radio session later that morning, Ixaka Labadie (Granada) gave a talk on how he is using DaCHS to deliver 3D visualisations for fairly impressive (prototype) SKA data. I particularly liked his illustrations of how DaCHS does Datalink and SODA. See his slide 12:

    Boxes and arrows illustrating how SIAP and Datalink are described in DaCHS resource descriptors

    In the afternoon, there was the Registry session, which featured me talking about the harvest trigger service I have been running for a while to help people across the anticlimactic moment when you have published your new resource but it won't show up in TOPCAT or pyVO for a day or so.

    The bulk of this session, however, was used for a discussion about various shortcomings of the Registry or its interfaces that I found pleasantly productive – incidentally, just like the discussion on word lists in EPN-TAP on Friday afternoon's Solar System Session that I had the pleasure to chair.

    In the DAL session on that afternoon, I had two talks: One was on the proposed new interoperable user-defined functions already implemented in DaCHS' ADQL and now coming up in several other services, too. Note to self: Some of these would probably be rather suitable blog post material.

    The second talk was a sort of brief show-and-tell pitch, in which I pointed out that hierarchical TAP examples using the elegant examples:continued property now actually work in both pyVO and TOPCAT:

    A three-level popup menu Service Provided -> Local UDFs -> using ivo_histogram

    Finally, in Sunday morning's Apps session, I talked about global image discovery in pyVO. This was about an early promise of the VO: just say where in space, time, and spectrum you need an image (or spectrum, or time series, or whatever), and some apparatus will find and query all the services that could have pertinent data. It would then present the metadata of the datasets it found in some useful form that would let you make informed decisions which to fetch.

    This was not too difficult in the olden days, but by now the VO is so big and complicated that a pyVO module with fairly involved logic is required. If you don't want to read the notes here, don't worry: I can safely predict that you'll read more about that topic on this blog.

    This is nowhere near done yet; so, it is one more piece of homework that I am taking home with me.

  • At the Bologna Interop

    As I usually do at Interops, I plan to give a few impressions from the Virtual Observatory's semiannual get-together on this blog, updating as we go. This time, it's about the May 2023 Bologna Interop.

    After six „virtual“ Interops (the last one in October 2022), this is the first one with actual people and, most importantly, an actual coffee break table. Attempts to replace that with gathertown, I have to say, never really panned out, so I'm looking forward to pushing ahead many of the small things that make a project like the VO tick, and do that with less effort than try and get people into telecons.

    Also, it's my last Interop as chair of the Semantics Working Group – to prevent informal hierarchies as well as possible, there's a limit of four years in a single IVOA position, and my four years as the herder of meanings are now over. So, the Bologna Semantics Session will be the last one I will chair. Will you do me a favour and attend? Since the conference is hybrid, you can even do that if you are not in town.

    2023-05-09, 10:00

    I approached this morning's Science Platform Plenary with a fair amount of apprehension because I'm always worried that these platforms actually appear so attractive to management because they are the old silos management knows. For instance, people would go back to write software for their data specifically and no one could be blamed for “wasting“ money on software useful to others.

    Sure, custom and tailored software is faster to do, and the resulting lock-in perhaps even helps getting shiny metrics for a while, but the results are also much faster to break, not to mention interoperability goes down the drain, it's a big exercise in exclusion, and of course everyone re-implementing about the same thing every time is a gigantic waste of money and, worse, human effort.

    talk slide proposing thing like various pre-defined cut-outs from cubes, or resolution changes or source extraction for images

    Slide 13 from Jesus' talk. Rights his.

    Fortunately, most of the talks did not aggravate these concerns. On the contrary, most of what I saw was fairly generic compute platforms that very credibly strive to be open, both on getting things in and getting things out.

    But I'll not deny that what I particularly liked was Jesus Salgado's distinctly un-platformy proposals for extending SODA (slide 13) – most of the operations envisaged sound very useful, sensible, and doable, and I will certainly put them into DaCHS if someone (cough else) works them out.

    The only really alarming thing I heard in the platforms session was the term “multi-factor authentication“.

    Come on, none of what we're doing here is the sort of thing where anything major would break if someone pilfered credentials. Please, please let's be reasonable. There's a lot less harm done if someone runs a few CPU hours on someone else's account than if humans were forced to copy many digits from one device to another device all the time[1].

    Don't get me wrong: There are places where 2FA may be a good idea, in particular when other peoples' personal data is concerned. I'm just saying that most of the time, 2FA causes more annoyance than the occasional pilfered credential would (and that you shouldn't process other peoples' personal data without a really strong reason in the first place).

    2023-05-09, 17:00

    A personal highlight of every Interop for me as a Registry geek is of course the session of the Registry WG, which today featured two talks by yours truly. However, it opened with a slightly humbling piece by Hendrik Heinl on how unsatisfying it is to discover time series in the current VO. It would have been badly humbling if it hadn't highlighted why several of the things I've been after for many years matter, most of all the move to data discovery I have talked about here before.

    Of my two talks, one was an abridged and perhaps a bit more entertaining version of my recent blog post on the various sorts of lint I find in the VO Registry. The other was very dry fare on standards development; only look at it if you're into evolving VOResource and its extensions, and I'm afraid I have to say about as much on Renaud's contribution on some incremental changes to StandardsRegExt, which in itself works pretty much exclusively behind the scenes. Suffice it to say that even in the VO there are those little thankless jobs.

    2023-05-10 16:00

    Phewy. Another two talks down, one to go. In the session informally called DOI I (where DOI here is a Digital Object Identifier, in our case almost always managed through DataCite), I reminded everyone that if they have an IVOID (in plain English: are in the VO Registry), they can improve their citeability dramatically by getting themselves a DOI using voidoi (which of course only is interesting if you cannot or do not want to mint your resource's DOI in some other way).

    Let me mount a soapbox here for a moment: I'm caring about DOIs because I want paper authors to be able to cite data in a way that lets people find the resources used. That in the case of a DOI the reference is machine-readable to me is a liability rather than an advantage, since it makes it even easier to come up with metrics. And metrics, I claim, are almost always a bad thing, either masking agendas that should be made explicit or, worse and more typical, making matters worse accidentally – which is almost inevitable as soon as people start gaming the metrics, which in turn is almost inevitable when you threaten their livelihoods using metrics.

    Given that, it was not easy keeping quiet and not starting to argue points to that effect (which I'll gladly do here if anyone gives me an excuse to do so) during much of the second DOI session. Let me at least make one point to any funders possibly venturing here: Persistent identifiers to data don't make persistent institutions keeping the data obsolete.

    Such persistent institutions also have a critical role in curating the metadata going into the PIDs, a point driven home in Gus' talk; look at slide 15 for impressions of the sort of desasters happening when you create citations from DataCite records encountered in the wild. In my assumed role as a Registry janitor (as per this recent post) I had complete empathy with Gus.

    My second talk this morning I again gave in the wonderful large auditorium (a real treat for a limelight hog like me): I talked about the hairy problems raised by major version steps in protocols. There was not too much discussion on this – less than I had hoped for, really, in particular later during the lunch break –, but having presented the problem in front of this kind of audience, I'm now rather sure the right way to proceed is what's Option I in my talk: deprecate servicetype='image'. The sort of global discovery that was envisaged to be enabled by servicetype constraints probably needs to be handled in a proper function hiding the gory details from the users.

    2023-05-11, 12:30

    This morning I had the last session in my term as the chair of the Semantics working group, featuring talks reporting on the progress of various semantic artefacts by different people; whether or not it's justified, I feel some satisfaction seeing this sort of activity that I'd take as the sign of a mature working group operating.

    Me, on the other hand, talked quite a bit on an entirely maverick topic: Linked Data in VOTable. As I point out in the talk, in the one place we are using RDFa (which I identify with the buzzword “linked data“ for the purposes of this talk) in the VO it's a big success (TAP examples, which use RDFa over XHTML). Perhaps we should have more of that?

    The obvious place to add RDFa to VO stuff would be our central container format VOTable, which conveniently is based on XML, and hence existing RDFa tooling is immediately applicable when we add a few RDFa attributes to a few VOTable elements. I proved that with some examples and three lines of pyrdfa code and was sort-of happy with getting nice, Turtle-formatted RDF triples out of very lightly annotated VOTables.

    However, if you have followed the pyrdfa link, you may have seen the main argument against the whole effort:

    This repository has been archived by the owner on Jun 21, 2022. It is now read-only.

    It would seem that RDFa within XML-derived formats is not a terribly active topic these days. If that's true, then effort from the VO side to be interoperable with this part of the outside world would be largely wasted – that outside world might very well be smaller than the VO itself now. On the other hand, if I look at Linked Open Vocabularies, it would seem that there are communities using RDF as such very actively, and some of these vocabularies we could very well reuse.

    And then there is a problem I couldn't figure out that may be a good test case for using ChatGPT on technical questions (feel free to try): “How do I make an RDF resource out of element content in RDFa?“ In case that's too dense a question: What I'd like to do is some RDFa markup such that:

    <INFO property="doap:homepage"
      magic-attribute="magic-value"
      >http://foo.bar"</INFO>
    

    works out to:

    <> doap:homepage <http://foo.bar>
    

    in Turtle (note the angle brackets rather than quotes, indicating we are talking about an RDF resource rather than a literal that happens to look like a URI). Can't be hard, can it?

    Screenshot of an ADQL cheat sheet with an optional WITH clause in a red ellipse.

    New in TOPCAT: If it senses that a service understands common table expressions, it will inform you accordingly on its ADQL cheat sheet.

    Oh, and then I'd like to add an impression from the Apps/Ops session late on Wednesday, where I simply have to hand out the tasteful-application-of-standards award to Mark Taylor. In his news from TOPCAT report he described how based on whether or not the capabilities of a TAP service say its ADQL supports CTEs (“WITH”) he changes his cheat sheet to show or hide the optional with clause as shown in the figure above.

    Sure: That's a real small detail. But sometimes it's small details like this that make the difference between folks puzzling how to do a seemingly simple thing (as I am still on the resourcification of element content in RDFa) and them realising there is an elegant solution to what they're trying to do.

    2023-05-13 11:00

    The Interop ended yesterday morning, and now I'm returning home with about a metric ton of homework. Which is probably a good thing.

    One piece of homework I got from Robert Nikutta (NOIRLab) who blasted a piece of text I wrote when I was chairing the Registry WG: Getting into the Registry (this may already have improved by the time you read this). Here's Robert's slide on it:

    A slide criticising some text as incomprehensible.

    Now, I think I have to put up the defense that this was basically the abstract and there are more explanations further down the page, for instance on the “purx” that confused Robert so much[2]. More importantly, though: If you don't understand some VO documentation, it is rather likely that you are not the only one. You will not only help yourself but all these other people if you complain, ideally with suggestions on how to improve or perhaps concrete questions.

    If it is not otherwise clear just who to complain to, use the mailing list of a working or interest group that sounds as if it might be responsible. I can't promise you we will improve matters, but knowing about a problem makes it a lot more likely someone will address it.

    In Robert's concrete issue of a simple and straightforward OAI-PMH component, on the other hand, documentation is not enough. At least as long as I cannot convince the rest of the world that collaborating on DaCHS[3] is a much smarter move than everyone developing their own server software, there really should be such a thing, and I think I've charmed some of the self-implementors into collaborating in such an effort.

    Traditionally, the last talk of an Interop is reserved for the chair of the Exec (the bosses of the national VO projects). They then reveal who the Exec has chosen as the future chairs and vice-chairs of the working and interest groups. I will not pretend that I was surprised: I will be vice chair of the solar system interest group in the next few years. And I already have a first project that came up during one of the many, many, many coffee break discussions of this Interop: finally start collecting planetary reference frames for the vocabulary of references frames. What a nice bridge from semantics to solar system!

    [1]No, having to carry around and plug in and out some additional hardware is only marginally less annoying than the digit-copying 2FA schemes.
    [2]I will give you that my predilection for cute names is not always helpful, though.
    [3]DaCHS of course has an OAI-PMH interface built in, but that is so highly integrated with its metadata management and XML generation that pulling it out just is not worth it.
  • Another Virtual Interop

    A part of a presentation slide, containing the sentence “to select single/multiple rows from plot use Handles layer

    One thing you could learn at this interop: How to identify the source row of a line in the TOPCAT's XYArray plot. See the end of this post for where this comes from.

    It's Interop time again! That is, most of the people involved in developing the Virtual Observatory (or for it) will report on what they have been up to since the last Interop, and what they are planning for the near-ish future. It is again an online meeting, so if interested, you could still register and then attend a couple of our sessions.

    You will see me as a chair (but for the first time since I became chair there not as a speaker) in Semantics, and I'll have talks in Registry (obligatorily) and DAL 1, though regular readers of this blog will have a few déjà vus.

    I plan to update this post as the meeting progresses – so, perhaps check back a few times until thursday.

    Update 2022-10-18, 15:00 UTC: I was expecting the VO in the Cloud Plenary with quite a bit of anxiety, because “in the cloud“ these days tends to mean “stuff things into proprietary walled gardens“. The first input talk turned out to be quite a bit less scary: Data providers want to have links to commerical cloud providers in addition to http download links. That's reasonable given users may want to optimise accesses for large data sets, and seeing that most respondents pointed to Datalink as the way to do that (as I did) was nice. The devil is in the details, though: Making good concepts that let clients figure out what are, in a sense, “equivalent“ ways to obtain the data is probably hard. The one thing I'm sure about is that I don't want concepts like #aws-metadata in datalink/core.

    And the rest of the session was rather a “how VO standards are or may be useful to us“ rather than the “dump the old open rubbish and move on to walled gardens“ I was worrying about. So… excellent!

    Update 2022-10-18, 21:10: Sitting in the DAL 1 Session, I am seriously tempted to become a gardener while listening to Tom's talk on Firewalls against ADQL. I have to thank U Heidelberg for hosting our services without horrible “Web Application Firewalls“ or trying to hack into https connections to “sanitise“ requests. At STScI, it seems the density of snake oil “security appliances“ is so high that at least somewhat advanced network usage like TAP and ADQL becomes really shaky.

    Can we just genrally disarm and perhaps, if SQL injection really is a problem in individual cases, just hire programmers on permanent contracts (meaning: they'll aquire sufficient experience) and/or reviewers for the software we run facing the net? It's not like SQL injection is just bad luck. It's a bug in every single case, and a sort of bug that's relatively simple to avoid – simpler in any case than detecting SQL injection attempts with a reasonable false-positive rate.

    Update 2022-10-20, 5:00 UTC: Yesterday, I had reasons both for rejoicing and for wishing for a brown bag. The rejoicing part was (for instance) in the solar system session, where Steve Joy reported on getting PDS Planetary Plasma Interactions (PPI) data into the VO – that's a good thing no matter what, especially given that I have a very soft spot for solar system data anyway. As the main author of DaCHS, however, I was particularly happy to see PPI are using it to talk to the VO. DaCHS thus is now running in Los Angeles, too. Hollywood, practically.

    The brown bag moment came in the Registry session; while my talks I think went fine – one of them basically being the oral version of a post from this blog –, Tom's talk on pyvo.registry made me cringe because he pointed out a bad interoperability sin on my side. The problem was not that my code unconcernedly uses COALESCE. From private mails I had understood, perhaps somewhat over-optimistically, that RegTAP operators had greenlighted that after my DAL post from last December, and it's a really simple extension anyway. I give you, though, that I should have ensured that COALESCE really had arrived on the servers before pushing for merging the new regsearch code into pyVO.

    No, what's really embarrassing is the UNION business. You see, the regsearch keyword constraint looks for the words in multiple places, and so it does something conceptually like WHERE keyword matches table1.descripition OR keyword matches table2.subject. Such cross-table ORs are generally extremely hard to plan for the database server, and thus when I re-wrote query generation for the RegTAP keyword search I just put in UNION – queries are really two orders of magnitude faster on my server this way.

    However, UNION has not been part of ADQL 2.0, and although I've lobbied for the set operators for a about a decade now, they are not formally part of ADQL yet. They will be part of ADQL 2.1, but even then they will not be mandatory. Hence, I should not blindly have employed UNION in code supposed to be interoperable, even less so because I can actually programmatically figure out whether a service supports UNION (from the TAP capabilities) and hence could have put in a fallback for where it's unavailable. Aw, dang.

    Update 2022-10-20, 20:00 UTC Just two sessions to go – Radio and Closing, though that little rest will be a challenge, with the closing session ending at 1 am my time.

    Thus, in the midnight hour, for the Semantics working group I will report on our session, which had quite a bit of rather deep plumbing this time. For instance, for the update to our standard on unit syntax, Norman raised the question whether “%“ ought to be a legal unit, and if so, if there's any way to keep ppm, ppb, and ppt out (؉ or ‰, on the other hand, are easy to keep out: We're really stubbornly insisting on pure ASCII). This may border on bikeshedding, but it has very concrete consequences on clients (such as astropy's unit parser) and services (where, for instance, VizieR has to cope with submissions that have columns given in percent). Before the session, it looked like we'd just let in percent, and that only grudgingly. Now… it's likely we will have to be more liberal.

    Great news in the session was that there is now a prototype of a Rosetta Stone for facility names in Paris, that is, a service that lets you map between all the different names your typical observation facility has (for instance, the part of my institute that is up on the mountain could be known as Königstuhl Observatory, Landessternwarte Königstuhl, LSW, Zentrum für Astronomie Heidelberg, and much more). If you have never tried linking all these various names up, you will be surprised how hard that problem is. See Baptiste's slides for how they are tackling it and how they are applying hardcore Semantics tech – in particular, SPARQL – to do it. I liked it a lot.

    Another talk I would like to call out is Steve Crawford's from the session of the Data Curation and Preservation IG. His recommendation to go with CC0 for, well, licensing, is something I can only support exactly because it is not a licence at all, which relieves you of the troublesome problem of assinging copyright so someone. That triviality is only the first of several legal problems we have since we have put the IVOA documents under CC-BY. But since nobody is ever going to court about any of this, the legal trouble is perhaps not terribly worrying. What is nasty about CC-BY is that whatever is licensed CC-BY is (generally) incompatible with the GPL and many other software licenses, which means you will get in trouble if you try to package it with something destined for Debian. And Steve makes some excellent points why CCO is just fine for science data.

    Finally, if you liked the posts on array plotting in TOPCAT and usage in ADQL, you should definitely have a look at Mark's talk in this morning's Apps session, where he in particular shows how you can go from a line in the array plot back to the row that contains the array.

    And with that I've told you where the opening slide fragment came from. Good night!

  • It's Interop Time Again

    A slide with lots of XML on it

    A little ego booster in DAL I: Baptiste and Chloe discuss a feature for incremental harvesting of remote databases using odbcGrammar that I have implanted into DaCHS late last year.

    This morning at seven CEST the first Interop of this year started: It's time again for everyone involved in the VO to come together, tell each other what happened since the last Interop and plan for the next steps. The meeting is purely digital again, and again the schedule is a bit crazy in order to evenly spread time painsj across the globe: there are sessions in the relatively early morning CET, in the late afternoon, and fairly late at night.

    Fairly late at night (by my standards) is now, when I'm listening to the talks in a session of the Data Access Layer working group trying to work out how to do multiple cutouts in one request using SODA, something I've been rather skeptical about while we were coming up with the spec in the mid-2010s: Going from “single value“ to “sequence“ generally complicates matters by something like an order of magnitudes, and with HTTP 1.1 – which lets you run multiple requests in a single connection – doing multiple requests is cheap.

    In contrast, SODA doesn't really say what a service should do if, say, there are multiple positions in a cutout request: should the regions be merged (that's what DaCHS does)? Should multiple images come back? If so, how: in a tar, in a multi-extension FITS, in some other way? What happens if you give both multiple positional and spectral ranges: should there be one result per element of the cartesian product? And if it works that way: should clients have a chance to figure out what combination of parameters produced which result dataset?

    In all that mess, it's gratifying to see that my compromise proposal from way back when – if we do multi-cutout, let's do it by uploading a table specifying one cutout, including a label, per row – to be floated again. But very frankly: My vote would still be to deprecate repeated POS, CIRCLE, BAND, and friends in SODA: requests are cheap these days.

    Oh, and while I'm confessing emotions of perhaps not entirely unselfish gratification: I still rejoice when I see DaCHS applications discussed in public, as Chloé and Baptiste did in their talk.

    Update at 2022-04-27, Morning

    The “virtual” Interop may not be quite as exciting as the real thing, but at least the jetlag is back.

    Yesterday at midnight I gave a talk on requirements and validators, which really was an elaboration of some of the ideas I developed on this blog a month ago. If I may say so myself, I've grown fond of the classification of MUST-s into, in the end, items the machines need, items the users need, admonishments for implementors, and items that we believe the future may need. I'm sure there are more, but even for these I found it remarkable that the less will immediately break if someone violates a piece of a spec, the more important validation becomes. This again is one of these thoughts that feel as if someone probably has pondered them a lot more deeply before…

    I also was really happy about Mark's pitch for validating specifications themselves that kept me awake until one a.m. CEST. In my authoring system ivoatex, I've introduced a hook to allow for a test target, and Mark kindly supported that effort by adding an xsdvalidate subcommand to the excellent stilts. The ivoatex documentation then grew some advice on what and how to test; in case you're writing or maintaining IVOA specs: do have a look. Mark's talk has a few great examples where spec-time validation would have saved a lot of effort and embarrassment.

    Only six hours later, I was back in <expletive deleted> zoom to listen to the Grid session, which again featured Mark, apparently unfazed by the lack of sleep, talking about (potentially) federated authentication outside of the browser (which is something I really want for persistent TAP uploads).

    And then there was the joint time domain/radio session. The slides are not yet there, but once they are, do yourself a favour and at least look at the beautiful images Dougal showed – Radio by now can make about as pretty pictures as Optical – and Alan's talk with the hypnotic sensitivity maps that again showed that low-frequency radio astronomy, seen from outside, is even more of an arcane art than is its high-frequency sibling.

    Update at 2022-04-27, late evening

    For me, this Interop has a strong proper motion slant. In this afternoon's Apps session, I tried to sell an extension to COOSYS I've wanted for a long time, just enough to do epoch propagation.

    You see, ever since my first serious contribution to the VO standards universe, the proposal on doing STC annotation in VOTable in 2010, failed miserably because almost nobody took it up, I have struggled to still somehow get enough annotation added to VOTables to let clients apply proper motions automatically.

    Given there are now data models for Coordinates and what we call Measurements (which roughly is errors and, well, a bit of physics) on the way, I figured this might be a good time to finally fix the COOSYS VOTable element. For one, data centers will revisit the STC annotation anyway if the models and the VOTable data model annotation will pass the reviews, and producing an improved COOSYS would then almost come for free.

    But I can't lie: after the experiences of the past I'd also love to have a fallback position in case we spend another ten years on data models and annotations without getting anywhere. 25 years after the VO's birth epoch (if you will) of J2000.0, many stars have already moved of order of an arcsecond from where our first big catalogues saw them, and so we can ill afford to wait these extra ten years.

    Not surprisingly, the proposal resulted in quite a bit of pushback, perhaps even a bit more than I had expected. Well: I should have given this talk years ago.

    The proper motion topic will come back tomorrow in the second DAL session, when I will talk about ADQL user defined functions to do epoch propagation. This talk will feature one of the prettier plots I've produced in the last few months:

    Three traces of points on a sphere

    What happens if you propagate positions when all you have are proper motions (i.e., no parallaxes and no distances) and you do that naively (blue), in the tangential plane (red), and under the assumption of a purely tangential motion. The lecture notes tell you how to come up with the data plotted here.

    I think I can safely predict you will read more about some of these UDFs on this very blog later this year.

    Update 2022-04-28, late evening

    Today felt the most conferency so far for this Interop, and perhaps for any “virtual conference“ I've attended. I believe there's a technical reason for that. After the second proper motion-flavoured talk I've just mentioned – that was still using, sigh, zoom –, things mostly happened in gathertown, a platform you can actually walk around in, stand together and don't always talk on stage as in zoom. Fervently believing in the mantra of “protocols, not platforms” (of course: this is the VO), I shouldn't be saying this, but: I actually like gathertown.

    And so I guess we made quite a bit of progress in little side meetings and a hackathon on things like LineTAP (which, I hope, will bring all the rich data on spectral lines from VAMDC to the VO); how to let people have continuous integration checks against their Jupyter notebooks to notice in time when we're breaking something (my recent brown-bag pyvo bug that has somwhat started this was actually mentioned as a positive example in a talk (slide 19); and: it turned out I'm not the only notebook skeptic on this planet!); how we ought to define “facility” and “instrument“ in Obscore and the Registry (and, probably particularly insiduously, in SSAP, where what's called “facility“ there should probably be what's called “instrument“ elsewhere – sigh), a topic we already had touched yesterday, which in turn has resulted in Tamara's mail; an interesting service DaCHS operators want to run that would return PDF files as what DaCHS calls a “product” (which would normally be a thing like a FITS file); and then some more, including, of course, idle chatting.

    That was almost as good as an actual meeting.

    Update 2022-04-29, afternoon

    This morning, I chaired a nice and lively Semantics session, where I talked about the move of our Vocabulary maintenance to github. That particular thing did not elicit a lot of comments, not even when I extended an invitation to perhaps amend Vocabularies in the VO 2 in other weys. I'll take that as some sort of reassurance that I did a reasonably good job designing that thing, although I cannot entirely rule out that people just did not have enough time to find the warts.

    One thing I will call out at tonight's closing penary is Stéphane's talk on vocabularies in EPN-TAP. The way he was looking at the various word lists involved in that standard, looking at what “just works“, where the concepts are probably too special to worry about, and then the clumsy space in between – where there are or should be vocabularies that almost, but not quite fit – was exemplary. I'm looking forward to followups on the mailing lists, trying to work out where we can perhaps align different concept hierarchies so we spare implementors duplicate efforts. And figuring out where that's impossible, too expensive, or in other ways undesirable, and where the problems are. I suppose there's a lot to be learned from that.

    Another high point was the identification of Wikidata as a valuable resource for the never-ending story of creating identifiers for instruments and facilities in Baptiste's talk. There is some special gratification in making our activities matter beyond the VO, link our resources with the wider RDF world – and hack SPARQL.

    What's left for me is the Registry session, where I will briefly report, in particular, on my most recent effort of getting rid of my venerable GloTS service by adding a table of TAP-queriable tables to RegTAP. Let's see what people say – but in the end the challenge will be to convince the other operators of RegTAP services to take up the proposed changes. The central challenge there is that part of it is built on MOCs, and while the ESAC registry is built on Postgres that can already taught to deal with them, the one at MAST is based on SQLServer, which, I think, cannot yet. Let's see.

    Another thing I'm looking forward to is Hendrik's pitch for registring tutorials and similar educational material. I'd really like to see more stuff on VOTT, which is fed from such registrations.

    Update 2022-04-29, late evening

    Interops for me always have something of an ego trip when I see traces of my activities in other people's work. And I've just discovered such a trace in a place I had not expected it: Gilles' talk on extra metadata in service responses, where he showed metadata DaCHS returns with its TAP responses. This was in this morning's session of the Data Curation and Preservation interest group that, I have to admit, I skipped in favour of a proper breakfast without a screen in front of me.

    And he touched a topic that's dear to my heart, too. Really, I've been struggling to give applications enough metadata such that they can simply spit out a bunch of BibTeX for the sources used in a particular VO workflow for quite a while. In typcial DaCHS responses, you will find a bibcode and often a link to BibTeX (example), and at least the container element I got standardised in DALI 1.1. Let's see what else we can specify so that machines can reliably extract such information: Authors? Technical contact addresses? Date and time of production (could be very relevant for evolving data)? Full provenance? Well: If you've ever missed some piece of metadata, this would be a good time to bring it up.

    All that's left now is the reports of the Working Groups (which will be another midnight talk for me) and a bit of farewell ceremony. After that, I'll go to sleep, and so that's it for my Interop reporting.

  • The 2021 Southern Spring Interop

    A Venn diagram of product types that just doesn't work.

    A contribution for the ”things that didn't work out” (“Arbeiten, die zu keiner Lösung geführt haben”) section in our reports to BMBF: an attempt to systematise product types at the last Interop. I've made a new proposal at this Interop, and there is reason to hope it will fare better.

    Last night, the second IVOA Interop conference of 2021 came to an end; I'm calling it ”southern spring” because notionally, it happened in Cape Town, back to back with this year's ADASS. In reality, it was again an online event, and so, in keeping up with the tradition established in the pandemic times, the closing event was around midnight CET. I cannot say I will miss these late-night events, although I would not go as far as some people at the conference who quipped they'd prefer the airport security checks to having to sit through another zoom marathon.

    My contributions at this interop again had a clear focus on semantics, for instance with my public confession that my attempt to systematise “product types” at the last interop was entirely misguided; trying to force concepts like “time series”, “spectrum“ or “image” into a tree does not lead to anything that actually works for what this is intended to do, that is, helping people find the sort of data they are after for a particular purpose, or helping clients route data products to other clients better suited to process them. I will now try a restart using SKOS, a plan that was met with a lot more agreement than that previous attempt. Some entertainment at the side was provided by the realisation that a “time-image cube“ is normally called a movie. Next time I'll take in moving pictures, I'll find out what people say when I claim to investigate a time cube.

    Another talk that took up a topic from the last Interop's Semantics session was about making an IVOA vocabulary of object types based on the work done within the CDS over the last 40 year or so. This certainly is just the beginning of a longer effort, not the least because the current concepts severely fall short in the area of the solar system. But it's a start, and there's plenty of time to elaborate this before it will go through a review, presumably with the next version of Obscore.

    Also semantics-related, but over in the session of the Operations interest group, Mark Taylor reported on his activities to evaluate the standards adherence of semantics information in published tables. This activity is what had triggered me to make DaCHS validate UCDs assigned to columns in summer, something that I expect will result in quite few diagnostics when DaCHS operators upgrade to DaCHS 2.5 (expected for November). But that's fine: making it more likely that computers will actually recognise a, say, error in proper motion for what it is is undoubtedly a good thing. I'm therefore glad that there is almost a million “good” UCDs out there and a lot fewer somehow “bad”. I had expected much worse after my realisation that my own annotations left a lot to be desired in summer. By now, the only bad UCDs I'm still pushing out are the ones mandated by SSAP and SLAP. The contradictions between those standards and UCD are going to be addressed with Errata in the coming months.

    My talk in the third Apps session on Thursday afternoon still had some relationship with Semantics; it was a quick show and tell on the enhancements to WIRR I had reported on here in July, and it in particular showcased obtaining UCD constraints by full-text searching the rr.table_column table in my RegTAP service and the selection through UAT concepts. Satisfyingly in some way, it were these topics that people took up in the discussion after the talk. Less satisfyingly, people playing with the thing afterwards turned up something that has the alarming taste of a bug in the new MOC operations in pgsphere. Ouch.

    This segues into the realm of Registry, where there was no actual session but a rather well-attended side meeting in the gathertown instance we could take over from ADASS (that, incidentally, was substantially better attended than during the previous meetings). There, I mainly presented (and explained) my proposed changes to pyVO's registry interface currently living in a private branch in my fork on github. I will write a bit more on that around the time I will turn that into a PR.

    Another outcome of this was that there was some interest to turn the note on documents in the Registry – which is what feeds VOTT – into either an endorsed note or perhaps a Recommendation of the Registry WG.

    My fourth “proper” (in the rather twisted sense of: in a zoom session) talk was an attempt to finally do something about the problems pointed out in my caproles note lamenting that our current service registration patterns are fundamentally flawed. It proposed some ways to to get VOSI availability fixed, and the outcome was that we probably will drop what we currently require in that field, not the least because these requirements are cheerily ignored by 98% of the resources in the Registry.

    Those were again three fairly long days, usually starting with sessions around 7:00 CET and ending with sessions around midnight. Which is clearly not healthy. But on the other hand, it somehow does convey a physical sense of the global nature of the Virtual Observatory, on which people in many, many time zones work. And that, I have to say, still is something I do appreciate.

Page 1 / 3 »