Articles from Meetings

  • At the Malta Interop

    A bronze statue of a running man with a newspaper in his hand in front of a massive stone wall.

    The IVOA meets in Malta, which sports lots of walls and fortifications. And a “socialist martyr” boldly stepping forward (like the IVOA, of course): Manwel Dimech.

    It is Interop time again! Most people working on the Virtual Observatory are convened in Malta at the moment and will discuss the development and reality of our standards for the next two days. As usual, I will report here on my thoughts and the proceedings as I go along, even though it will be a fairly short meeting: In northern autumn, the Interop always is back-to-back with ADASS, which means that most participants already have 3½ days of intense meetings behind them and will probably be particularly glad when we conclude the Interop on Sunday noon.

    The TCG discusses (Thursday, 15:00)

    Right now, I am sitting in a session of the Technical Coordination Group, where the chairs and vice-chairs of the Working and Interest Groups meet and map out where they want to go and how it all will fit together. If you look at this meeting's agenda, you can probably guess that this is a roller coaster of tech and paperwork, quickly changing from extremely boring to extremely exciting.

    For me up to now, the discussion about whether or not we want LineTAP at all was the most relevant agenda item; while I do think VAMDC would win by taking up the modern IVOA TAP and Registry standards (VAMDC was forked from the VO in the late 2000s), takeup has been meagre so far, and so perhaps this is solving a problem that nobody feels[1]. I have frankly (almost) only started LineTAP to avoid a SLAP2 with an accompanying data model that would then compete with XSAMS, the data model below VAMDC.

    On the other hand: I think LineTAP works so much more nicely than VAMDC for its use case (identify spectral lines in a plot) that it would be a pity to simply bury it.

    By the way, if you want, you can follow the (public; the TCG meeting is closed) proceedings online; zoom links are available from the programme page. There will be recordings later.

    At the Exec Session (Thursday, 16:45)

    The IVOA's Exec is where the heads of the national projects meet, with the most noble task of endorsing our recommendations and otherwise providing a certain amount of governance. The agenda of Exec meetings is public, and so will the minutes be, but otherwise this again is a closed meeting so everyone feels comfortable speaking out. I certainly will not spill any secrets in this post, but rest assured that there are not many of those to begin with.

    I am in here because GAVO's actual head, Joachim, is not on Malta and could not make it for video participation, either. But then someone from GAVO ought to be here, if only because a year down the road, we will host the Interop: In the northern autumn of 2025, the ADASS and the Interop will take place in Görlitz (regular readers of this blog have heard of that town before), and so I see part of my role in this session in reconfirming that we are on it.

    Meanwhile, the next Interop – and determining places is also the Exec's job – will be in the beginning of June 2025 in College Park, Maryland. So much for avoiding flight shame, which I could for Malta: it is still reachable by train and ferry, if not very easily.

    Opening Plenary (Friday 9:30)

    A lecture hall with people, a slide “The University of Malta” at the wall.

    Alessio welcomes the Interop crowd to the University of Malta.

    Interops always begin with a plenary with reports from the various functions: the chair of the Exec, the chair of the committee of science priorities, and the chair of the Technical Coordination Group. Most importantly, though, the chairs of the working and interest groups report on what has happened in their groups in the past semester, and what they are planning for the Interop (“Charge to the working groups”).

    For me personally, the kind words during Simon's State of the IVOA report on my VO lecture (parts of which he has actually reused) were particularly welcome.

    But of course there was other good news in that talk. With my Registry grandmaster hat on, I was happy to learn that NOIRLab has released a simple publishing registry implementation, and that ASVO's (i.e., Australia's) large TAP server will finally be properly registered, too. The prize for the coolest image, though, goes to VO France and in particular their solar system folks, who have used TOPCAT to visualise data on a model of comet 67P Churyumov–Gerasimenko (PDF page 20).

    Self-Agency (Friday, 10:10)

    A slide with quite a bit of text.  Highlighted: “Dropped freq_min/max”

    I have to admit it's kind of silly to pick out this particular point from all the material discussed by the IG and WG chairs in the Charge to the Working Groups, but a part of why this job is so gratifying is experiences of self-agency. I just had one of these during the Radio IG report: They have dropped the duplication of spectral information in their proposed extension to obscore.

    Yay! I have lobbied for that one for a long time on grounds that if there are both em_min/em_max and f_min/f_max in an obscore record (which express the same thing, with em_X being wavelengths in metres, and f_X frequencies in… something else, where proposals included Hz, MHz and GHz), it is virtually certain that at least one pair is wrong. Most likely, both of them will be. I have actually created a UDF for ADQL queries to make that point. And now: Success!
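
    For illustration, here is a minimal sketch (plain Python arithmetic, not anything from the obscore toolchain) of the conversion each data provider would have to get right twice, including the easy-to-miss axis flip between the two pairs:

```python
# Hypothetical sketch: converting obscore-style spectral limits.
# em_min/em_max are wavelengths in metres; the proposed f_min/f_max
# would have been frequencies (with Hz, MHz, and GHz all on the table).
C = 299_792_458.0  # speed of light in m/s

def em_to_freq_hz(em_metres):
    """Convert a wavelength in metres to a frequency in Hz."""
    return C / em_metres

# Example: a band around the 21 cm HI line.  Note the axis flip:
# the minimum wavelength corresponds to the maximum frequency.
em_min, em_max = 0.20, 0.22
f_max, f_min = em_to_freq_hz(em_min), em_to_freq_hz(em_max)
```

    With both pairs stored per record, every producer maintains this mapping by hand, and a single wrong unit choice (MHz vs. Hz, say) silently breaks one of them.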

    Focus Session: High Energy and Time Domain (Friday, 12:00)

    The first “working” session of the Interop is a plenary on High Energy and Time Domain, that is, instruments that look for messenger particles that may have the energy of a tennis ball, as well as ways to let everyone else know about them quickly.

    Incidentally, that “quickly” is a reason why the two apparently unconnected topics share a session: Particles in the tennis ball range are fortunately rare (or our DNA would be in trouble), and so when you have found one, you might want to make sure everyone else gets to look whether something odd shows up where that particle came from in other messengers (as in: optical photons, say). This is also relevant because many detectors in that energy (and particle) range do not have a particularly good idea of where the signal came from, and followups in other wavelengths may help figure out what sort of thing may have produced a signal.

    I enjoyed a slide by Jutta, who reported on VO publication of km3net data, that is, neutrinos detected in a large detector cube below the Mediterranean sea, using the Earth as a filter:

    Screenshot of a slide: “What we do: Point source analysis, Alerts and follow-ups; What we don't do: Mission planning, Nice pictures.”

    “We don't do pretty pictures” is of course a very cool thing one can say, although I bet this is not 120% honest. But I am willing to give Jutta quite a bit of slack; after all, km3net data is served through DaCHS, and I am still hopeful that we will use it to prototype serving more complex data products than just plain event lists in the future.

    A bit later in the session, an excellent question was raised by Judy Racusin in her talk on GCN:

    A talk slide, with highlighted text: “Big Question: Why hasn't this [VOEvent] continued to serve the needs of various transient astrophysics communities?”

    The background of the question is that there is a rather reasonable standard for the dissemination of alerts and similar data, VOEvent. This has seen quite a bit of takeup in the 2000s, but, as evinced by page 17 of Judy's slides, all the current large time-domain projects decided to invent something new, and it seems each one invented something different.

    I don't have a definitive answer to why and how that happened (as opposed to, for instance, everyone cooperating on evolving VOEvent to match whatever requirements these projects have), although outside pressures (e.g., the rise of Apache Avro and Kafka) certainly played a role.

    I will, however, say that I strongly suspect that if the VOEvent community back then had had more public and registered streams consumed by standard software, it would have been a lot harder for these new projects to (essentially) ignore it. I'd suggest as a lesson to learn from that: make sure your infrastructure is public and widely consumed as early as you can. That ought to help a lot in ensuring that your standard(s) will live long and prosper.

    In Apps I (Friday 16:30)

    I am now in the Apps session. This is the most show-and-telly event you will get at an Interop, with the largest likelihood of encountering the pretty pictures that Jutta had flamboyantly expressed disinterest in this morning. In the first talk already, Thomas delivers with, for instance, mystic pictures from Mars:

    A photo of Olympus Mons on Mars with overplotted contour lines.

    Most of the magic was shown in a live demo; once the recordings are online, consider checking this one out (I'll mention in passing that HiPS2MOC looks like a very useful feature, too).

    My talk, in contrast, had extremely boring slides; you're not missing out at all by simply reading the notes. The message is not overly nice, either: rather have fewer features than optional ones; as a server operator, please take up new standards as quickly as you can; and in the same role, please provide good metadata. This last point happened to be a central message in Henrik's talk on ESASky (which aptly followed mine) as well, which, like Thomas's, featured a live performance of eye candy.

    Mario Juric's talk on something called HATS then featured this nice plot:

    A presentation slide headed “partition hierarchically”, with an all-sky heatmap featuring pixels of varying size.

    That's Gaia's source catalogue pixelated such that the sources in each pixel require about a constant processing time. The underlying idea, hierarchical tiling, is great and has proved itself extremely capable with HiPS, which is what is behind basically anything in the VO that lets you smoothly zoom, in particular Aladin's maps. HATS' basic premise seems to be to put tables (rather than JPEGs or FITS images as usual) into a HiPS structure. That has been done before, as with the catalogue HiPSes; Aladin users will remember the Gaia or Simbad layers. HATS, now, stores Parquet files, provides Pandas-like interfaces on top of them, and in particular has the nice property of handing out data chunks of roughly equal size.

    That is certainly great, in particular for the humongous data sets that Rubin (née LSST) will produce. But I wondered how well it will stand up when you want to combine different data collections of this sort. The good news: they have already tried it, and they have even thought about how to put HATS' API behind a TAP/ADQL interface. Excellent!
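
    The “split until every tile holds few enough sources” idea behind such tilings can be sketched in a few lines. This is a toy quadtree over a flat unit square, made up for illustration; real HiPS/HATS tilings subdivide HEALPix pixels on the sphere, and none of this is HATS' actual API:

```python
# Toy sketch of hierarchical, density-adaptive partitioning.
def partition(points, x0, y0, size, max_per_tile=3):
    """Recursively split a square tile until it holds few enough points.

    Returns a list of (x0, y0, size, count) tuples, one per final tile.
    """
    inside = [(x, y) for x, y in points
              if x0 <= x < x0 + size and y0 <= y < y0 + size]
    if len(inside) <= max_per_tile:
        return [(x0, y0, size, len(inside))]
    half = size / 2
    tiles = []
    for dx in (0, half):
        for dy in (0, half):
            tiles.extend(partition(inside, x0 + dx, y0 + dy,
                                   half, max_per_tile))
    return tiles

# A dense clump near the origin forces small tiles there; empty
# regions remain one big tile each.
pts = [(0.01 * i, 0.01 * i) for i in range(10)]
tiles = partition(pts, 0, 0, 1)
```

    Dense regions end up as many small tiles, empty regions as a few big ones, which is what makes the resulting data chunks roughly equal in processing cost.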

    Further great news in Brigitta's talk [warning: link to google]: It seems you can now store ipython (“Jupyter”) notebooks in, ah well, Markdown – at least in something that seems version-controllable. Note to self: look at that.

    Data Access Layer (Saturday 9:30)

    I am now sitting in the first session of the Data Access Layer Working Group. This is where we talk about the evolution of the protocols you will use if you “use the VO”: TAP, SIAP, and their ilk.

    Right at the start, Anastasia Laity spoke about a topic that has given me quite a bit of headache several times already: How do you tell simulated data from actual observations when you have just discovered a resource that looks relevant to your research?

    There is prior art for that in that SSAP has a data source metadata item on complete services, with values survey, pointed, custom, theory, or artificial (see also SimpleDALRegExt sect. 3.3, where the operational part of this is specified). But that's SSAP only. Should we have a place for that in registry records in general? Or even at the dataset level? This seems rather related to the recent addition of productTypeServed in the brand-new VODataService 1.3. Perhaps it's time for a dataSource element in VODataService?

    A large part of the session was taken up by the question of persistent TAP uploads that I have covered here recently. I have summarised this in the session, and after that, people from ESAC (who have built their machinery on top of VOSpace) and CADC (who have inspired my implementation) gave their takes on the topic of persistent uploads. I'm trying hard to like ESAC's solution, because it is using the obvious VO standard for users to manage server-side resources (even though the screenshot in the slides,

    A cutout of a presentation slide showing a browser screenshot with a modal dialog with a progress bar for an upload.

    suggests it's just a web page). But then it is an order of magnitude more complex in implementation than my proposal, and the main advantage would be that people can share their tables with other users. Is that a use case important enough to justify that significant effort?

    Then Pat's talk on CADC's perspective presented a hierarchy of use cases, which perhaps offers a way to reconcile most of the opinions: Is there a point in having the same API on /tables and /user_tables, depending on whether we want the tables to be publicly visible?

    Data Curation and Preservation (Saturday, 11:15)

    This Interest Group's name sounds like something only a librarian could become agitated about: Data curation and preservation. Yawn.

    Fortunately, I consider myself a librarian at heart, and hence I am participating in the DCP session now. In terms of engagement, we have already started to quarrel about a topic that must seem rather like bikeshedding from the outside: should we bake the DOI resolver into the way we write DOIs (like http://doi.org/10.21938/puTViqDkMGcQZu8LSDZ5Sg; actually, for a few years now: https instead of http), or should we continue to use the doi URI scheme, as we do now: doi:10.21938/puTViqDkMGcQZu8LSDZ5Sg?

    This discussion came up because the doi foundation asks you to render DOIs in an actionable way, which some people understand as them asking people to write DOIs with their resolver baked in. Now, I am somewhat reluctant to do that, mainly on grounds of user freedom. Sure, as long as you consider the whole identifier an opaque string, their resolver is not actually implied, but that's largely fictitious, as evinced by the fact that identifiers with http and with https would somehow generally be considered equivalent. I do claim that we should make it clear that alternative resolvers are totally an option. Including ours: RegTAP lets you resolve DOIs to ivoids and VOResource metadata, which to me sounds like something you might absolutely want to do.
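
    As a sketch of that last point: with RegTAP 1.1, resolving a DOI to an ivoid boils down to a single query against the rr.alt_identifier table. Here we only build the query string; actually running it needs network access to a RegTAP service (the endpoint in the comment is GAVO's):

```python
# Build a RegTAP query resolving a DOI to an ivoid; rr.alt_identifier
# is the table RegTAP 1.1 provides for alternate identifiers.
doi = "doi:10.21938/puTViqDkMGcQZu8LSDZ5Sg"
query = ("SELECT ivoid FROM rr.alt_identifier"
         " WHERE alt_identifier = '{}'".format(doi))

# With pyvo, one would then run something like:
#     import pyvo
#     pyvo.dal.TAPService("http://reg.g-vo.org/tap").run_sync(query)
```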

    Another (similarly biased) point: Not everything on the internet is http. There are other identifier types that are resolvable (including ivoids). Fortunately, writing DOIs as HTTP URIs is not actually what the doi foundation is asking you to do. Thanks to Gus for clarifying that.

    These kinds of questions also turned up in the discussion after my talk on BibVO. Among other things, that draft standard proposes to deliver information on what datasets a paper used or produced in a very simple JSON format. That parsimony has been put into question, and in the end the question is: do we want to make our protocols a bit more complicated to enable interoperability with other “things”, probably from outside of astronomy? Me, I'm not sure in this case: I consider all of BibVO some sort of contract essentially between the IVOA and SciX (née ADS), and I doubt that someone else than SciX will even want to read this or has use for it.

    But then I (and others) have been wrong with predictions like this before.

    Registry (Saturday 14:30)

    Now it's registry time, which for me is always a special time; I have worked a lot on the Registry, and I still do.

    In Christophe's statistics talk, I was totally blown away by the number of authorities and registries from Germany, given how small GAVO is. Oh wow. In this graph of authorities in the VO, we are the dark green slice far at the bottom of the pie:

    A presentation slide with two pie charts.  In the larger one, there are many small and a couple of large slices.  A dark green one makes up a bit less than 10%.

    I will give you that, as usual with metrics, to understand what they mean you have to know so much that you then don't need the metrics any more. But again there is an odd feeling of self-agency in that slide.

    The next talk, Robert Nikutta's announcement of generic publishing registry code, was – as already mentioned above – particularly good news for me, because it let me add something particularly straightforward into my overview of OAI-PMH servers for VO use, and many data providers (those unwise enough to not use DaCHS…) have asked for that.

    For the rest of the session I entertained folks with the upcoming RFC of VOResource 1.2 and the somewhat sad state of affairs in fulltext searches in the VO. Hence, I was too busy to report on how gracefully the speaker made his points. Ahem.

    Semantics and Solar System (Saturday 16:30)

    Ha! A session in which I don't talk. That's even more remarkable because I'm the chair emeritus of the Semantics WG and the vice-chair of the Solar System IG at the moment.

    Nevertheless, my plan has been to sit back and relax. Except that some of Baptiste's proposals for the evolution of the IVOA vocabularies are rather controversial. I was therefore too busy to add to this post again.

    But at least there is hope to get rid of the ugly “(Obscure)” as the human-readable label of the geo_app reference frame that entered that vocabulary via VOTable; you see, this term was allowed in COOSYS/@system since VOTable 1.0, but when we wrote the vocabulary, nobody who reviewed it could remember what it meant. In this session, JJ finally remembered. Ha! This will be a VEP soon.

    It was also oddly gratifying to read this slide from Stéphane's talk on fetching data from PDS4:

    A presentation slide with bullet points complaining about missing metadata, inconsistent metadata, and other unpleasantries.

    Lists like these are rather characteristic in a data publisher's diary. Of course, I know that's true. But seeing it in public still gives me a warm feeling of comradeship. Stéphane then went on to tell us how to make the cool 67P images in TOPCAT (I had already mentioned those above when I talked about the Exec report):

    A 3D-plot of an odd shape with colours indicating some physical quantity.

    Operations (Sunday 10.00)

    I am now in the session of the Operations IG, where Henrik is giving the usual VO Weather Report. VO weather reports discuss how many of our services are “valid” in the sense of “will work reasonably well with our clients”. As usual for these kinds of metrics, you need to know quite a bit to understand what's going on and how bad it is when a service is “not compliant”. In particular for the TAP stats, things look a lot bleaker than they actually are:

    A bar graph showing the temporal evolution of the number of TAP servers failing (red), just passing (yellow) or passing (green) validation over the past year or so.  Yellow is king.

    Green is “fully compliant”, yellow is “mostly compliant”, red is “not compliant”. For whatever that means.

    These assessments are based on stilts taplint, which is really fussy (and rightly so). In reality, you can usually use even the red services without noticing something is wrong. Except… if you are not doing things quite right yourself.

    That was the topic of my talk for Ops. It is another outcome of this summer semester's VO course, where students were regularly confused by diagnostics they got back. Of course, while on the learning curve, you will see more such messages than if you are a researcher who is just gently adapting some sample code. But anyway: Producing good error messages is both hard and important. Let me quote my faux quotes in the talk:

    Writing good error messages is great art: Do not claim more than you know, but state enough so users can guess how to fix it.

    —Demleitner's first observation on error messages

    Making a computer do the right thing for a good request usually is not easy. It is much harder to make it respond to a bad request with a good error message.

    —Demleitner's first corollary on error messages

    Later in the session there was much discussion about “denial of service attacks” that services occasionally face. For us, that does not seem to be malicious people in general, but people basically well-meaning but challenged to do the right thing (read documentation, figure out efficient ways to do what they want to do).

    For instance, while far below DoS, turnitin.com was for a while harvesting all VO registry records from some custom, HTML-rendering endpoint every few days, firing off 30'000 relatively expensive (admittedly because I have implemented that particular endpoint in the laziest fashion imaginable) requests in a rather short time. They could have done the same thing using OAI-PMH with a single request that, on top, would have taken up almost no CPU on my side. For the record, it seems someone at turnitin.com has seen the light; at least, as far as I can tell (without actually checking the logs), they don't do that mass harvesting any more. Still, with a single computer, it is not hard to bring down your average VO server, even if you don't plan to.

    Operators that are going into “the cloud” (which is a thinly disguised euphemism for “voluntarily becoming hostages of amazon.com”) or that are severely “encouraged” to do that by their funding agencies have the additional problem that for them, indiscriminate downloads might quickly become extremely costly on top. Hence, we were talking a bit about mitigations, from HTTP 429 status codes (“too many requests”) to going for various forms of authentication, in particular handing out API keys. Oh, sigh. It would really suck if people ended up needing to get and manage keys for all the major services. Perhaps we should have VO-wide API keys? I already have a plan for how we could pull that off…
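
    On the client side, the 429 mitigation is simple to honour. Here is a hedged sketch (the helper is made up for illustration, not any VO library's API) of interpreting a 429 response's Retry-After header:

```python
def retry_delay(headers, default=60):
    """Return seconds to wait as suggested by a 429 response's headers.

    Hypothetical helper: a well-behaved client would sleep this long
    and then retry instead of hammering the service.
    """
    value = headers.get("Retry-After")
    if value is None:
        return default
    try:
        return int(value)  # delay-seconds form
    except ValueError:
        return default     # HTTP-date form; a real client would parse it

# e.g. a server asking for two minutes of quiet:
wait = retry_delay({"Retry-After": "120"})
```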

    Winding down (Monday 7:30)

    The Interop concluded yesterday noon with reports from the working groups and another (short) one from the Exec chair. Phewy. It's been a strenuous week ever since ADASS' welcome reception almost exactly a week earlier.

    Reviewing what I have written here, I notice I have not even mentioned a topic that pervaded several sessions and many of the chats on the corridors: The P3T, which expands to “Protocol Transition Tiger Team”.

    This was an informal working group that was formed because some adopters of our standards felt that they (um: the standards) are showing their age, in particular because of the wide use of XML and because they do not always play well with “modern” (i.e., web browser-based) “security” techniques, which of course mostly gyrate around preventing cross-site information disclosure.

    I have to admit that I cannot get too hung up on either point; I think browser-based clients should be the exception rather than the norm, in particular if you have secrets to keep, and many of the “modern” mitigations are little more than ugly hacks (“pre-flight check”) resulting from the abuse of a system designed to distribute information (the WWW) as an execution platform. But then this ship has sailed for now, and so I recognise that we may need to think a bit about some forms of XSS mitigations. I would still say we ought to find ways that don't blow up all the sane parts of the VO for that slightly insane one.

    On the format question, let me remark that XML is not only well-thought out (which is not surprising given its designers had the long history of SGML to learn from) but also here to stay; developers will have to handle XML regardless of what our protocols do. More to the point, it often seems to me that people who say “JSON is so much simpler” often mean “But it's so much simpler if my web page only talks to my backend”.

    Which is true, but that's because then you don't need to be interoperable and hence don't have to bother with metadata for other peoples' purposes. But that interoperability is what the IVOA is about. If you were to write the S-expressions that XML encodes at its base in JSON, it would be just as complex, just a bit more complicated because you would be lacking some of XML's goodies from CDATA sections to comments.
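
    To make that concrete, here is a naive sketch of what even a tiny piece of XML becomes once you serialise its tree as JSON; any real XML-to-JSON mapping needs conventions like this on top, and it still loses comments and CDATA. The fragment itself is made up, merely VOTable-flavoured:

```python
import json
import xml.etree.ElementTree as ET

# A toy, VOTable-flavoured fragment (invented for illustration).
xml_doc = '<TABLE name="obs"><FIELD name="ra" datatype="double"/></TABLE>'

def to_json_tree(elem):
    """Naive XML-to-JSON mapping: tag, attributes, children."""
    return {"tag": elem.tag, "attrib": dict(elem.attrib),
            "children": [to_json_tree(c) for c in elem]}

tree = to_json_tree(ET.fromstring(xml_doc))
as_json = json.dumps(tree)
```

    The JSON rendering carries exactly the same tree, just with the tag/attribute/children roles spelled out by convention rather than by the format itself.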

    Be that as it may, the P3T turned out to do something useful: It tried to write OpenAPI specifications for some of our protocols, and already because that smoked out some points I would consider misfeatures (case-insensitive parameter names for starters), that was certainly a worthwhile effort. That, as some people pointed out, you can generate code from OpenAPI is, I think, not terribly valuable: What code that generates probably shouldn't be written in the first place and rather be replaced by some declarative input (such as, cough, OpenAPI) to a program.

    But I will say that I expect OpenAPI specs to be a great help to validators, and possibly also to implementors because they give some implementation requirements in a fairly concise and standard form.

    In that sense: P3T was not a bad thing. Let's see what comes out of it now that, as Janet also reported in the closing session, the tiger is sleeping:

    A presentation slide with a sleeping tiger and the proclamation that “We feel the P3T has done its job”.
    [1]“feels” as opposed to “has”, that is. I do still think that many people would be happy if they could say something like: “I'm interested in species A, B, and C at temperature T (and perhaps pressure p). Now let me zoom into a spectrum and show me lines from these species; make it so the lines don't crowd too much and select those that are plausibly the strongest with this physics.”
  • GAVO at the AG-Tagung in Köln

    People standing and sitting around a booth-like table.  There's a big GAVO logo and a big screen on the left-hand side, a guy in a red hoodie is clearly giving a demo.

    As every year, GAVO participates in the fall meeting of the Astronomische Gesellschaft (AG), the association of astronomers working in Germany. This year, the meeting is hosted by the Universität zu Köln (a.k.a. University of Cologne), and I want to start with thanking them and the AG staff for placing our traditional booth smack next to a coffee break table. I anticipate with glee our opportunities to run our pitches on how much everyone is missing out if they're not doing VO while people are queueing up for coffee. Excellent.

    As every year, we are co-conveners for a splinter meeting on e-science and the virtual observatory, where I will be giving a talk on global dataset discovery (you heard it here first; lecture notes for the talk) late on Thursday afternoon.

    And as every year, there is a puzzler, a little problem rather easily solvable using VO tools; I was delighted to see people apparently already waiting for it when I handed out the problem sheet during the welcome reception tonight. You are very welcome to try your hand on it, but you only get to enter our raffle if you are on site. This year, the prize is a towel (of course) featuring a great image from ESA's Mars Express mission, where Phobos floats in front of Mars' limb:

    A 2:1 landscape black-and-white image with a blackish irregular spheroid floating in front of a deep horizon.

    I will update this post with the hints we are going to give out during the coffee breaks tomorrow and on Wednesday. And I will post our solution here late on Thursday.

    At our booth, you will also find various propaganda material, mostly covering matters I have mentioned here before; for posteriority and remoteriority, let me link to PDFs of the flyers/posters I have made for this meeting (with re-usability in mind). To advertise the new VO lectures, I am asking “Have you ever wished there was a proper introduction to using the Virtual Observatory?”, with lots of cool DOIs and perhaps less-cool QR codes. Another flyer trying to gain street cred with QR codes is the Follow us flyer advertising our Fediverse presence. We also still show a pitch for publishing with us and hand out the inevitable who we are flyer (which, I'll readily admit, has never been an easy sell).

    A fediverse screenshot and URIs for following us.

    Bonferroni for Open Data?

    A real classic that I have shown at many AG meetings since the 2013 Tübingen meeting got a lot more feedback than the QR code-heavy posters: Lame excuses for not publishing data.

    A tricky piece of feedback on that was an excuse that may actually be a (marginally) valid criticism of open data in general. You see, in particular in astroparticle physics (where folks are usually particularly uptight with their data), people run elaborate statistics on their results, inspired by the sort of statistics they do in high energy physics (“this is a 5-sigma detection of the Higgs particle”). When you do this kind of thing, you do run into a problem when people run new “tests” against your data because of the way test theory works. If you are actually talking about significance levels, you would have to apply Bonferroni corrections (or worse) when you do new tests on old data.

    This is actually at least not untrue. If you do not account for the slight abuse of data and tests of this sort, the usual interpretation of the significance level – more or less the probability that you will reject a true null hypothesis and thus claim a spurious result – breaks down, and you can no longer claim things like “aw, at my significance level of 0.05, I'll make spurious claims only one out of twenty times tops”.
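
    The numbers behind that breakdown are quickly checked. This little sketch computes the family-wise error rate after twenty independent tests at the 0.05 level, and the Bonferroni-corrected per-test level that would keep it at 0.05:

```python
alpha, m = 0.05, 20

# Probability of at least one spurious rejection across m independent
# tests, each run at significance level alpha:
fwer = 1 - (1 - alpha) ** m        # about 0.64 rather than 0.05

# The Bonferroni correction: run each individual test at alpha/m.
bonferroni_alpha = alpha / m       # 0.0025
```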

    Is this something people opening their data would need to worry about when they do their original analysis? It seems obvious to me that that's not the case and it would actually be impossible to do, in particular given that there is no way to predict what people will do in the future. But then there are many non-obvious results in statistics going against at least my gut feelings.

    Mind you, this definitely does not apply to most astronomical research and data re-use I have seen. But the point did make me wonder whether we may actually need some more elaborate test theory for re-used open data. If you know about anything like that: please do let me know.

    Followup (2024-09-10)

    The first hint is out. It's “Try TOPCAT's TAP client to solve this puzzler; you may want to look for 2MASS XSC there.” Oh, and we noticed that the problem was stated rather awkwardly in the original puzzler, which is why we have issued an erratum. The online version is fixed; it now says “where we define obscure as covered by a circle of four J-magnitude half-light radii around an extended object”.

    Followup (2024-09-10)

    After our first splinter – with lively discussions on the concept and viability of the “science-ready data” we have always had in mind as the primary sort of thing you would discover in the VO –, I have revealed the second hint: “TOPCAT's Examples button is always a good idea, in particular if you are not too proficient in ADQL. What you would need here is known as a Cone Selection.”

    Oh, in case you are curious where the discussion on the science-ready data gyrated to: Well, the plan of supplying data usable without having to have reduction pipelines in place is a good one. However, there undoubtedly are cases in which transparent provenance and the ability to do one's own re-reductions enable important science. With datalink [I am linking to a 2015 poster on that written by me; don't read that spec just for fun], we have an important ingredient for that. But I give you that in particular the preservation of the software that makes up reduction pipelines is a hard problem. It may even be an impossible problem if “preservation” is supposed to encompass malleability and fixability.

    Followup (2024-09-11)

    I've given the last two hints today: “To find the column with the J half-light radius, it pays to sort the columns in the Columns tab in TOPCAT by name or, for experts using VizieR's version of the XSC, by UCD.” and “ADQL has aggregate functions, which let you avoid downloading a lot of data when all you need are summary properties. This may not matter with what little data you would transfer here, but still: use server-side SUM.”
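
    In ADQL, the kind of query this hint alludes to might look roughly like the following sketch – the table and column names here are made up, so substitute whatever you find in TOPCAT's Columns tab on the service you use:

    ```sql
    -- sum of circle areas (4 J half-light radii, radii in arcsec)
    -- over all extended objects, in square degrees
    SELECT SUM(PI() * POWER(4 * j_r_eff / 3600., 2)) AS covered_sq_deg
    FROM twomass.xsc
    ```

    The point of the server-side SUM is that only a single number crosses the network, rather than one radius per object.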

    Followup (2024-09-12)

    I have published the (to me, physically surprising) puzzler solution to https://www.g-vo.org/puzzlerweb/puzzler2024-solution.pdf. In case it matters to you: The towel went to Marburg again. Congratulations to the winner!

    Followup (2024-09-13)

    On the way home I notice this might be a suitable place to say how I did the QR codes I was joking about above. Basis: The embedding documents are written in LaTeX, and I'm using make to build them. To include a QR code, I am writing something like:

    \includegraphics[height=5cm]{vo-qr.png}
    

    in the LaTeX source, and I am declaring a dependency on that file in the makefile:

    fluggi.pdf: fluggi.tex vo-qr.png <and possibly more images>
    

    Of course, this will error out because there is no file vo-qr.png at that point. The plan is to programmatically generate it from a file containing the URL (or whatever else you want to put into the QR code). That file is named after whatever is in front of -qr.png in the image name, with .url appended – here, vo.url – and in this case it contains:

    https://doi.org/10.21938/avVAxDlGOiu0Byv7NOZCsQ
    

    The automatic image generation then is effected by a pattern rule in the makefile:

    %-qr.png: %.url
            python qrmake.py $<
    

    And then all it takes is a short script qrmake.py, which is based on python3-qrcode:

    import sys

    import qrcode

    # Read the payload (a URL or whatever else) from the *.url file.
    with open(sys.argv[1]) as f:
        content = f.read().strip()

    # border=0: the surrounding LaTeX provides the whitespace anyway.
    output_code = qrcode.QRCode(border=0)
    output_code.add_data(content)

    # vo.url turns into vo-qr.png, matching the makefile's pattern rule.
    dest_name = sys.argv[1].replace(".url", "") + "-qr.png"
    output_code.make_image().save(dest_name)
    
  • Multimessenger Astronomy and the Virtual Observatory

    A lake with hills behind it; in it a sign saying “Bergbaugelände/ Benutzung auf eigene Gefahr”

    It's pretty in Görlitz, the location of the future German Astrophysics Research Centre DZA. The sign says “Mining area, enter at your own risk”. Indeed, the meeting this post was inspired by happened on the shores of a lake that still was an active brown coal mine as late as 1997.

    This week, I participated in the first workshop on multimessenger astronomy organised by the new DZA (Deutsches Zentrum für Astrophysik), recently founded in the town of Görlitz – do not feel bad if you have not yet heard of it; I trust you will read its name in many an astronomy article's authors' affiliations in the future, though.

    I went there because facilitating research across the electromagnetic spectrum and beyond (neutrinos, recently gravitational waves, eventually charged particles, too) has been one of the Virtual Observatory's foundational narratives (see also this 2004 paper from GAVO's infancy), and indeed the ease with which you can switch between wavebands in, say, Aladin, would have appeared utopian two decades ago:

    A screenshot with four panes showing astronomical images from SCUBA, allWISE, PanSTARRS, XMM, and a few aladin widgets around it.

    That's the classical quasar 3C 273 in radio, mid-infrared, optical, and X-rays, visualised within a few seconds thanks to the miracles of HiPS and Aladin.

    But of course research-level exploitation of astronomical data is far from a solved problem yet. Each messenger – and I would consider the concepts in the IVOA's messenger vocabulary a useful working definition for what sorts of messengers there are[1] – holds different challenges, has different communities using different detectors, tools and conventions.

    For instance, in the radio band, working with raw-ish interferometry data (“visibilities”) is common and requires specialised tools as well as a lot of care and experience. Against that, high-energy observations, be they TeV photons or neutrinos, have to cope with the (by optical standards) extreme scarcity of the messengers: at the meeting, ESO's Xavier Rodrigues (unless I misunderstood him) counted one event per year as viable for source detection. To robustly interpret such extremely low signal levels, one in particular needs extremely careful modelling of the entire observation, from emission to propagation through various media to background contamination to the instrument state, with a lot of calibration and simulation data frequently necessary to make statistical sense of even fairly benign data.

    The detectors for gravitational waves, in turn, basically only match patterns in what looks like noise even to the aided eye – at the meeting, Samaya Nissanke showed impressive examples –, and when they do pick up a signal, the localisation of the signal is a particular challenge, resulting, at least at this point, in large, banana-shaped regions.

    At the multimessenger workshop, I was given the opportunity to delineate what, from my Virtual Observatory point of view, I think are requirements for making multi-messenger astronomy more accessible “for the masses”, that is, for researchers who do not have immediate access to experts for a particular sort of messenger. Since a panel pitch is always a bit cramped, let me give a long version here.

    Science-Ready is an Effort

    The most important ingredient is: Science-ready data. Once you can say “we get a flux of X ± Y Janskys from a σ-circle around α, δ between T1 and T2 and messenger energy E1 and E2” or “here is a spectrum, i.e., pairs of sufficiently many messenger energy intervals and a calibrated flux in them, for source S”, matters are at least roughly understandable to visitors from other parts of the spectrum.

    I will not deny that there is still much that can go wrong, for instance because the error models of the data can become really tricky for complex instruments doing indirect measurements (say, gamma-ray telescopes observing atmospheric showers). But having to cope with weirdly correlated errors or strong systematics is something that happens even while staying within your home region of the spectrum – I had an example from the quaint optical domain right here on my blog when I posted on the Gaia XP spectra –, so that is not a problem terribly specific to the multi-messenger setting.

    Still, the case of the Gaia XP spectra and the sampling procedure Rene has devised back then are, I think, a nice example for what “provide science-ready data” might concretely mean: work, in this case, trying to de-correlate data points so people unfamiliar with the particular formalism used in Gaia DR3 can do something with the data with low effort. And I will readily admit that it is not only work, it also sacrifices quite a bit of detail that may actually be in the data if you spend more time with the individual dataset and methods.

    That this kind of service to people outside of the narrower sub-domain is rarely honoured certainly is one of the challenges of multi-messenger astronomy for the masses.

    Generality, Systematics, Statistics

    But of course the most important part of “science-ready” is removing instrument signatures. That is particularly important in multi-messenger astronomy because outside users will generally be fairly unfamiliar with the instruments, even with the types of instruments. Granted, even within a sub-domain setting up reduction pipelines and locating calibration data is rarely easy, and it is not uncommon to get three different answers when you ask two instrument specialists about the right formalism and data to calibrate any given observation. But that is not much compared with having to understand the reduction process of, say, LIGO, as someone who has so far mainly divided by flatfields.

    Even in the optical, serving data with strong instrumental signatures (e.g., without flats and bias frames applied) has been standard until fairly recently. Many people in the VLBI community still claim that real-space data is not much good. And I will not dispute that carefully analysing the systematics of a particular dataset may improve your error budget over what a generic pipeline does, possibly even to the point of pushing an observation over the significance threshold.

    But against that, canned science-ready data lets non-experts at least “see” something. That way, they learn that there may be some signal conveyed by a foreign messenger that is worth a closer look.

    Enabling that “closer look” brings me to my second requirement for multimessenger astronomy: expert access.

    From Data Discovery to Expert Discovery

    Of course, on the forefront of research, an extra 10% systematics squeezed out of data may very well make or break a result, and that means that people may need to go back to raw(er) data. Part of this problem is that the necessary artefacts for doing so need to be available. With Datalink, I'd say at least an important building block for enabling that is there.

    Certainly, that is not full provenance information yet – that would, for instance, include references to the tools used in the reduction, and the parameters fed to them. And regrettably, even the IVOA's provenance data model does not really tell you how to provide that. However, even machine-readable provenance will not let an outsider suddenly do, say, correlation with CASA with sufficient confidence to do bleeding-edge science with the result, let alone improve on the generic reduction hopefully provided by the observatory.

    This is the reason for my conviction that there is an important social problem with multi-messenger astronomy: Assuming I have found some interesting data in unfamiliar spectral territories and I want to try and improve on the generic reduction, how do I find someone who can work all the tools and actually know what they are doing?

    Sure, from registry records you can find contact information (see also the .get_contact() in pyVO's registry API), but that is most often a technical contact, and the original authors may very well have moved on and be inaccessible to these technical contacts. I, certainly, have failed to re-establish contact with previous data providers to the GAVO data centre in two separate cases.

    And yes, you can rather easily move to scholarly publications from VO results – in particular if they implement the INFO elements that the new Data Origin in the VO note asks for –, but that may not help either when the authors have moved on to a different institution, regardless of whether that is a scholarly or, say, banking institution.
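
    To give an idea of what that looks like in practice, here is a sketch of such INFO elements in a VOTable header. The element names follow the Data Origin note as far as I recall it, and the values are of course made up; consult the note itself for the authoritative list:

    ```xml
    <!-- illustrative only: names per the Data Origin note,
         values are placeholders -->
    <INFO name="publisher" value="GAVO Data Centre" />
    <INFO name="server_software" value="DaCHS/2.9" />
    <INFO name="request_date" value="2024-09-12T10:00:00Z" />
    <INFO name="contact" value="helpdesk@example.org" />
    ```

    With this kind of material in the response itself, a downstream user at least has a starting point for tracking down the people behind a dataset.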

    On top of that, our notorious 2013 poster on lame excuses for not publishing one's data has, as an excuse: “People will contact me and ask about stuff.” Back then, we flippantly retorted:

    Well, science is about exchange. Think how much you learned by asking other people.

    Plus, you’ll notice that quite a few of those questions are actually quite clever, so answering them is a good use of your time.

    As to the stupid questions – well, they are annoying, but at least for us even those were eye-openers now and then.

    Admittedly, all this is not very helpful, in particular if you are on the requesting side. And truly I doubt there is a (full) technical solution to this problem.

    I also acknowledge that it even has a legal side – the sort of data you need to process when linking up sub-domain experts and would-be data users is GDPR-relevant, and I would much rather not have that kind of thing on my machine. Still, the problem of expert discovery becomes very pertinent whenever a researcher leaves their home turf – it's even more important in cross-discipline data discovery[2] than in multiwavelength. I would therefore advocate at least keeping the problem in mind, as that might yield little steps towards making expert discovery a bit more realistic.

    Perhaps even just planning for friendly and welcoming helpdesks that link people up without any data processing support at all is already good enough?

    Blind Discovery

    The last requirement I have mentioned in my panel discussion pitch for smooth multi-messenger astronomy is, I think, quite a bit further along: Blind discovery. That is, you name your location(s) in space, time, and spectrum, say what sort of data product you are looking for, and let the computer inundate you with datasets matching these constraints. I have recently posted on this very topic and mentioned a few remaining problems in that field.

    Let me pick out one point in particular, both because I believe there is substantial scientific merit in its treatment and because it is critical when it comes to efficient global blind discovery: Sensitivity; while for single-object spectra, I give you that SNR and resolving power are probably enough most of the time, for most other data products or even catalogues, nothing as handy is available across the spectrum.

    For instance, on images (“flux maps”, if you will) the simple concept of a limiting magnitude obviously does not extend across the spectrum without horrible contortions. Replacing it with something that works for many messengers, has robust and accessible statistical interpretations, and is reasonably easy to obtain as part of reduction pipelines even in the case of strongly model-driven data reduction: that would be high art.

    Also in the panel discussion, it was mentioned that infrastructure work as discussed on these pages is thankless art that will, if your institute indulges in too much of it, get your shop closed because your papers/FTE metric looks too lousy. Now… it's true that beancounters are a bane, but if you manage to come up with such a robust, principled, easy-to-obtain measure, I fully expect its wide adoption – and then you never have to worry about your bean^W citation count again.

    [1]In case you are missing gravitational waves: there has been a discussion about the proper extension and labelling of that concept, and it has petered out the first time around. If you miss them operationally (which will give us important hints about how to properly include them), please contact me or the IVOA Semantics WG.
    [2]Let me use this opportunity to again solicit contributions to my Stories on Cross-Discipline Data Discovery, as – to my chagrin – it seems to me that beyond metrics (which, between disciplines, are even more broken than within any one discipline) we lack convincing ideas why we should strive for data discovery spanning multiple disciplines in the first place.
  • GAVO at the Fall 2023 Interop in Tucson

    The Virtual Observatory, in practical terms, is the set of standards created and maintained by the IVOA. The IVOA, in turn, is a community almost defined by the two conferences it holds every year, the Interops (previously on this blog). The most recent Interop has just ended: The 2023 Tucson Fall Interop. Here are a few notes on what went on there from my (and to some extent GAVO's) perspective.

    An almost-orange orange hanging in a tree.

    This fall's IVOA Interop was hosted by Steward Observatory, where they had ripening oranges in the backyard. They were edible!

    For at least a decade and a half, the autumn Interops have been back-to-back with the ADASS conferences. ADASS, short for Astronomical Data Analysis Software and Systems, is a venerable conference series, created way back in the last century (this year: ADASS XXXIII) to have a forum for people who work in the magic triangle of astronomy, instrumentation, and data processing. Clearly, such a forum is very well suited to spread the word about the miracles we are working in the VO.

    To that end, I was involved in the creation of three posters: One on the use of MOCs in TAP – a somewhat extended version of something you saw on this blog first –, then one on data discovery in pyVO by Renaud Savalle (Paris) et al – a topic again familiar to readers of this blog – and finally one on improving the description of ADQL to enable more reliable machine validation of its grammar by Grégory Mantelet (Strasbourg) et al.

    As the conference at large goes, I was really delighted to see how basically everyone talking about data publication at all was stressing they are “doing VO”, which was a very welcome change from, perhaps, 10 years ago, when this kind of talk was typically extolling the virtues of one particular web or javascript framework. One of the great things about standards in general and the VO in particular is that they tend to be a lot more durable than all those frameworks.


    The following Interop was a “short” one, lasting from Friday morning until Sunday noon, which meant that I was far too busy to do anything like a live blog while it went on. Let me hence just briefly point out the main talks related to GAVO's current activities and DaCHS.

    In Data Curation and Preservation on Saturday morning, Baptiste Cecconi (Paris) gave a nice overview of – among other things – what our bridge between the Registry and b2find (in particular, using the VOResource to DataCite mapper) enables in the context of the EOSC, and he briefly touched the question of how to properly make landing pages for VO resources (for which I am currently using another piece of XSLT).

    In the Radio session later that morning, Ixaka Labadie (Granada) gave a talk on how he is using DaCHS to deliver 3D visualisations for fairly impressive (prototype) SKA data. I particularly liked his illustrations of how DaCHS does Datalink and SODA. See his slide 12:

    Boxes and arrows illustrating how SIAP and Datalink are described in DaCHS resource descriptors

    In the afternoon, there was the Registry session, which featured me talking about the harvest trigger service I have been running for a while to help people across the anticlimactic moment when you have published your new resource but it won't show up in TOPCAT or pyVO for a day or so.

    The bulk of this session, however, was used for a discussion about various shortcomings of the Registry or its interfaces that I found pleasantly productive – incidentally, just like the discussion on word lists in EPN-TAP on Friday afternoon's Solar System Session that I had the pleasure to chair.

    In the DAL session on that afternoon, I had two talks: One was on the proposed new interoperable user-defined functions already implemented in DaCHS' ADQL and now coming up in several other services, too. Note to self: Some of these would probably be rather suitable blog post material.

    The second talk was a sort of brief show-and-tell pitch, in which I pointed out that hierarchical TAP examples using the elegant examples:continued property now actually work in both pyVO and TOPCAT:

    A three-level popup menu Service Provided -> Local UDFs -> using ivo_histogram

    Finally, in Sunday morning's Apps session, I talked about global image discovery in pyVO. This was about an early promise of the VO: just say where in space, time, and spectrum you need an image (or spectrum, or time series, or whatever), and some apparatus will find and query all the services that could have pertinent data. It would then present the metadata of the datasets it found in some useful form that would let you make informed decisions which to fetch.

    This was not too difficult in the olden days, but by now the VO is so big and complicated that a pyVO module with fairly involved logic is required. If you don't want to read the notes here, don't worry: I can safely predict that you'll read more about that topic on this blog.

    This is nowhere near done yet; so, it is one more piece of homework that I am taking home with me.

  • GAVO at the AG-Tagung in Berlin

    A booth with a large screen, quite a bit of papers, a roll-up, all behind a glass wall with a sign UNI_VERSUM TUB Exhibition Space.

    It's time again for the annual meeting of the German astronomical society, the Astronomische Gesellschaft. Since we have been reaching out to the community at these meetings since 2007, there is even a tag for our contributions on this blog: AG-Tagung.

    Due to fire codes, our traditional booth would almost have ended up in a remote location on the third floor of TU Berlin's main building, and I had already printed desperate pleas to come and try find us. But in a last minute stunt, the local organisers housed us in an almost perfect place (thanks!): we're sitting right near the entrance, where we can rope in passers-by and then convince them they're missing out if they're not “doing VO”.

    One opportunity for them to realise how they're missing out is our puzzler, this year about a lonely O star:

    An overexposed star in a PanSTARRS field with an arrow plotted over it.

    Since this star must have formed very (by astronomical standards) recently, it should still be in its nursery, something like a nebula – but it clearly is not. It's a runaway. But from what?

    Contrary to last year, we will not accept remote entries, sorry – but you're welcome to still try your hand even if you are not in Berlin. Also, if you like the format, there's quite a few puzzlers from previous years to play with.

    I have just (11:30) revealed the first hint towards our sample solution:

    We recommend solving this puzzler using Aladin. There, you can look for services serving, e.g., the Gaia DR3 data in the little “select” box in the lower left corner. Shameless plug: Try dr3lite.

    If you are on-site: drop by our booth. If not: we will post updates – in particular on the puzzler – here.

    Followup (2023-09-13)

    At yesterday's afternoon coffee break, we gave the following additional hint:

    To plot proper motions for catalogue objects in Aladin, try the Create a filter… entry in the Catalog menu.
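
    For reference, the filter definition you would type there might look roughly like this – a sketch only: the column names pmRA and pmDE depend on the catalogue you loaded, and you should double-check the drawing action against Aladin's filter documentation:

    ```
    { draw pm(pmRA, pmDE) }
    ```

    This draws a proper-motion arrow for every catalogue object, which makes a runaway star stand out rather nicely.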

    And this morning, we added:

    If you found Gaia DR3, you can also find editions of the NGC catalog (shameless plug: openngc). These are small enough for a plain SELECT * FROM….
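
    If you want to try that, a minimal query could look like this; the table name is what the GAVO service uses for openngc (an assumption – adapt it to whatever service you actually found):

    ```sql
    SELECT * FROM openngc.data
    ```

    Since the NGC only has a few thousand rows, pulling the whole table is perfectly fine here.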

    Followup (2023-09-14)

    The last puzzler hint is:

    Aladin's dist tool comes in handy when you want to do quick measurements on the sky. If you are in Berlin, you still have until 16:00 today to hand in your solution.

    However, the puzzler should not prevent you from attending our splinter meeting on e-science and the Virtual Observatory, where I will give an overview of the state of arrays in ADQL. Regular readers of this blog will remember my previous treatment of the topic, but this time the queries will be about time series.

    Followup (2023-09-14)

    Well, the prize has been drawn. This time, it went to a team from Marburg:

    Two persons holding a large towel with an astronomical image printed on it, in the background a big screen with the Aladin VO client on it.

    As promised, here's our solution using Aladin. But one of the nice things about the VO is that you get to choose your tools. One participant using pyVO was kind enough to let us publish their solution, too: puzzler2023-solution.py. Thanks to everyone who participated!
