Posts with the Tag Processes:

Requirements and Validators

2022-03-07 Markus Demleitner

Content Warning: this is mainly VO lore. I am not claiming any immediate applicability to the use or publication of astronomical data.

This morning, I set out to reply to a mail by Mark Taylor and noticed after a while that I was writing a philosophical piece on how to write standards – and how not to – that I may want to refer to again later. So, I'll make this a blog post.

The story started when the excellent stilts taplint during my monthly validation routine produced an error when exercising my data centre's TAP endpoint:

I-OBS-QSUB-5 Submitting query: SELECT TOP 1 obs_id FROM ivoa.ObsCore WHERE obs_id IS NULL
E-OBS-QERR-1 TAP query failed [Service error: "Field query: Query timed out (took too long).

What happened is that stilts tried to ascertain that all rows in my obscore table satisfy the standard's requirement that the obs_id column is non-NULL (see page 20). This made Postgres – the database system actually executing the queries – run what is known as a sequential scan through the tables involved in obscore; the reason underlying this bad judgement is a bit involved and has to do with the fact that in DaCHS, ivoa.obscore is a view composed of many tables. I will spare you the details, but the net effect of that is that it is not easy to tell Postgres that rows with obs_id NULL, if they exist at all, will be few and far between.

By now, the number of data sets in my obscore table approaches 100'000'000, and fetching all that data simply takes time, more time than a synchronous query has on my site[1].

Granted, I could fix that by adding indexes on the columns involved, but since these come from several dozen tables, that would be quite a bit of work for both me and the computer. Is that work worth it? Well, it certainly is if otherwise I'm breaking the standard, but since it is a serious amount of work, I am tempted to wonder: does the requirement actually make sense? And this leads to the question:

Why do we require things in standards?

In the end, there is just one reason to require something in a standard: Without the requirement, something important breaks. When one thinks about this a bit more deeply, one can distinguish two somewhat finer classes of requirements.

(a) “Internal requirements“. These are rules imposed so machines can do their job. The most obvious examples here are requirements on how to write things. For instance, if a client writes an interval as lower/upper and the service expects lower upper, it just won't work. Hence, a standard has to say “The separator in intervals MUST be whitespace” (or whatever).

There are more subtle requirements in that department. For instance, many tables need a primary key because other tables may want to refer to them. For Obscore, this becomes relevant just about now, when we think about having extensions for it. Those would add specific metadata for, say, radio or gamma observations. We will probably create them by adding per-extension tables holding a foreign key into ivoa.obscore. This is nice because then you can write something like:

SELECT ...
FROM ivoa.obscore
JOIN ivoa.obs_visibility
  USING (obs_publisher_did)
WHERE (some visiblity-specific constraint)

– and almost everything just works without further thought or effort: No plethora of columns that are NULL in ivoa.obscore for anything that is not a visibility, and no manual filtering out of non-visibilities either: JOIN does it all nicely for you. Isn't relational algebra great?

But this only is possible if obs_publisher_did (well: it's not certain yet whether that actually will be obscore's designated primary key, but bear with me there) really is non-NULL, and if there are no two rows with the same publisher DID (which are the general criteria to make something a primary key in a relation). Hence, these two constraints are something we simply MUST (pun intended) require.

(b) “Functional requirements”. These are requirements resulting from considerations of the use of the standard. I have just encountered a nice example when working on LineTAP, a future standard on how to access data about spectral lines. An important use case there is that the client displays the lines on top of a spectrum, and it will want to put something next to the lines so the user has at least a first indication just what would cause the line to show up. That it can only do if the service provides it with a plausible label – asking clients to invent a label based on the data it has is likely to produce very unsatifying results, as no machine is smart enough to figure out nice, idiomatic strings like „21 cm HI“ or „Hα“. Hence, we simply have to require that each row in such a LineTAP table has a title (technically: the corresponding column has a non-NULL constraint).

Going back to the obs_id example, it does not seem there is a strong case to invoke either (a) or (b) – since the column explicitly has no uniqueness requirement, it will not work as a primary key, and users will probably only want to use it for “grouped” data, where multiple artefacts belong to one “observation”. For data sets not within such groups, there really is no application for obs_id I can see. Of course, I may be missing something, which is why I asked around on the mailing lists.

If we figure out nothing breaks when we remove the requirement, then we should drop it: Every requirement causes some overhead in implementation and validation. In the present case, the implementation overhead would be all the indexes on the various obs_id columns, which I would not otherwise need. The validation overhead are the extra queries that taplint needs to do. Having overhead for no benefit (in terms of things not breaking) goes against sensible parsimony in what we ask our adopters to do (and I'll officially admit here that we do ask quite a bit already).

… and why do we validate them?

In the mail I have cited above, Mark has kindly offered to just not run the query in the validation suite, and all this philosophy was really intended to lead up to a “thanks, but no thanks”.

That is because, first of all, requirements that are not checked by a machine are requirements that are not met. You see, what we do is hard. Sure, there are harder problems in computing, but globally distributed information systems run by only loosely connected parties are rather non-trivial. People writing code to solve non-trivial problems will get it wrong.

The common way to deal with this fact is to test with one client and call it a day when that client seems to work for whatever was chosen as a test case. To mention a non-VO standard where this implement-to-the-client method failed horribly and continues to fail horribly: ACPI, the part of the firmware that's supposed to make, for instance, suspend-to-RAM something one doesn't have to think about. Vendors usually stop developing their ACPI code when the current version of Windows does not fail horribly with their implementation. A paper in the proceedings of the 2007 Linux symposium discusses some of the consequences in the least offensive way conceivable – and in a way that I, as a VO developer running quite a few Linux boxes, can very much relate to.

The bottom line is that if an unmet requirement breaks things and validators do not check for that requirement, then services will work to some degree with a certain client and break as soon as people switch to a different client (or perhaps only try to be smart). That's in stark contrast to one of my main selling points when I do VO teaching: „Hey, you can prototype with TOPCAT, and when you've figured out things, just switch to pyVO so you can scale, automate, and make your work reproducable“.

So, let's try to avoid unvalidated requirements.

Instead, let's have as few requirements as we can while covering the use cases we envision. And then let's have great validators that make sure these requirements are met by the services (or instance documents, or whatever it may be). Such validators not only help making the VO an effective environment that's fun to work with. They also give service operators – like… me – a peace of mind that nothing else can provide.

[1]

I keep a rather tight limit on the sync queries because the system also answers registry discovery queries, and these should be reasonably snappy. If I let long sync queries run, it is very easy to overload the system by accident. If I don't, people who want to run long queries can move to async. There, jobs are queued and only let in one or two at a time. That will not (usually) overload anything.

Small Change, Big Win

2022-02-23 Markus Demleitner

That's SCS 1.03 Erratum 2 rendered in my browser with a bit of image processing to celebrate that there's one painful VO legacy less on this world.

PSA: what follows is VO lore that may be entertaining but will not help you use or publish astronomical data.

Today, I've made a very small commit to my VO publication package DaCHS (revision 8452):
```
--- gavo/web/vodal.py (revision 8451)
+++ gavo/web/vodal.py (working copy)
@@ -260,7 +260,6 @@
        version = "1.0"
        parameterStyle = "dali"
        standardId = "ivo://ivoa.net/std/ConeSearch"
-     defaultOutputFormat = "votable1.1"
```
One deleted line, small cause, huge effect.

This story starts with the oldest „operational“ VO standard, Simple Cone Search, which was formally published in 2008 but really got its current shape a lot earlier.

I've not been there back then, but I think the authors expected that clients would be parsing the VOTables that the services were returning using something called XML binding. That, well, was a technique where code was generated from an XML schema, and only instance documents conforming to that exact schema could be parsed with that code.

That is of course the opposite of the golden rule of interoperability (“be strict in what you produce and lenient in what you accept”) and thus would have been a terrible implementation choice for interoperable clients (and I believe nobody ever tried it). But somehow – or that is my explanation – the XML binding reasoning translated into the requirement that SCS services could only return VOTable 1.0 or VOTable 1.1, and that made it into the standard. It was hence the law. And that it DaCHS had to keep alive VOTable 1.1 for writing (which the above commit of course doesn't remove, but I can remove it now any time I feel like it). And that it couldn't do a lot of useful things that required features not present in VOTable 1.1.

Nobody dared to touch the problem for about a decade, as it was actually unclear whether some ancient code might still be doing useful work with SCS and XML binding. And I shouldn't be scoulding them after I have recently broken ESO examples under the assumption that “aw, nobody's gonna do this“. Then, starting about five years ago, we had a couple of discussions at various conferences about how we might bring SCS into the present VO (where it, it has to be said, sticks out a bit for several other reasons, too, like its funky error reporting and the funny UCDs it uses). But these weren't easy: What exactly are we allowed to break within a minor version under the above assumption (“aw, nobody… “)? If we do a major version, how do we plan for co-existence for two parallel major version?

Well: For the version restriction, in the end a simple Erratum was enough. On January 26, 2022, the IVOA Technical Coordination Group accepted SCS 1.03 Erratum 2. And now I can return whatever VOTable version suits me. Phewy.

I can now have GROUPs in GROUPs (which I need to annotate photometry), I can finally return tables with my old proposal for STC in VOTable in SCS results (where they would have mattered most – not that anyone cares any more, as that ship has sailed somewhere completely different).

Hey, I can have xtypes. Doesn't mean anything to you? Well, try this: In TOPCAT, open VO/Cone Search. Type “Constellations” and select the “cslt cone“ service. Run a query for some part of the sky, with a size of a few 10s of degrees. Open a sky plot, and in there, do Layers → Add Area Control, and in that control select the table you have just pulled in. Presto: You'll see the constellation boundaries without further configuration, and that's because TOPCAT has the xtype to figure out that the odd numbers it sees are really the vertex coordinates of a spherical polygon in DALI serialisation.

Not a big deal, you say? Perhaps. But lots of small deals accumulated make the difference between what you can do and what you cannot, in particular across services (which is what the VO is about).

Removing the erroneous constraint on VOTable versions in SCS opened the standard up for quite a few small deals. Thanks, TCG!

Category: Standards

Building consensus

2020-05-08 Markus Demleitner

Sometimes, building consensus takes a little bending: Me, at the Shanghai Interop of 2017. In-joke: there's “STC” on the slide.

In the Virtual Observatory, procedures are built on consensus: No (relevant) decisions are passed based some sort of majority vote. While I personally think that's a very good thing in general – you really don't want to clobber minorities, and I couldn't even give a minimal size of such a minority below which it might be ok to ignore them –, there is a profound operational reason for that: We cannot force data centers or software writers to comply with our standards, so they had better agree with them in the first place.

However, building consensus (to avoid Chomsky's somewhat odious notion of manufacturing consent) is hard. In my current work, this insight manifests itself most strongly when I wear my hat as chair of the IVOA Semantics Working Group, where we need to sort items from a certain part of the world into separate boxes and label those, that is, we're building vocabularies. “Part of the world” can be formalised, and there are big phrases like “universe of discourse” to denote such formalisations, but to give you an idea, it's things like reference frames, topics astronomy in general talks about (think journal keywords), relationships between data collections and services, or the roles of files related to or making up a dataset. If you visit the VO's vocabulary repository, you will see what parts we are trying to systematise, and if you skim the current draft for the next release of Vocabularies in the VO, in section two you can find a few reasons why we are bothering to do that.

As you may expect if you have ever tried classifications like this, what boxes (”concepts” in the argot of the semantics folks) there should be and how to label them are questions with plenty of room for dissent. A case study for this is the discussion on VEP-001 and its successors that has been going on since late last year; it also illustrates that we are not talking about bikeshedding here. The discussion clarified much and, in particular, led to substantial improvements not only to the concept in question but also far beyond that. If you are interested, have a look at a few mail threads (here, here, here, or here; more discussion happened live at meetings).

An ideal outcome of such a process is, of course, a solution that is obvious in retrospect, so everyone just agrees. Sometimes, that doesn't happen, and one of these times is VEP-001 and the VEP-003 it evolved into. A spontanous splinter between sessions of this week's Virtual Interop yielded two rather sensible names for the concept we had identified in the previous debates: #sibling on the one hand, and #co-derived on the other (in case you're RDF-minded: the full vocabulary URIs are obtained by prefixing this with the vocabulary URI, http://www.ivoa.net/rdf/datalink/core). Choosing between the two is a bit of a matter of taste, but also of perhaps changing implementations, and so I don't see a clear preference. And the people in the conference didn't reach an agreement before people on the North American west coast really had to have some well-deserved sleep.

In such a situation – extensive discussion yields some very few, apparently rather equivalent solution –, I suspect it is the time to resort to some sort of polling after all. So, in the session I've asked the people involved to give their pain level on a scale of 1 to 10. Given there are quite a few consensus scales out there already (I'm too lazy to look for references now, but I'll retrofit them here if you send some in), I felt this was a bit hasty after I had closed the z**m^H^H^H^H telecon client. But then, thinking about it, I started to like that scale, and so during a little bike ride I came up with what's below. And since I started liking it, I thought I could put it into words, and into a form I can reference when similar situations come up in the future. And so, here it is:

Markus' Pain Level Scale

Oh wow. I'm enthusiastic about it, and I'd get really cross if we didn't do it.
It's great. I don't think we'll find a better solution. People better have really strong reasons to reject it.
Fine. Just go ahead.
Quite reasonable. I have some doubts, but I either don't have a good alternative, or the alternatives certainly won't improve matters.
Reasonable. I can live with it, possibly accepting a very moderate amount of pain (like: change an implementation that I think is fine as it is).
Sigh. I don't like it much. If you think it's useful, do it, but don't blame me if it later turns out it stinks.
Ouch. I wish we didn't have to go there. For instance: This is going to uglify a few things I care about.
Yikes. I think it's a bad idea. Honestly, let's not do it. It's going to make quite a few things a lot uglier, though I give you it might still just barely work.
OMG. What are you thinking? I won't go near it, and I pity everyone who will have to. And it's quite likely going to blow up some things I care about.
Blech. To me, this clearly is a grave mistake that will impact a lot of things very adversely. If I can do anything within reason to stop it, I'll do it. Consider this a veto, and shame on you if you override it.

You can qualify this with:

+:	I've thought long and hard about this, and I think I understand the matter in depth. You'll hence need arguments of the profundity of the Earth's outer core to sway me.
(unqualified):	I've thought about this, and as far as I understand the matter I'm sure about it. More information, solid arguments, or a sudden inspiration while showering might still sway me.
-:	This is a gut feeling. It could very well be phantom pain. Feel free to try a differential diagnosis.

If you like the scale, too, feel free to reference it as href="https://blog.g-vo.org/building-consensus/#scale">https://blog.g-vo.org/building-consensus/#scale.

Page 1 / 1