Posts with the Tag Registry:

A New Constraint Class in PyVO's Registry API: UAT

2025-02-14 Markus Demleitner

A scan of a book page: lots of astronomy-relevant topics ranging from "Cronometrie" to "Kosmologie, Relativitätstheorie". Overlaid a title page stating "Astronomischer Jahresbericht. Die Literatur des Jahres 1967".

This was how they did what I am talking about here almost 60 years ago: a page of the table of contents of the “Astronomischer Jahresbericht” for 1967, the last volume before it was turned into the English-language Astronomy and Astrophysics Abstracts, which were the main tool for literature work in astronomy until the ADS came along in the late 1990ies.

Thesauri and the UAT
Why Keywords?
The UAT constraint
Implementation

I have recently created a pull request against pyVO to furnish the library with a new constraint to search for data and services: Search by a concept drawn from the Unified Astronomy Thesaurus UAT. This is not entirely different from the classical search by subject keywords that was what everyone did before we had the ADS, which is what I am trying to illustrate above. But it has some twists that, I would argue, still make it valuable even in the age of full-text indexes.

To make my argument, let me first set the stage.

Thesauri and the UAT

(Disclaimer: I am currently a member of the UAT steering committee and therefore cannot claim neutrality. However, I would not claim neutrality otherwise, either: the UAT is not perfect, but it's already great)

Librarians (and I am one at heart) love thesauri. Or taxonomies. Or perhaps even ontologies. What may sound like things out of a Harry Potter novel are actually ways to organise a part of the world (a “domain”) into “concepts”. If you are suitably minded, you can think of a “concept“ as a subset of the domain; “suitably minded“ here means that you consider the world as a large set of things and a domain a subset of this world. The IVOA Vocabularies specification contains some additional philosophical background on this way of thinking in sect. 5.2.4.

On this other hand, if you are not suitably minded, a “concept” is not much different from a topic.

There are differences in how each of thesaurus, taxonomy, and ontology does that organising (and people don't always agree on the differences). Ontologies, for instance, let you link concepts in every way, as in “a (bicycle) (is steered) (using) a (handle bar) (made of) ((steel) or (aluminum))“; every parenthesised phrase would be a node (which is a better term in ontologies than “concept”) in a suitably general ontology, and connecting these nodes creates a fine-graned representation of knowledge about the world.

That is potentially extremely powerful, but also almost too hard for humans. Check out WordNet for how far one can take ontologies if very many very smart people spend very many years.

Thesauri, on the other hand, are not as powerful, but they are simpler and within reach for mere humans: there, concepts are simply organised into something like a tree, perhaps (and that is what many people would call a taxonomy) using is-a relationships: A human is a primate is a mammal is a vertebrate is an animal. The UAT actually is using somewhat vaguer notions called “narrower” and “wider”. This lets you state useful if somewhat loose relationships like “asteroid-rotation is narrower than asteroid-dynamics”. For experts: The UAT is using a formalism called SKOS; but don't worry if you can't seem to care.

The UAT is standing on the shoulders of giants: Before it, there has been the IAU thesaurus in 1993, and an astronomy thesaurus was also produced under the auspices of the IVOA. And then there were (and to some extent still are) the numerous keyword schemes designed by journal publishers that would also count as some sort of taxonomy or astronomy.

“Numerous” is not good when people have to assign keywords to their journal articles: If A&A use something drastically or only subtly different from ApJ, and MNRAS still something else, people submitting to multiple journals will quite likely lose their patience and diligence with the keywords. For reasons I will discuss in a second, that is a shame.

Therefore, at least the big American journals have now all switched to using UAT keywords, and I sincerely hope that their international counterparts will follow their example where that has not already happened.

Why Keywords?

Of course, you can argue that when you can do full-text searches, why would you even bother with controlled keyword lists? Against that, I would first argue that it is extremely useful to have a clear idea of what a thing is called: For example, is it delta Cephei stars, Cepheids, δ Cep stars or still something else? Full text search would need to be rather smart to be able to sort out terminological turmoil of this kind for you.

And then you would still not know if W Virginis stars (or should you say “Type II Cepheids”? You see how useful proper terminology is) are included in whatever your author called Cepheids (or whatever they called it). Defining concepts as precisely as possible thus is already great.

The keyword system becomes even more useful when the hiearchy we see in the Cepheid example becomes visible to computers. If a computer knows that there is some relationship between W Virgins stars and classical Cepheids, it can, for instance, expand or refine your queries (“give me data for all kinds of Cepheids”) as necessary. To give you an idea of how this looks in practice, here is how SemBaReBro displays the Cepheid area in the UAT:

Arrows between texts like "Type II Cepheid variable stars", "Cepheid variable stars", and "Young disk Cepheid variable stars"

In that image, only concepts associated with resources in the Registry have a spiffy IVOA logo; that so few VO resources claim to deal with Cepheids tells you that our data providers can probably improve their annotations quite a bit. But that is for another day; the hope is that as more people search using UAT concepts, the data providers will see a larger benefit in choosing them wisely[1].

By the way, if you are a regular around here, you will have seen images like that before; I have talked about Sembarebro in 2021 already, and that post contains more reasons for having and maintaining vocabularies.

Oh, and for the definitions of the concepts, you can (in general; in the UAT, there are still a few concepts without definitions) dereference the concept URI, which in the VO is always of the form <vocabulary uri>#<term identifier>, where the vocabulary URI starts with http://www.ivoa.net/rdf, after which there is the vocabulary name.

Thus, if you point your web browser to https://www.ivoa.net/rdf/uat#cepheid-variable-stars [2], you will learn that a Cepheid is:

A class of luminous, yellow supergiants that are pulsating variables and whose period of variation is a function of their luminosity. These stars expand and contract at extremely regular periods, in the range 1-50 days [...]

The UAT constraint

Remember? This was supposed to be a blog post about a new search constraint in pyVO. Well, after all the preliminaries I can finally reveal that once pyVO PR #649 is merged, you can search by UAT concepts:

>>> from pyvo import registry
>>> print(registry.search(registry.UAT("variable-stars")))
<DALResultsTable length=2010>
              ivoid               ...
                                  ...
              object              ...
--------------------------------- ...
         ivo://cds.vizier/b/corot ...
          ivo://cds.vizier/b/gcvs ...
           ivo://cds.vizier/b/vsx ...
          ivo://cds.vizier/i/280b ...
           ivo://cds.vizier/i/345 ...
           ivo://cds.vizier/i/350 ...
                              ... ...
            ivo://cds.vizier/v/97 ...
         ivo://cds.vizier/vii/293 ...
   ivo://org.gavo.dc/apass/q/cone ...
ivo://org.gavo.dc/bgds/l/meanphot ...
     ivo://org.gavo.dc/bgds/l/ssa ...
     ivo://org.gavo.dc/bgds/q/sia ...

In case you have never used pyVO's Registry API before, you may want to skim my post on that topic before continuing.

Since the default keyword search also queries RegTAP's res_subject table (which is what this constraint is based on), this is perhaps not too exciting. At least there is a built-in protection against typos:

>>> print(registry.search(registry.UAT("varialbe-stars")))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/msdemlei/gavo/src/pyvo/pyvo/registry/rtcons.py", line 713, in __init__
    raise dalq.DALQueryError(
pyvo.dal.exceptions.DALQueryError: varialbe-stars does not identify an IVOA uat concept (see http://www.ivoa.net/rdf/uat).

It becomes more exciting when you start exploiting the intrinsic hierarchy; the constraint constructor supports optional keyword arguments expand_up and expand_down, giving the number of levels of parent and child concepts to include. For instance, to discover resources talking about any sort of supernova, you would say:

>>> print(registry.search(registry.UAT("supernovae", expand_down=10)))
<DALResultsTable length=593>
                 ivoid                   ...
                                         ...
                 object                  ...
---------------------------------------- ...
                   ivo://cds.vizier/b/sn ...
                 ivo://cds.vizier/ii/159 ...
                 ivo://cds.vizier/ii/189 ...
                 ivo://cds.vizier/ii/205 ...
                ivo://cds.vizier/ii/214a ...
                 ivo://cds.vizier/ii/218 ...
                                     ... ...
           ivo://cds.vizier/j/pasp/122/1 ...
       ivo://cds.vizier/j/pasp/131/a4002 ...
           ivo://cds.vizier/j/pazh/30/37 ...
          ivo://cds.vizier/j/pazh/37/837 ...
ivo://edu.gavo.org/eurovo/aida_snconfirm ...
                ivo://mast.stsci/candels ...

There is no overwhelming magic in this, as you can see when you tell pyVO to show you the query it actually runs:

>>> print(registry.get_RegTAP_query(registry.UAT("supernovae", expand_down=10)))
SELECT
  [crazy stuff elided]
WHERE
(ivoid IN (SELECT DISTINCT ivoid FROM rr.res_subject WHERE res_subject in (
  'core-collapse-supernovae', 'hypernovae', 'supernovae',
  'type-ia-supernovae', 'type-ib-supernovae', 'type-ic-supernovae',
  'type-ii-supernovae')))
GROUP BY [whatever]

Incidentally, some services have an ADQL extension (a “user defined function“ or UDF) that lets you do these kinds of things on the server side; that is particularly nice when you do not have the power of Python at your fingertips, as for instance interactively in TOPCAT. This UDF is:

gavo_vocmatch(vocname STRING, term STRING, matchagainst STRING) -> INTEGER

(documentation at the GAVO data centre). There are technical differences, some of which I try to explain in amoment. But if you run something like:

SELECT ivoid FROM rr.res_subject
WHERE 1=gavo_vocmatch('uat', 'supernovae', res_subject)

on the TAP service at http://dc.g-vo.org/tap, you will get what you would get with registry.UAT("supernovae", expand_down=1). That UDF also works with other vocabularies. I particularly like the combination of product-type, obscore, and gavo_vocmatch.

If you wonder why gavo_vocmatch does not go on expanding towards narrower concepts as far as it can go: That is because what pyVO does is semantically somewhat questionable.

You see, SKOS' notions of what is wider and narrower are not transitive. This means that just because A is wider than B and B is wider than C it is not certain that A is wider than C. In the UAT, this sometimes leads to odd results when you follow a branch of concepts toward narrower concepts, mostly because narrower sometimes means part-of (“Meronymy”) and sometimes is-a (“Hyponymy“). Here is an example discovered by my colleague Adrian Lucy:

interstellar-medium wider nebulae wider emission-nebulae wider planetary-nebulae wider planetary-nebulae-nuclei

Certainly, nobody would argue that that the central stars of planetary nebulae somehow are a sort of or are part of the interstellar medium, although each individual relationship in that chain makes sense as such.

Since SKOS relationships are not transitive, gavo_vocmatch, being a general tool, has to stop at one level of expansion. By the way, it will not do that for the other flavours of IVOA vocabularies, which have other (transitive) notions of narrower-ness. With the UAT constraint, I have fewer scruples, in particular since the expansion depth is under user control.

Implementation

Talking about technicalities, let me use this opportunity to invite you to contribute your own Registry constraints to pyVO. They are not particularly hard to write if you know both ADQL and Python. You will find several examples – between trivial and service-sensing complex in pyvo.registry.rtcons. The code for UAT looks like this (documentation removed for clarity[3]):

class UAT(SubqueriedConstraint):
    _keyword = "uat"
    _subquery_table = "rr.res_subject"
    _condition = "res_subject in {query_terms}"
    _uat = None

    @classmethod
    def _expand(cls, term, level, direction):
        result = {term}
        new_concepts = cls._uat[term][direction]
        if level:
            for concept in new_concepts:
                result |= cls._expand(concept, level-1, direction)
        return result

    def __init__(self, uat_keyword, *, expand_up=0, expand_down=0):
        if self.__class__._uat is None:
            self.__class__._uat = vocabularies.get_vocabulary("uat")["terms"]

        if uat_keyword not in self._uat:
            raise dalq.DALQueryError(
                f"{uat_keyword} does not identify an IVOA uat"
                " concept (see http://www.ivoa.net/rdf/uat).")

        query_terms = {uat_keyword}
        if expand_up:
            query_terms |= self._expand(uat_keyword, expand_up, "wider")
        if expand_down:
            query_terms |= self._expand(uat_keyword, expand_down, "narrower")

        self._fillers = {"query_terms": query_terms}

Let me briefly describe what is going on here. First, we inherit from the base class SubqueriedConstraint. This is a class that takes care that your constraints are nicely encapsulated in a subquery, which generally is what you want in pyVO. Calmly adding natural joins as recommended by the RegTAP specification is a dangerous thing for pyVO because as soon as a resource matches your constraint more than once (think “columns with a given UCD”), the RegistryResult lists in pyVO will turn funny.

To make a concrete SubqueriedConstraint, you have to fill out:

the table it will operate on, which is in the _subquery_table class attribute;
an expression suitable for a WHERE clause in the _condition attribute, which is a template for str.format. This is often computed in the constructor, but here it is just a constant expression and thus works fine as a class attribute;
a mapping _fillers mapping the substitutions in the _condition string template to Python values. PyVO's RegTAP machinery will worry about making SQL literals out of these, so feel free to just dump Python values in there. See the make_SQL_literal for what kinds of types it understands and expand it as necessary.

There is an extra class attribute called _keyword. This is used by the pyvo.regtap machinery to let users say, for instance, registry.search(uat="foo.bar") instead of registry.search(registry.UAT("foo.bar")). This is a fairly popular shortcut when your constraints can be expressed as simple strings, but in the case of the UAT constraint you would be missing out on all the interesting functionality (viz., the query expansion that is only available through optional arguments to its constructor).

This particular class has some extra logic. For one, we cache a copy of the UAT terms on first use at the class level. That is not critical for performance because caching already happens at the level of get_vocabulary; but it is convenient when we want query expansion in a class method, which in turn to me feels right because the expansion does not depend on the instance. If you don't grok the __class__ magic, don't worry. It's a nerd thing.

More interesting is what happens in the _expand class method. This takes the term to expand, the number of levels to go, and whether to go up or down in the concept trees (which are of the computer science sort, i.e., with the root at the top) in the direction argument, which can be wider or narrower, following the names of properties in Desise, the format we get our vocabulary in. To learn more about Desise, see section 3.2 of Vocabularies in the VO 2.

At each level, the method now collects the wider or narrower terms, and if there are still levels to include, calls itself on each new term, just with level reduced by one. I consider this a particularly natural application of recursion. Finally. everything coming back is merged into a set, which then is the return value.

And that's really it. Come on: write your own RegTAP constraints, and also have fun with vocabularies. As you see here, it's really not that magic.

[1]	Also, just so you don't leave with the impression I don't believe in AI tech at all, something like SciX's KAILAS might also help improving Registry subject keywords.

[2]

Yes, in a little sleight of hand, I've switched the URI scheme to https here. That's not really right, because the term URIs are supposed to be opaque, but some browsers currently forget the fragment identifiers when the IVOA web server redirects them to https, and so https is safer for this demonstration. This is a good example of why the web would be a better place if http had been evolved to support transparent, client-controlled encryption (rather than inventing https).

[3]	I've always wanted to write this.

Global Dataset Discovery in PyVO

2024-02-23 Markus Demleitner

A Tkinter user interface with inputs for Space, Spectrum, and Time, a checkbox marked "inclusive", and buttons Run, Stop, Broadcast, Save, and Quit.

Admittedly somewhat old-style: As part of teaching global dataset discovery to pyVO, I have also come up with a Tkinter GUI for it. See A UI for more on this.

One of the more exciting promises of the Virtual Observatory was global dataset discovery: You say “Give me all spectra of object X that there are“, and the computer relates that request to all the services that might have applicable data. Once the results come in, they are merged into some uniformly browsable form.

In the early VO, there were a few applications that let you do this; I fondly remember VODesktop. As the VO grew and diversified, however, this became harder and harder, partly because there were more and more services, partly because there were more protocols through which to publish data. Thus, for all I can see, there is, at this point, no software that can actually query all services plausibly serving, say, images or spectra in the VO.

I have to say that writing such a thing is not for the faint-hearted, either. I probably wouldn't have tackled it myself unless the pyVO maintainers had made it an effective precondition for cleaning up the pyVO Servicetype constraint.

But they did, and hence as a model I finally wrote some code to do all-VO image searches using all of SIA1, SIA2, and obscore, i.e., the two major versions of the Simple Image Access Protocol plus Obscore tables published through TAP services. I actually have already reported in Tucson on some preparatory work I did last summer and named a few problems:

There are too many services to query on a regular basis, but filtering them would require them to declare their coverage; far too many still don't.
With the current way of registering obscore tables, there is no way to know their coverage.
One dataset may be availble through up to three protocols on a single host.
SIA1 does not even let you constrain time and spectrum.

Some of these problems I can work around, others I can try to fix. Read on to find out how I fared so far.

The pyVO API

Currently, the development happens in pyVO PR #470. While it is still a PR, let me point you to temporary pyVO docs on the proposed pyvo.discover module – of course, all of this is for review and probably not in the shape it will remain in[1].

Followup (2024-11-28)

With the recent release of pyVO 1.6, what is described here is actually available in the release (or by checking out the main branch of the repository).

To quote from there, the basic usage would be something like:

from pyvo import discover
from astropy import units as u
from astropy import time

datasets, log = discover.images_globally(
  space=(339.49, 3.1, 0.1),
  spectrum=650*u.nm,
  time=(time.Time('1995-01-01'), time.Time('1995-12-31')))

At this point, only a cone is supported as a space constraint, and only a single point in spectrum. It would certainly be desirable to be more flexible with the space constraint, but given the capabilities of the various protocols, that is hard to do. Actually, even with the plain cone Obscore (i.e., ironically, the most powerful of the discovery protocols covered here) currently results in an implementation that makes me unhappy: ugly, slow, and wrong. This is requires a longer discussion; see Appendix: Optionality Considered Harmful.

datasets at this point is a list of, conceputally, Obscore records. Technically, the list contains instances of a custom class ImageFound, which have attributes named after the Obscore columns. In case you have doubts about the Semantics of any column, the Obscore specification is there to help. And yes, you can argue we should create a single astropy table from that list. You are probably right.

PyVO adds an extra column over the mandatory obscore set, origin_service. This contains the IVOA identifier (IVOID) of the service at which the dataset was found. You have probably seen IVOIDs before: they are URIs with a scheme of ivo:. What you may not know: these things actually resolve, specifically to registry resource records. You can do this resolution in a web browser: Just prepend https://dc.g-vo.org/I/ to an IVOID and paste the result into the address bar. For instance, my Obscore table has the IVOID ivo://org.gavo.dc/__system__/obscore/obscore; the link below the IVOID leads you to an information page, which happens to be the resource's Registry record formatted with a bit of XSLT. A somewhat more readable but less informative rendering is available when you prepend https://dc.g-vo.org/LP/ (“landing page”).

The second value returned from discover.images_globally is a list of strings with information on how the global discovery progressed. For now, this is not intended to be machine-readable. Humans can figure out which resources were skipped because other services already cover their data, which services yielded how many records, and which services failed, for instance:

Skipping ivo://org.gavo.dc/lswscans/res/positions/siap because it is served by ivo://org.gavo.dc/__system__/obscore/obscore
Skipping ivo://org.gavo.dc/rosat/q/im because it is served by ivo://org.gavo.dc/__system__/obscore/obscore
Obscore GAVO Data Center Obscore Table: 2 records
SIA2 The VO @ ASTRON SIAP Version 2 Service: 0 records
SIA2 ivo://au.csiro/casda/sia2 skipped: ReadTimeout: HTTPSConnectionPool(host=&apos;casda.csiro.au&apos;, port=443): Read timed out. (read timeout=20)
SIA2 CADC Image Search (SIA): 0 records
SIA2 European HST Archive SIAP service: 0 records
...

(On the skipping, see Relationships below). I consider this crucial provenance, as that lets you assess later what you may have missed. When you save the results, be sure to save these, too.

A feature that will presumably (see Inclusivity for the reasons for this expectation) be important at least for a few years is that you can pass the result of a Registry query, and pyVO will try to find services suitable for image discovery on that set of resources.

A relatively straightforward use case for that is global obscore discovery. This would look like this:

from pyvo import discover
from pyvo import registry
from astropy import units as u
from astropy import time

def say(discoverer, s):
        print(s)

datasets, log = discover.images_globally(
  space=(274.6880, -13.7920, 1),
  time=(time.Time('1995-01-01'), time.Time('1995-12-31')),
  services=registry.search(registry.Datamodel("obscore")),
  watcher=say)

The watcher thing lets you, well, watch the progress of the discovery; it receives an instance of the discoverer -- this is so you can abort a discoverer's activities from within some UI -- and the human-readable string to display or process in some other way.

A UI

To get an idea whether this API might one day work for the average astronomer, I have written a Tkinter-based GUI to global image discovery as it is now: tkdiscover (only available from github at this point). This is what a session with it might look like:

Lots of TOPCAT windows with various graphs and tables, an x-ray image of the sky with overplotted points, and a play gray window offering the specification of space, spectrum, and time constraints.

The actual UI is in the top right: A plain window in which you can configure a global discovery query by straightfoward serialisations of discover.images_globally's arguments:

Space (currently, a cone in RA, Dec, and search radius, separated by whitespace of commas)
Spectrum (currently, a single point as a wavelength in metres)
Time (currently, either a single point in time – which probably is rarely useful – or an interval, to be entered as civil DALI dates
Inclusivity.

When you run this, this basically calls discover.images_globally and lets you know how it is progressing. You can click Broadcast (which sends the current result to all VOTable clients on the SAMP bus) or Save at any time and inspect how discovery is progressing. I predict you will want to do that, because querying dozens of services will take time.

There is also a Stop button that aborts the dataset search (you will still have the records already found). Note that the Stop button will not interrupt running network operations, because the network library underneath pyVO, requests, is not designed for being interrupted. Hence, be patient when you hit stop; this may take as long as the configured timeout (currently is 20 seconds) if the service hangs or has to do a lot of work. You can see that tkdiscover has noticed your stop request because the service counter will show a leading zero.

Service counter? Oh, that's what is at the bottom right of the window. Once service discovery is done, that contains three numbers: The number of services to query, the number of services queried already, and the number of services that failed.

The table contains the obscore records described above, and the log lines are in the discovery_log INFO. I will give you that this is extremely unreadable in particular in TOPCAT, which normalises the line separators to plain whitespace. Perhaps some other representation of these log lines would be preferable: A PARAM with a char[][] (but VOTable still is terrible with arrays of variable-length strings)? Or a separate table with char[*] entries?

Inclusivity

I have promised above I'd explain the “Inclusive” part in both the pyVO API and the Tk UI. Well, this is a bit of a sad story.

All-VO-queries take time. Thus, in pyVO we try to only query services that we expect serve data of interest. How do we arrive at expectations like that? Well, quite a few records in the Registry by now declare their coverage in space and time (cf. my 2018 post for details).

The trouble is: Most still don't. The checkmark at inclusive decides whether or not to query these “undecidable” services. Which makes a huge difference in runtime and effort. With the pre-configured constraints in the current prototype (X-Ray images a degree around 274.6880, -13.7920 from the year 1995), we currently discover three services (of which only one actually needs to be queried) when inclusive is off. When it is on, pyVO will query a whopping 323 services (today).

The inclusivity crisis is particularly bad with Obscore tables because of their broken registration pattern; I can say that so bluntly because I am the author of the standard at fault, TAPRegExt. I am preparing a note with a longer explanation and proposals for fixing matters – <cough> follow me on github –, but in all brevity: Obscore data is discovered using something like a flag on TAP services. That is bad because the TAP services usually have entriely different metadata from their Obscore table; think, in particular, of the physical coverage that is relevant here.

It will be quite a bit of effort to get the data providers to do the Registry work required to improve this situation. Until that is done, you will miss Obscore tables when you don't check inclusive (or override automatic resource selection as above) – and if you do check inclusive, your discovery runs will take something like a quarter of an hour.

Relationships

In general, the sheer number of services to query is the Achilles' heel in the whole plan. There is nothing wrong with having a machine query 20 services, but querying 200 is starting to become an effort.

With multi-data collection services like Obscore (or collective SIA2 services), getting down to a few dozen services globally for a well-constrained search is actually not unrealistic; once all resources properly declare their coverage, it is not very likely that more than 20 institutions worldwide will have data in a credibly small region of space, time, and spectrum. If all these run collective services and properly declare the datasets to be served by them, that's our 20-services global query right there.

However, pyVO has to know when data contained in a resource is actually queriable by a collective service. Fortunately, this problem has already been addressed in the 2019 endorsed note on Discovering Data Collections Within Services: Basically, the individual resource declares an IsServedBy relationship to the collective service. PyVO global discovery already looks at these. That is how it could figure out these two things in the sample log given above:

Skipping ivo://org.gavo.dc/lswscans/res/positions/siap because it is served by ivo://org.gavo.dc/__system__/obscore/obscore
Skipping ivo://org.gavo.dc/rosat/q/im because it is served by ivo://org.gavo.dc/__system__/obscore/obscore

But of course the individual services have to declare these relationships. Surprisingly many already do, as you can observe yourself when you run:

select ivoid, related_id from
rr.relationship
natural join rr.capability
where
standard_id like 'ivo://ivoa.net/std/sia%'
and relationship_type='isservedby'

on your favourite RegTAP endpoint (if you have no preferences, use mine: http://dc.g-vo.org/tap). If you have collective services and run individual SIA services, too, please run that query, see if you are in there, and if not, please declare the necessary relationships. In case you are unsure as to what to do, feel free to contact me.

Future Directions

At this point, this is a rather rough prototype that needs a lot of fleshing out. I am posting this in part to invite the more adventurous to try (and break) global discovery and develop further ideas.

Some extensions I am already envisaging include:

Write a similar module for spectra based on SSAP and Obscore. That would then probably also work for time series and similar 1D data.
Do all the Registry work I was just talking about.
Allow interval-valued spectral constraints. That's pretty straightforward; if you are looking for some place to contribute code, this is what I'd point you to.
Track overflow conditions. That should also be simple, probably just a matter of perusing the pyVO docs or source code and then conditionally produce a log entry.
Make an obscore s_region out of the SIA1 WCS information. This should also be easy – perhaps someone already has code for that that's tested around the poles and across the stitching line? Contributions are welcome.
Allow more complex geometries to define the spatial region of interest. To keep SIA1 viable in that scenario it would be conceivable to compute a bounding box for SIA1 POS/SIZE and do “exact” matching locally on the coarser SIA1 result.
Enable multi-position or multi-interval constraints. This pretty certainly would exclude SIA1, and, realistically, I'd probably only enable Obscore services with TAP uploads with this. With those constraints, it would be rather straightforward.
Add SODA support: It would be cool if my ImageFound had a way to say “retrieve data for my RoI only”. This would use SODA and datalink to do server-side cutouts where available and do the cut-out locally otherwise. If this sounds like rocket science: No, the standards for that are actually in place, and pyVO also has the necessary support code. But still the plumbing is somewhat tricky, partly also because pyVO's datalink API still is a bit clunky.
Going async? Right now, we civilly query one service after the other, waiting for each result before proceeding to the next service. This is rather in line with how pyVO is written so far.

However, on the network side for many years asynchronous programming has been a very successful paradigm – for instance, our DaCHS package has been based on an async framework from the start, and Python itself has growing in-language support for async, too.

Async allows you to you fire off a network request and forget about it until the results come back (yes, it's the principle of async TAP, too). That would let people run many queries in parallel, which in turn would result in dramatically reduced waiting times, while we can rather easily ensure that a single client will not overflow any server. Still, it would be handing a fairly powerful tool into possibly unexperienced hands… Well: for now there is no need to decide on this, as pyVO would need rather substantial upgrades to support async.

Appendix: Optionality Considered Harmful

The trouble with obscore and cones is a good illustration of the traps of attempting to fix problems by adding optional features. I currently translate the cone constraint on Obscore using:

"(distance(s_ra, s_dec, {}, {}) < {}".format(
  self.center[0], self.center[1], self.radius)
+" or 1=intersects(circle({}, {}, {}), s_region))".format(
  self.center[0], self.center[1], self.radius))

which is all of ugly, presumably slow, and wrong.

To appreciate what is going on, you need to know that Obscore has two ways to define the spatial coverage of an observation. You can give its “center” (s_ra, s_dec) and something like a rough radius (s_fov), or you can give some sort of geometry (e.g., a polygon: s_region). When the standard was written, the authors wanted to enable Obscore services even on databases that do not know about spherical geometry, and hence s_region is considered rather optional. In consequence, it is missing in many services. And even the s_ra, s_dec, s_fov combo is not mandatory non-null, so you are perfectly entitled to only give s_region.

That is why there are the two conditions or-ed together (ugly) in the code fragment above. 1=intersects(circle(.), s_region) is the correct part; this is basically how the cone is interpreted in SIA1, too. But because s_region may be NULL even when s_ra and s_dec are given, we also need to do a test based on the center position and the field of view. That rather likely makes things slower, possibly quite a bit.

Even worse, the distance-based condition actually is wrong. What I really ought to take into account is s_fov and then do something like distance(.) < {self.radius}+s_fov, that is, the dataset position need only be closer than the cone radius plus the dataset's FoV (“intersects”). But that would again produce a lot of false negatives because s_fov may be NULL, too, and often is, after which the whole condition would be false.

On top of that, it is virtually impossible that such an expression would be evaluated using an index, and hence with this code in place, we would likely be seqscanning the entire obscore table almost every time – which really hurts when you have about 85 Million records in your Obscore table (as I do).

The standard could immediately have sanitised all this by saying: when you have s_ra and s_dec, you must also give a non-empty s_fov and s_region. This is a classic case for where a MUST would have been necessary to produce something that is usable without jumping through hoops. See my post on Requirements and Validators on this blog for a longer exposition on this whole matter.

I'm not sure if there is a better solution than the current “if the operators didn't bother with s_region, the dataset's FoV will be ignored“. If you have good ideas, by all means let me know.

Followup (2024-11-28)

I've given a talk at the Malta interop giving another view on this matter: VO at the limit.

[1]	If you want to try this (in particular without clobbering your “normal” pyVO), do something like this: virtualenv --system-site-packages global-datasets . global-datasets/bin/activate cd global-datasets git clone https://github.com/msdemlei/pyvo cd pyvo git checkout global-datasets pip install .

News From the VO Via ActivityPub

2024-01-22 Markus Demleitner

If you ask us: Get a proper client to join the Fediverse. But as shown here, in a pinch a web browser will do, too.

When Twitter was still fairly young, we had an account there that would tweet out when new data collections appeared in the VO. Even back then, I was rather doubtful whether using a proprietary platform to disseminate open data is a good idea, but as long as the content was also available through standard protocols (RSS in this case), I thought it might be worth a try. Well: It never really took off, and after Twitter broke the whole thing a couple of times by incompatible API changes, I finally let it go ca. 2017.

Given to the recent mass exodus from the smouldering remains of Twitter into the open and standard Fediverse, I thought reviving our little missives there might actually be a worthwhile effort. Specifically, joining Mastodon – which speaks the ActivityPub protocol and hence is part of the Fediverse – has become really straightforward.

So, if the VO Fresh RSS Feed is not for you (perhaps because you do not have an RSS aggregator, which would be a shame), maybe following our new Mastodon account @gavo@botsin.space would be for you?

Followup (2024-11-21)

In late 2024, botsin.space shut down, and we moved our operations to @gavo@astrodon.social; so, please point your fediverse clients there.

Followup (2025-03-13)

While we would obviously nudge people to properly open and federated systems like the mastodon or the Fediverse in general, you can follow us from bluesky, too. Try @gavo.astrodon.social.ap.brid.gy there.

Oh, and yes, I give you the previews the Mastodon web client produces for VizieR resources are not overly pretty yet (curse Javascript templating!), but then if I were you, I'd disable URL previews anyway; really, they are little more than an annoyance.

This post also doubles as identity verification, so:
Visit Our Mastodon Page.
Category: Operations

Registry: A Janitor Speaks Out

2023-04-25 Markus Demleitner

Missing Coverage
Broken Author Names
Fragile Contact Info
Non-machine-readable Subjects
Unfulfilling Resource Descriptions
Lame Relationships
Missing Tablesets
Deficient Column Descriptions
Column UCDs: Missing, Outdated, or Useless
Bad Units

I sometimes claim the reason I like working on the VO Registry is that I am a librarian at heart. Perhaps there is some truth to that, in that ugly metadata does make me unhappy – but beyond that, it also makes the Virtual Observatory look or even work a good deal worse than it should.

Given that, in this post I'm afraid I will sound more like a grumpy janitor than a wise librarian, but let me still attempt to contribute to better metadata by pointing out a few things to watch out for when writing a resource record. People consuming resource records (i.e., VO-using astronomers) are welcome here, too: when you encounter antipatterns mentioned here, a polite complaint to the service publisher is entirely a good thing.

Note that I am using real metadata found in the registry – in case you recognise some of own records, do not feel reprimanded individually. Most of the problems I discuss here are really common at this point, and thus if I picked your metadata, that was mere bad luck. I actually picked some of my own occasionally (but duly fixed the problem then).

Missing Coverage

Since VODataService 1.2, you can say what part of the sky, spectrum, and time your resource covers. That is incredibly useful metadata in practice. Spatial coverage, for instance, is used in Aladin like this:

Screenshot: Resource names in white, orange and green, and a part of the sky (h and χ Persei) next to them

Green means: these services could have data for the patch of sky shown, orange means don't bother with these, and white means: No idea because the resource does not declare its coverage.

Similarly, it would be great if researchers or clients could reliably say:

SELECT * FROM rr.resource JOIN rr.stc_spectral WHERE
  1=ivo_interval_overlaps(spectral_start, spectral_end,
      ivo_specconv(658, 'nm', 'J'), ivo_specconv(654, 'nm', 'J'))

to find resources having data covering the Hα line on the spectral axis. Currently, that's just 2064 resources, and given that Hα sits smack in the middle of the optical window that's an indication that far too few resources say where they are.

So – add STC coverage to your data today. It's not hard with pymoc or pgsphere and chapter 3.2 of VODataService. DaCHS operators will probably get by just studying the corresponding section of the tutorial.

Broken Author Names

On the ADS, last time I had information on that, about 90% of the queries were by author. In the VO registry, by my unscientific estimate, less than 5% of queries constrain authors. Sure, people look for literature and data in different ways and for different purposes, but an important reason for the difference still is that we don't do a good job giving creator/name (which contains the equvialent of the author name).

The ideal format is to have last name first, then a comma, and then abbreviated initials or full first names, as in von der Heide, J.. Many names in the VO are almost in this format do not have a comma; but the comma makes parsing these names a lot simpler, so please put it in. Of all the forms to write names in, that's most easily constrained without guessing how many first names are where. Remember, there are people out their with names like „Kirsten-Claude Selim de Vaucouleurs-van der Heide Lobos“ (or, for that matter, Uthamadhanapuram Venkatasubbaiyer Swaminatha Iyer), and a computer cannot efficiently decide where the last name starts in first name first order (and conversely, without the comma in last name first order, it has a hard time figuring out where the last name stops). Also, last name first almost always gives a more useful natural sort order.

Realistically, people will have to live with J. von der Heide, too, so author searches in the VO will have to look like LIKE '%von der Heide%' for some years to come, but let's at least try to improve. And whatever you do, don't do any of (in approximate order of severity):

Dump in half an acknowledgement, e.g., under a cooperative agreement with the NSF on behalf of the Gemini partnership: the National Science Foundation (United States), or, about as bad: provided by S. Snowden from data by Dickey and Lockman – that's useless for author searches but invites lots of false positives
Dump more than one name into one creator/name element, e.g., Zhuang Z.,Kirby E.N.,Leethochawalit N.,de los Reyes M.A.C. or Voges, W.; Aschenbach, B.; Boller, Th.; (and ~200 more characters) – that's really hard to search and essentially impossible to use for, e.g., author datagraphies
Include affiliations (the VO can't properly deal with those yet), e.g., Zub M. (The PLANET Collaboration) or a combination of this and the previous: Zhu W. (The Spitzer team) Dominik M.
Forget citation debris, e.g., et al. MNRAS (in press), or, shockingly common: and Scheck M.; of course, entire citations (WALKER I. Astron. J. 106) are inappropriate, too – all of this will prevent the use of meaningful name constraints
Give a bibcode: 2014ApJ...787...78M – this likely belongs into content/source
Have empty author name elements (as, at this moment, 13 records)
Cheat with effectively empty author names: <NOT GIVEN>, or "We forgot to give credit, please complain"
Go all uppercase, e.g., ZINNECKER H. – standards-compliant ADQL string comparisons are case-sensitive, and case-folding would require special indexes. Perhaps case-insensitive author matches should be made easier in that van der Waals is probably the same person as Van der Waals, but for now that's not how it works right now. And I don't think that will change any time soon, because if I have learned one thing in my life it is that case insensitivity is almost always evil
Have just a first name: walter or W.I. or W-J
Combine author lists from different contributing papers: Wright et al.; Griffith, Wright, Burke, Ekers; Griffith, Wright – if you really need to do something like this, merge the two author lists – and then of course use one name per creator element

In principle, these considerations would apply to contributors, contacts and perhaps publishers, too, but since I don't think people should use these in discovery queries, their format does not matter too much: If they're human-readable, that's enough.

Fragile Contact Info

Quite regularly I need to ask people to fix something in their publishing registries, and then it's really useful to have reliable contact information. That's also nice for VO users; pyVO, for instance, has the get_contact method on registry records, and in WIRR, you can pop up contact info on all records:

a screenshot showing a match in a registry query. A subwindow is popped up that shows a mail address and a telephone number of a “GAVO Data Center Team“.

For that to work, personal addresses in the contact information are really dangerous – it is my experience that these break significantly more often than institutional addresses. So, please avoid things like (I'm making all of these up because there may still be folks around harvesting mail addresses to send spam):

a.b.miller-parachtnix@gmail.com (well: avoid using gmail.com unconditionally)
friederike.student@ari.uni-heidelberg.de

Rather, create an alias that you can hand on and that perhaps is even a bit speaking. This could be:

vo-help@great-telescope.org
gavo@ari.uni-heidelberg.de
uni-hd-vo@posteo.de (in case your own institution absolutely loathes the idea of addresses not bound to persons)

Non-machine-readable Subjects

VOResource 1.1 said that subjects are to be taken “from the UAT” (that's the Unified Astronomy Thesaurus), but failed to say what exactly that means. Since last July, this is properly defined: Use fragment identifiers into http://www.ivoa.net/rdf/uat, that is, something like abell-clusters.

Having all subject keywords in a predictable format, with useful metadata, and part of a proper hierarchy enables all kinds of cool stuff, and hence it would be great if we could stomp out the following sorts of mispractice in the VO:

Multiple things in one subject element: ATLAS DR1, SIAP, Images – have one term per subject element
Undefined NULL values: NOT PROVIDED, ??? – if you really cannot find a pertinent term, use astronomical-research (or one of the other top-level terms). If nowhere else, that at least helps when your record moves to interdisciplinary search engines
Random free text: optical lines equivalent width catalog – that's multiple terms rolled into one, and the machine will not know what it means
Project or instrument names: 6dF Data Release 3 Spectra, COROT N2 – there's the instrument metadata for some uses of that. For the rest, see above on having projects in creator/name.
Protocol names: TAP – that's what capabilities are for
Service titles: CADC image/cube HiPS service – that's what the title element is for
Non-UAT keyword schemes: Galaxy:general – let's not force VO components to learn about multiple keyword systems. If you are missing something from the UAT, tell them about it

Unfulfilling Resource Descriptions

Descriptions of VO resources serve a dual purpose: The should give researches a quick idea of what to expect and not expect of a resource, and they should mention all the important buzzwords for the benefit of full-text searches. Hence, if you only have two words as in:

Survey (LoLSS).

or have something like a title:

Convolution of normalized synthetic stellar spectra.

or use somewhat uncommon abbreviations and technical details that probably will not help much during data discovery:

USET Group form

(what group? Does „form“ really mean „web browser-facing“? If so, that's again better expressed through the capabilities), you should work a bit on your description.

It is usually helpful to start the description with „this service is…“ or something similar. While it's marginally ok to mention terms and conditions like:

When referencing results from this online catalog, please cite <a href="https://iopscience.iop.org/article/10.384…

further down in the description (the proper place for this kind of thing is the rights element, though), don't discuss stuff like this before you have told people what is in the “online catalog” in the first place. Also: registry records are like e-mail in that you shouldn't use HTML anywhere in registry metadata. If you have to include URLs in text for human consumption, just put them in as text.

Talking about markup: You cannot rely on any of that in descriptions. Even basic ASCII art (or, well, tables) will always come out bad:

Only the data from the first catalog that was matched is reported here according to the following priority list. This means for example, if a star was matched with Hipparcos, that information was used while possible other catalog data are not listed here. -------------------------------------------------------- # stars flg catalog -------------------------------------------------------- 53500 0 no catalog match 55549 1 Hipparcos 254 2 Yale Parallax Catalog 1041 3 Finch and Zacharias 2016 (UPM NNNN-NNNN) 1431 4 MEarth parallaxes 402 5 SIMBAD Database (w/parallax) -------------------------------------------------------- 112177 total number stars in catalog -------------------------------------------------------- Not all parallaxes from the...

(of course, that in this case the newlines and longer sequences of blanks have been normalised to single blanks already in the original resource record makes it particularly certain that the table will come out wrong).

And where in titles abbreviations are usually a good thing, in particular when you can expect your target audience too look for the abbreviation rather spelled-out names in discovery queries, in descriptions you have space, and hence you normally should explain MCQA as „Monte Carlo Quality Assessment“ in something like the following:

Herschel sources in Planck fields measured at 350 µm MCQA

Remember: The people who read your descriptions may come from the future (as in: 25 years from now) or at least may be unfamilar with your project's jargon.

Lame Relationships

There are an incredible 136958 relationships in the current VO that have related-to as their relationship type. This is deplorable because the relevant vocabulary, https://www.ivoa.net/rdf/voresource/relationship_type, marks it as deprecated, and that's for a good reason: Just stating “some relationship“ between two resources is rarely useful. Decide what the relationship is and then pick a proper term (or, if that does not exist, prepare a VEP).

Missing Tablesets

Tablesets are a VODataService feature giving metadata on the return table (or, in the case of the flexible TAP services, the queried tables). They are really useful if you look for services returning some sort of physics – and if you are running TAP services, they will one day let me shut down the GloTS service that replicates a good deal of registry functionality for no good reason at all.

So, if you have a catalog service and your registry record ends somewhat like this:

  </capability>
</ri:Resource>

it is almost certainly missing a tableset (which would normally go after the capabilities; you are probably also missing the sky coverage, though, because that would sit there, too).

Writing basic tablesets is not hard. In fact, if you are running a TAP service, you have a working tableset on your service's tables endpoint. But even without VOSI tables, making a tableset from the VOTable you return is straightforward – with a few encouraging words, I could be talked to write a few lines of Python that do that.

I will readily admit that writing good tablesets is more involved, but what is hard about it you should be doing anyway, because it also will improve the VOTables that you write, and hence the usability of your data all around. So, until the end of this post let me look at some common warts of the column metadata in today's VO.

Deficient Column Descriptions

Column descriptions like ?, ??, or even ??? are surprisingly common. Please don't do that. If you really have no idea what your upstream has put into a column, admit that, aplogise and try to make your upstream explain.

And while RA somewhat works among astronomers, a word or two on the reference system (“IRCS”) and an informal provenance (“from PSF fits”) would certainly be much appreciated by your users and might even come handy in discovery.

Or consider “Age” – this could immediately be improved by revealing just what has aged here and, again, some hint on how the age was estimated (e.g., “obtained from ivo://foo.bar/res” versus “by isochrone fitting”).

But don't overdo it, either: Do not include entire footnotes in descriptions, because that will lead to many false positives in full text searches (not to mention slow down the Registry as a whole if this became common practice). DaCHS operators: you can have footnotes in your RD by using note meta items; cf. Typed Meta Elements in the DaCHS reference.

Near the upper limit of what is appropriate in a column description is perhaps something like this:

The 2.5 percentile of the Log total SFR PDF. This is derived by combining emission line measurements from within the fibre where possible and aperture corrections are done by fitting models ala Gallazzi et al (2005), Salim et al (2007) to the photometry outside the fibre. For those objects where the emission lines within the fibre do not provide an estimate of the SFR, model fits were made to the integrated photometry.

– but at the same time it illustrates how you can provide a lot of information that helps casual users.

The position angles I will turn to in a second give another nice example of why human-readable descriptions are so important: There is no reliable convention of the direction and the baseline of these, so stating something like „north over east“ in a description will avoid a lot of head-scratching.

Column UCDs: Missing, Outdated, or Useless

A very plausible discovery scenario involves UCDs: „give me resources with (some photometry | redshifts | kinematics | dynamics | positions on earth)“. Hence, make sure your columns' metadata has predictable and halfway correct UCDs.

Sure, that's not always straightforward (note, by the way, that there is a reasonably simple process to suggest new UCDs), but there's no excuse for there being 117 columns called pa without any UCD, where pos.posAng will almost certainly fit all of them (though, who knows: 30 of these in addition don't even have a description).

To make sure the UCDs you assign exist, run them through astropy at least once. Do not ignore complaints by astropy; it is actually preferable to have no UCD rather than “??” (which currently a whopping 30342 column sport, in addition to which we have 41 times “???“ and 70 times “????“[1]). Also, resist the temptation to freely invent things, such as the “mjd” UCD I'm seeing on 13 columns. In this particular case, by the way, I give you that saying “this column contains MJDs“ has been a pain in VOTables for a long time, but since version 1.4, TIMESYS lets you do that in a reasonable way.

Oh, let me qualify the “freely invent“ in the last paragraph: It could be[2] that MJD has actually been part of the original UCDs you may still know from cone search (“POS_EQ_RA”); that people have not updated their metadata from these ancient days is also the reason I'm still seeing 13827 columns with an (invalid) UCD of “error“ in column metadata (and 84 with pos_eq_dec).

Unrelatedly (though with an undisputable entertainment value): the longest UCD in the current VO is meta.code;phot.flux.density;arith.ratio;em.ir.15-30um;em.radio.750-1500mhz; unless I and astropy are missing something, it's even syntactically correct.

Bad Units

While I do not see many discovery scenarios that would make good use of units, do not forget to update your units to VOUnits when you touch up your tablesets. This will let software like astropy do the unit calculus for its users, which is a win overall. It cannot do that if you ignore VOUnits and write, say, ABmag/arcsec2 – the AB part you will have to communicate in the description for now, and exponentiation is ** in VOUnits.

Recent versions of the stilts validators (votlint, taplint) will complain about bad units. And you can use stilts interactively to figure out whether you got it right:

$ stilts calc 'vounitStatus("ABmag/arcsec2")'
  BAD_SYNTAX
$ stilts calc 'vounitStatus("mag/arcsec**2")'
  OK

[In a previous version of this post, I have given a piece of astropy to do unit checking; it turns out that astropy by default is rather forgiving, and you want stilts on your box anyway; why not use it for unit validation? If your stilts says something about “bad expression“ with the command lines above, it's an indication that you should update it.]

And with this somewhat non-registry topic: Go forth and polish your resource records. Or, as a consumer of such metadata, ask the publishers of bad resource metadata to fix it.

[1]	Remarkably, there are no ????? or even longer sequences of question marks, and even more remarkably, nobody has put in a lonely question mark. If someone versed in cognitive psychology has a plausible interpretation for that fact: can you share it with me?

[2]	Since the original UCDs predate my VO involvement and, for all I know, never were properly standardised, I frankly can't say.

Query the Registry with WIRR

2021-07-30 Markus Demleitner

Pixels from venerable VODesktop and WIRR: it's supposed to be about the same thing, except WIRR uses and exposes the latest Registry standards (and then some tech that's not standard yet).

When the VO was young, there was a programme called VODesktop that had a very nice interface for searching the Registry. Also, it would run queries against the services discovered, giving nice all-VO querying that few modern clients do quite as elegantly. Regrettably, when the astrogrid UK project was de-funded, VODesktop's development ceased in 2010.

In 2012, it had become clear that nobody would step up to continue it, and I wanted to at least provide a replacement for the Registry interface part. In consequence, Florian Rothmaier and I wrote the Web Interface to the Relational Registry, or WIRR for short; this lets you build Registry queries in your Web Browser in an interface inspired by VODesktop (which, I'm told, in turn was inspired by early iTunes).

WIRR's sweet spot is between the Registry interfaces in the usual clients (TOPCAT, Aladin: these try to hide the gory details of where their service lists come from and hence are limited in what interaction they allow) and using a TAP client to write and execute RegTAP queries (where there are no limitations beyond the protocol's, but it's tedious unless you happen to know the RegTAP standard by heart).

In contrast to its model VODesktop, WIRR cannot run any queries against the services discovered using it. But you can transfer the services you have found to clients via SAMP (TOPCAT can handle the relevant MTypes, but I'm frankly not sure what else). Apart from that, an obvious use for WIRR are the queries one needs in VO curation. For instance, I keep linking to it when sending people canned registry queries, as in the section on claiming an authority in the DaCHS Tutorial.

Given that both Javascript and the Registry have evolved a lot in the past decade, WIRR was in need of a major redecoration for some time now, and in early July, I found some time to do it. The central result is that the code is now halfway modern, strict Javascript; let's see how many web browsers still run that can't execute this.

On the surface, much less has changed, but there are some news I'd consider noteworthy and that might help your data discovery-fu:
- Since I've added some constraint types, the constraint type selector is now a hierarchical box, sporting what I think are or should be the most common constraint types (full text, service type and UAT term) on level 0 and then having “Blind Discovery“, “Finer Grained“, and “Special Effects“ as pop-ups; all this so we obey Miller's Rule of Seven.
- Rather than explain the constraints on a second, separate page, there are now brief help texts coming with each constaint.
- You can now match against UAT concepts, and there is a completing input box for them; in case you're wondering what this is about, see this post from last February. And yes, next time I'll play with WIRR I'll probably include SemBaReBro here.
- When constraining by column UCD, you can now choose from UCDs found in the registry (the “Pick one“ button).
- You can now constrain by spatial, temporal, and spectral coverage, though that's still a gamble because not many (or, actually, very few in the case of temporal and spectral) operators care to declare their services' coverage. When they don't, you won't see their resources with such blind discovery constraints. For some background on this, check Space and Time not lost on the Registry on this blog.
- There is now a „SQL“ button with successful searches that lets you retrieve the SQL executed for the particular constraint. While that query does not immediately execute on RegTAP services (it's Postgres' SQL rather than ADQL), it ought to give you a head start when transplanting your Registry query into, say, a pyVO-based script.
- You can now use your browser's back and forward buttons (or, in my case. key bindings) to navigate in your query history.
What this still doesn't do: Work without Javascript. That's a bit of a disgrace, since after the last changes it would actually be reasonable to provide non-javascript fallbacks for some of the basic functionality (of course, no SAMP at all then…). I'll do it the first time someone asks. Promised.

A document that now needs at least slight updates because things have moved about a bit is the data discovery use case Florian wrote back then. The updates absolutely necessary are not terribly involved, but I would like to use the opportunity to add a bit more spice to the tutorial. If you have ideas: I'm all ears.

Oh, and before I close: you can still run VODesktop; kudos to the maintainers of the JVM for that. But it's nevertheless not really usable any more, which perhaps isn't too surprising for a client built on top of experimental online services ten years ago. For one, its TAP client speaks pre-release versions of both TAP and ADQL, so those won't work on modern TAP services (and the ancient ones have vanished). Worse, it needed to use a non-standard extension of RegTAP's predecessor (for those old enough to remember: it used XQuery), and none of the modern searchable registries understands that any more.

Which is a pity, really. It's been a fine programme. It just was a few years early: By 2012, everything it needed has been defined in nice, stable standards that are still around and probably will be for another decade at least.

Category: Data

Page 1 / 3 »