Artikel mit Tag RegTAP:

  • Query the Registry with WIRR

    Search windows of VODesktop and WIRR

    Pixels from venerable VODesktop and WIRR: it's supposed to be about the same thing, except WIRR uses and exposes the latest Registry standards (and then some tech that's not standard yet).

    When the VO was young, there was a programme called VODesktop that had a very nice interface for searching the Registry. Also, it would run queries against the services discovered, giving nice all-VO querying that few modern clients do quite as elegantly. Regrettably, when the astrogrid UK project was de-funded, VODesktop's development ceased in 2010.

    In 2012, it had become clear that nobody would step up to continue it, and I wanted to at least provide a replacement for the Registry interface part. In consequence, Florian Rothmaier and I wrote the Web Interface to the Relational Registry, or WIRR for short; this lets you build Registry queries in your Web Browser in an interface inspired by VODesktop (which, I'm told, in turn was inspired by early iTunes).

    WIRR's sweet spot is between the Registry interfaces in the usual clients (TOPCAT, Aladin: these try to hide the gory details of where their service lists come from and hence are limited in what interaction they allow) and using a TAP client to write and execute RegTAP queries (where there are no limitations beyond the protocol's, but it's tedious unless you happen to know the RegTAP standard by heart).

    In contrast to its model VODesktop, WIRR cannot run any queries against the services discovered using it. But you can transfer the services you have found to clients via SAMP (TOPCAT can handle the relevant MTypes, but I'm frankly not sure what else). Apart from that, an obvious use for WIRR are the queries one needs in VO curation. For instance, I keep linking to it when sending people canned registry queries, as in the section on claiming an authority in the DaCHS Tutorial.

    Given that both Javascript and the Registry have evolved a lot in the past decade, WIRR was in need of a major redecoration for some time now, and in early July, I found some time to do it. The central result is that the code is now halfway modern, strict Javascript; let's see how many web browsers still run that can't execute this.

    On the surface, much less has changed, but there are some news I'd consider noteworthy and that might help your data discovery-fu:

    • Since I've added some constraint types, the constraint type selector is now a hierarchical box, sporting what I think are or should be the most common constraint types (full text, service type and UAT term) on level 0 and then having “Blind Discovery“, “Finer Grained“, and “Special Effects“ as pop-ups; all this so we obey Miller's Rule of Seven.
    • Rather than explain the constraints on a second, separate page, there are now brief help texts coming with each constaint.
    • You can now match against UAT concepts, and there is a completing input box for them; in case you're wondering what this is about, see this post from last February. And yes, next time I'll play with WIRR I'll probably include SemBaReBro here.
    • When constraining by column UCD, you can now choose from UCDs found in the registry (the “Pick one“ button).
    • You can now constrain by spatial, temporal, and spectral coverage, though that's still a gamble because not many (or, actually, very few in the case of temporal and spectral) operators care to declare their services' coverage. When they don't, you won't see their resources with such blind discovery constraints. For some background on this, check Space and Time not lost on the Registry on this blog.
    • There is now a „SQL“ button with successful searches that lets you retrieve the SQL executed for the particular constraint. While that query does not immediately execute on RegTAP services (it's Postgres' SQL rather than ADQL), it ought to give you a head start when transplanting your Registry query into, say, a pyVO-based script.
    • You can now use your browser's back and forward buttons (or, in my case. key bindings) to navigate in your query history.

    What this still doesn't do: Work without Javascript. That's a bit of a disgrace, since after the last changes it would actually be reasonable to provide non-javascript fallbacks for some of the basic functionality (of course, no SAMP at all then…). I'll do it the first time someone asks. Promised.

    A document that now needs at least slight updates because things have moved about a bit is the data discovery use case Florian wrote back then. The updates absolutely necessary are not terribly involved, but I would like to use the opportunity to add a bit more spice to the tutorial. If you have ideas: I'm all ears.

    Oh, and before I close: you can still run VODesktop; kudos to the maintainers of the JVM for that. But it's nevertheless not really usable any more, which perhaps isn't too surprising for a client built on top of experimental online services ten years ago. For one, its TAP client speaks pre-release versions of both TAP and ADQL, so those won't work on modern TAP services (and the ancient ones have vanished). Worse, it needed to use a non-standard extension of RegTAP's predecessor (for those old enough to remember: it used XQuery), and none of the modern searchable registries understands that any more.

    Which is a pity, really. It's been a fine programme. It just was a few years early: By 2012, everything it needed has been defined in nice, stable standards that are still around and probably will be for another decade at least.

  • Semantics, Cross-Discipline Discovery, and Down-To-Earth Code

    Boxes-and-arrows view of the UAT

    A tiny piece of the Unified Astronomy Thesaurus as viewed by Sembarebro – the IVOA logos sit on terms that have VO resoures on them.

    Sometimes people ask me (in particular when I'm wearing my hat as the current chair of the IVOA Semantics working group) “well, what's this semantics thing good for?“ There are many answers, but here's one that nicely meshes with my pet subject data discovery: You want hierarchical, agreed-upon word lists to bridge discipline gaps.

    This story starts with B2FIND, a cross-disciplinary metadata aggregator for science data run within the framework of the European Open Science Cloud (EOSC). GAVO (or, more precisely, Heidelberg University's Astronomy) is involved in the EOSC via the ESCAPE project, and so I have had the pleasure of interacting with B2FIND for a while now. In particular, they are harvesting the metadata records of the Virtual Observatory Registry from us.

    This of course requires a bit of mapping, because the VO's metadata formats (VOResource, VODataService, and several extensions; see 2014A&C.....7..101D to learn more) are far too fine-grained for the wider scientific public. Not even our good friends from high-energy physics would appreciate being served links to, say, TAP endpoints (yet!). So, on our end we're mapping to the Datacite metadata kernel, which from VOResource is just a piece of XSL away (plus some perhaps debatable conventions).

    But there's more to this mapping, such as vocabularies of subject keywords. You might argue that in the age of rapid full text searches, keywords are dead. I would beg to disagree. For example, with good, hierarchical keyword systems you can, among many other useful things, offer topical browsing of metadata repositories. While it might not quite qualify as “useful” yet, the SemBaReBro registry browser I've hacked together late last year would be an example for such facilities – and might become part of our WIRR Registry searching tool one day.

    On the topic of subject keywords VOResource says that resources in the VO should be using the Unified Astronomy Thesaurus, specifically in its IVOA incarnation (not quite true yet, but true enough by blog standards). While few do, I've done a mapping of existing keywords in the VO to UAT concepts, which is what's behind SemBaReBro. So: most VO resources now have UAT concepts.

    However, these include concepts like AM Canum Venaticorum Stars, which outside of rather specialised circles of astronomers few people will ever have heard about (which, don't get me wrong, I personally regret – they're funky star systems). Hence, B2FIND does not bother with those.

    When we discussed the subject mapping for B2FIND, we thought using the UAT's top-level concepts might be a good start. However, at that point no VO resources at all actually used these, and, indeed, within astronomy that generally wouldn't make a lot of sense, because they are to unspecific to help much within the discipline. I postponed and then forgot about the problem – when the keywords of the resources weren't even from UAT, solving the granularity mismatch just wasn't humanly possible.

    That was the state of affairs until last Tuesday, when I had a mumble session with B2FIND folks and the topic came up again. And now, thanks partly to the new desise format proposed in the current Vocabularies in the VO 2 draft, things fell nicely into place: Hey, I have UAT concepts, and mapping these to the top-level terms isn't hard either any more.

    So, B2FIND gets the toplevel keywords they've been expecting all the time starting today. Yes: This isn't a panacea suddenly solving all the problems of cross-discipline data discovery, not the least because it's harder than one might think to imagine how such a thing would look like in practice. But given the complexities involved I was positively surprised how easy this particular part of the equation was.

    From here on, there's a bit of tech babble I intend to re-use in the RFC of Vocabularies in the VO 2; don't feel bad if you skip it.

    The first step was to make the mapping from UAT terms to the toplevel terms. The interesting part of the source I'm linking to here is:

    def get_roots_for(term, uat_terms):
      roots, seen = set(), set()
    
      def follow(t):
        wider = uat_terms[t]["wider"]
        if not wider:
          if not t in ROOT_TERMS:
            raise Exception(
              f"{t} found as a top-level term")
          roots.add(t)
        else:
          seen.add(t)
          for wider in uat_terms[t]["wider"]:
            follow(wider)
    
      follow(term)
      return roots
    

    There, uat_terms is essentially just a json-decode of what you get from the vocabulary URI if you ask for desise (see the draft spec linked to above for the technicalities). That's really it, and it even defends against cycles in the concept graph (which are legal by SKOS but shouldn't happen in the UAT) and detached terms (i.e., ones that are not rooted in the top-level terms). For what it does, I claim that's remarkably compact code.

    Once I had that, I needed to get the UAT-mapped subject keywords for the records I'm serving to datacite and fiddle the corresponding roots back in. That's technically a bit more involved because I am producing the datacite records on the fly from the XML representation for VOResource records that I keep in the database, and there's a bit of namespace magic involved (full code). Plus, the UAT-mapped keywords are only kept in the database, not in the metadata records.

    Still, the core operation here is relatively straightforward. Consider:

    def addUATToplevels(dataciteTree):
      # dataciteTree is an (lxml) ElementTree for the
      # result of the XSL transformation.  That's all
      # I have, and thus I first have to fiddle out
      # the identifier we are talking about
      ivoid =  dataciteTree.xpath(
          "//d:alternateIdentifier["
          "@alternateIdentifierType='ivoid']",
          namespaces={"d": DATACITE_NS}
        )[0].text.lower()
      # The .lower() is necessary because ivoids
      # unfortunately are case-insensitive, and RegTAP
      # normalises them to lowercase to retain sanity.
    
      # Now pull the UAT-mapped subject keywords from
      # our RegTAP extension (getTableConn is
      # DaCHS-internal API, but there's no magic in
      # there, it's just connection pooling with
      # guarantees against connections  idle in
      # transaction).
      with base.getTableConn() as conn:
        subjects = set(r[0] for r in
          conn.query("SELECT uat_concept"
            " FROM rr.subject_uat"
            " WHERE ivoid=%(ivoid)s", locals()))
    
      # This is the mapping itself: we do
      # roots-subjects to avoid adding
      # root terms that are already in
      # the record itself.  UAT_TOPLEVELS is the result
      # of the root finding discussed above.
      for term in subjects:
        root = UAT_TOPLEVELS[term]
        newRoots |= (root-subjects)
    
      # And finally fiddle in any new root terms found
      # into the datacite tree
      if newRoots:
        subjects = dataciteTree.xpath(
          "//d:subjects",
          namespaces={"d": DATACITE_NS})[0]
        for root in newRoots:
          newSubject = etree.SubElement(subjects,
            f"{{{DATACITE_NS}}}subject")
          newSubject.text = root
    

    Apart from the technicalities I'd again say that's pretty satisfying code.

    And these two pieces of code are really all I had to do to map between the vocabularies of different granularities – which I claim will probably be the norm as metadata flows between disciplines.

    It's great to see the pieces of a fairly comples puzzle fall into place like that.

  • HTTPS in DaCHS

    Browser windows with and without HTTPS.

    Another little aspect of HTTPS support in DaCHS: In the web interface, the webSAMP button must disappear in pages served through HTTPS: it simply wouldn't work.

    (Warning: No astronomy-relevant content at all this time).

    I can't say I'm a big fan of the mighty push towards HTTPS that's going on right now – as I'm arguing in the updated operator's guide it doesn't do people's privacy a lot of good (compared to, say, pushing for browsers to not execute Javascript by default or have DNSSEC widely deployed), but it's a fairly substantial operational liability. With HTTPS, operators have to deal with cryptographic material, regularly update their certificates, restart their services in time and assemble the whole thing correctly (don't get me started about proxying, SNI, and all those horrors). Users, on the other hand, have to keep their CA certificates in order, in particular when they do programmatic VO access, where the browser vendors, their employers and who knows who else doesn't do it for them. Pop quiz: How would you install a new CA certificate on your box? And will your default browser see it?

    But on the other hand, there are some scenarios in which HTTPS makes sense, and I can remotely fantasise that some of those may even be relevant to the VO. And people have been asking for HTTPS in DaCHS a number of times, at times even because their administrations urged them to switch. So, here it is, hopefully. Turning it on is reasonably easy when you use Letsencrypt (which in particular entails having ports 80 and 443); the section on Letencrypt in the operator's guide tells what to do. In particular don't forget the cron job, because without it, things would break after three months (when the initial certificate expires).

    Things get difficult after that. For one, if your box is known under several names (our data center, for instance, can be reached as any of dc.g-vo.org, vo.uni-hd.de, and dc.zah.uni-heidelberg.de; this of course also includes things like www.example.org and example.org), you'll now have to tell DaCHS about it in the new [web]alternateHostnames configuration item; for instance, we have:

    [web]
    serverURL: http://dc.zah.uni-heidelberg.de
    alternateHostnames:dc.g-vo.org, vo.uni-hd.de
    

    in our /etc/gavo.rc.

    And then the Registry has to know you have https. There's actually no convention for that in the VO yet. But since I'd really like to have at least fallback interfaces with plain HTTP, we'll have to come up with something. For now, my plan is to have the alternative protocol (i.e., HTTPS for sites that have an HTTP-serverURL and vice versa) using the brand-new VOResource 1.1 mirrorURLs (in RegTAP 1.1, they are in the mirror_url column rr.interface). To make DaCHS declare the alternate URLs, set [web]registerAlternative to True.

    Another change I've introduced for HTTPS is that the default HTML template for the form renderer (i.e., the one people use who come with a browser) now suppresses the SAMP button if the request came in through HTTPS; that's because WebSAMP doesn't work with HTTPS and probably never will – at least I can't see a way to make it happen without totally wrecking what security guarantees HTTPS gives.

    All this doesn't yet cater for the case when you use a reverse proxy to terminate HTTPS. If you are in that situation, please talk to me so we can figure out a sane way for you explain to DaCHS what to tell the Registry.

    Anyway, if you want to try things out, just switch to the beta repostitory and upgrade. Feedback is highly welcome.

    Oh, and if you're a client developer: Our data center is now reachable through HTTPS (at https://dc.g-vo.org), and we already have pushed the records with mirrorURLs declaring HTTPS support to the RegTAP service at dc.g-vo.org (the others will have to wait a bit longer, as we haven't re-published our registry records yet (it's all experimental, after all).

  • Space and Time not lost on the Registry

    Histogram: observation dates of an image service

    A histogram of times for which the Palomar-Leiden service has images: That's temporal service coverage right there.

    If you are an astronomer and you've ever tried looking for data in the Virtual Observatory Registry, chances are you have wondered “Why can't I enter my position here?” Or perhaps “So, I'm looking for images in [NIII] – where would I go?”

    Both of these are examples for the use of Space-Time Coordinates (STC) in data discovery – yes, spectral coordinates count as STC, too, and I could make an argument for it. But this post is about something else: None of this has worked in the Registry up to now.

    It's time to mend this blatant omission. To take the next steps, after a bit of discussion on some of the IVOA's mailing lists, I have posted an IVOA note proposing exactly those last Thursday. It is, perhaps with a bit of over-confidence, called A Roadmap for Space-Time Discovery in the VO Registry. And I'd much appreciate feedback, in particular if you are a VO user and have ideas on what you'd like to do with such a facility.

    In this post, I'd like to give a very quick run-down on what is in it for (1) VO users, (2) service operators in general, and (3) service operators who happen to run DaCHS.

    First, users. We already are pretty good on spatial coverage (for about 13000 of almost 20000 resources), so it might be worth experimenting with that. For now, the corresponding table is only available on the RegTAP mirror at http://dc.g-vo.org/tap. There, you can try queries like:

    select ivoid from
    rr.table_column
    natural join rr.stc_spatial
    where
      1=contains(gavo_simbadpoint('HDF'), coverage)
      and ucd like 'phot.flux;em.radio%'
    

    to find – in this case – services that have radio fluxes in the area of the Hubble Deep Field. If these lines scare you or you don't know what to do with the stupid ivoids, check the previous post on this blog – it explains a bit more about RegTAP and why you might care.

    Similarly cool things will, hopefully, some day be possible in spectrum and time. For instance, if you were interested in SII fluxes in the crab nebula in the early sixties, you could, some day, write:

    SELECT ivoid FROM
    rr.stc_temporal
    NATURAL JOIN rr.stc_spectral
    NATURAL JOIN rr.stc_spatial
    WHERE
      1=CONTAINS(gavo_simbadpoint('M1'), coverage)
      AND 1=ivo_interval_overlaps(
        6.69e-7, 6.75e-7,
        wavelength_start, wavelength_end)
      AND 1=ivo_interval_overlaps(
        36900, 38800,
        time_start, time_end)
    

    As you can see, the spectral coordiate will, following (admittedly broken) VO convention, be given in meters of vacuum wavelength, and time in MJD. In particular the thing with the wavelength isn't quite settled yet – personally, I'd much rather have energy there. For one, it's independent of the embedding medium, but much more excitingly, it even remains somewhat sensible when you go to non-electromagnetic messengers.

    A pattern I'm trying to establish is the use of the user-defined function ivo_interval_overlaps, also defined in the Note. This is intended to allow robust query patterns in the presence of two intrinsically interval-valued things: The service's coverage and the part of the spectrum you're interested in, say. With the proposed pattern, either of these can degenerate to a single point and things still work. Things only break when both the service and you figure that “Aw, Hα is just 656.3 nm” and one of you omits a digit or adds one.

    But that's academic at this point, because really few resources define their coverage in time and and spectrum. Try it yourself:

    SELECT COUNT(*) FROM (
      SELECT DISTINCT ivoid FROM rr.stc_temporal) AS q
    

    (the subquery with the DISTINCT is necessary because a single resource can have multiple rows for time and spectrum when there's multiple distinct intervals – think observation campaigns). If this gives you more than a few dozen rows when you read this, I strongly suspect it's no longer 2018.

    To improve this situation, the service operators need to provide the information on the coverage in their resource records. Indeed, the registry schemas already have the notion of a coverage, and the Note, in its core, simply proposes to add three elements to the coverage element of VODataService 1.1. Two of these new elements – the coverage in time and space – are simple floating-point intervals and can be repeated in order to allow non-contiguous coverage. The third element, the spatial coverage, uses a nifty data structure called a MOC, which expands to “HEALPix Multi-Order Coverage map” and is the main reason why I claim we can now pull off STC in the Registry: MOCs let databases and other programs easily and quickly manipulate areas on the sphere. Without MOCs, that's a pain.

    So, if you have registry records somewhere, please add the elements as soon as you can – if you don't know how to make a MOC: CDS' Aladin is there to help. In the end, your coverage elements should look somewhat like this:

    <coverage>
      <spatial>3/336,338,450-451,651-652,659,662-663
        4/1816,1818-1819,1822-1823,1829,1840-1841</spatial>
      <temporal>37190 37250</temporal>
      <temporal>54776 54802</temporal>
      <spectral>3.3e-07 6.6e-07</spectral>
      <spectral>2.0e-05 3.5e-06</spectral>
      <waveband>Optical</waveband>
      <waveband>Infrared</waveband>
    </coverage>
    

    The waveband elements are remainders from VODataService 1.1. They are still in use (prominently, for one, in SPLAT), and it's certainly still a good idea to keep giving them for the forseeable future. You can also see how you would represent multiple observing campaigns and different spectral ranges.

    Finally, if you're running DaCHS and you're using it to generate registry records (and there's almost no excuse for not doing so), you can simply write a coverage element into your RD starting with DaCHS 1.2 (or, if you run betas, 1.1.1, which is already available). You'll find lots of examples at the usual place. As a relatively interesting example, the resource descriptor of plts. It has this:

      <updater spaceTable="data" spectralTable="data" mocOrder="4"/>
      <spectral>3.3e-07 6.6e-07</spectral>
      <temporal>37190 37250</temporal>
      <temporal>38776 38802</temporal>
      <temporal>41022 41107</temporal>
      <temporal>41387 41409</temporal>
      <temporal>41936 41979</temporal>
      <temporal>43416 43454</temporal>
      <spatial>3/282,410 4/40,323,326,329,332,387,390,396,648-650,1083,1085,1087,1101-1103,1123,1125,1132-1134,1136,1138-1139,1144,1146-1147,1173-1175,1216-1217,1220,1223,1229,1231,1235-1236,1238,1240,1597,1599,1614,1634,1636,1728,1730,1737,1739-1740,1765-1766,1784,1786,2803,2807,2809,2812</spatial>
    </coverage>
    

    This particular service archives plate scans from the Palomar-Leiden Trojan surveys; these were looking for Trojan asteroids (of Jupiter) using the Palomar 122 cm Schmidt and were conducted in several shortish campaigns between 1960 and 1977 (incidentally, if you're looking for things near the Ecliptic, this stuff might still hold valuable insights for you). Because the fill factor for the whole time period is rather small, I manually extracted the time coverage; for that, I ran select dateobs from plts.data via TAP and made the histogram plot above. Zooming in a bit, I read off the limits in TOPCAT's coordinate display.

    The other coverages, however, were put in automatically by DaCHS. That's what the updater element does: for each axis, you can say where DaCHS should look, and it will then fill in the appropriate data from what it guesses gives the relevant coordiantes – that's straightforward for standard tables like the ones behind SSAP and SIAP services (or obscore tables, for that matter), perhaps a bit more involved otherwise. To say “just do it for all axis”, give the updater a single sourceTable attribute.

    Finally, in this case I'm overriding mocOrder, the order down to which DaCHS tries to resolve spatial features. I'm doing this here because in determining the coverage of image services DaCHS right now only considers the centers of the images, and that's severely underestimating the coverage here, where the data products are the beautiful large Schmidt plates. Hence, I'm lowering the resolution from the default 6 (about one degree linearly) to still give some approximation to the actual data coverage. We'll fix the underlying deficit as soon as pgsphere, the postgres extension which is actually dealing with all the MOCs, has support for turning circles and polygons into MOCs.

    When you have defined an updater, just run dachs limits q.rd, and DaCHS will carefully (preserving your indentation) re-write the RD to contain what DaCHS has worked out from your table (but careful: it will overwrite what was previously there; so, make sure you only ask DaCHS to only deal with axes you're not dealing with manually).

    If you feel like writing code discovering holes in the intervals, ideally already in the database: that would be great, because the tighter the intervals defined, the fewer false positives people will have in data discovery.

    The take-away for DaCHS operators is:

    1. Add STC coverage to your resources as soon as you've updated to DaCHS 1.2

    2. If you don't have to have the tightest coverage declaration conceivable, all you have to do to have that is add:

      <coverage>
        <updater sourceTable="my_table"/>
      </coverage>
      

      to your RD (where my_table is the id of your service's “main” table) and then run dachs limits q.rd

    3. For special effects and further information, see Coverage Metadata in the DaCHS reference documentation

    4. If you have a nice postgres function that splits a simple coverage interval up so the filling factor of a set of new intervals increases (or know a nice, database-compatible algorithm to do so) – please let me know.

  • Say hello to RegTAP

    image: WIRR in the browser

    GAVO's WIRR registry interface in action to find resources with radio parallaxes.

    RegTAP is one of those standards that a scientist will normally not see – it works in the background and makes, for instance, TOPCAT display the Cone Search services matching some key words. And it's behind the services like WIRR, our Web Interface to the Relational Registry (“Relational Registry” being the official name for RegTAP) that lets you do some interesting data discovery beyond what current clients support. In the screenshot above, for instance (try it yourself), I'm looking for cone search services having parallaxes presumably from radio observations. You could now transmit the services you've found to, say, TOPCAT or your own pyvo-based program to start querying them.

    The key point this query is the use of UCDs – these let services declare fairly unambiguously what kind of physics (if you take that word with a grain of salt) they are talking about. In the example, pos.parallax means, well, a parallax, and the percent character is a wildcard (coming not from UCDs, but from ADQL). That wildcard is a good idea here because without it we might miss things like pos.parallax;obs and pos.parallax;stat.fit that people might have used to distinguish “raw” and ”processed” estimates.

    UCDs are great for data discovery. Really.

    Sometimes, however, clicking around in menus just isn't good enough. That's when you want the full power of RegTAP and write your very own queries. The good news: If you know ADQL (and you should!), you're halfway there already.

    Here's one example of direct RegTAP use I came up with the other day. The use case was discovering data collections that give the effective temperatures of components of binary star systems.

    If you check the UCD list, that “physics” translates into data that has columns with UCDs of phys.temperature and meta.code.multip at the same time. To translate that into a RegTAP query, have a look at the tables that make up a RegTAP service: its ”schema”. Section 8 of the standard lists all the tables there are, and there's an ADASS poster that has an image of the schema with the more common columns illustrated. Oh, and if you're new to RegTAP, you're probably better off briefly studying the examples first to get a feeling for how RegTAP is supposed to work.

    You will find that a pair of ivoid – the VO's global resource identifier – and a per-resource table index uniquely identify a table within the entire registry. So, an ADQL query to pick out all tables containing temperatures and component identifiers would look like this:

    SELECT DISTINCT ivoid, table_index
    FROM
    rr.table_column AS t1
    JOIN rr.table_column AS t2
    USING (ivoid, table_index)
    WHERE t1.ucd='phys.temperature'
    AND t2.ucd='meta.code.multip'
    

    – the DISTINCT makes it so even tables that have lots of temperatures or codes only turn up once in our result set, and the somewhat odd self-join of the rr.table_column table with itself lets us say “make sure the two columns are actually in the same table”. Note that you could catch multi-table resources that define the components in one table and the temperatures in another by just joining on ivoid rather than ivoid and table_index.

    You can run this query on any RegTAP endpoint: GAVO operates a small network of mirrors behind http://reg.g-vo.org/tap, there's the ESAC one at http://registry.euro-vo.org/regtap/tap, and STScI runs one at http://vao.stsci.edu/RegTAP/TapService.aspx. Just use your usual TAP client.

    But granted, the result isn't terribly user-friendly: just identifiers and number. We'd at least like to see the names and descriptions of the tables so we know if the data is somehow relevant.

    RegTAP is designed so you can locate the columns you would like to retrieve or constrain and then just NATURAL JOIN everything together. The table_description and table_name columns are in rr.res_table, so all it takes to see them is to take the query above and join its result like this:

    SELECT table_name, table_description
    FROM rr.res_table
    NATURAL JOIN (
      SELECT DISTINCT ivoid, table_index
      FROM
      rr.table_column AS t1
      JOIN rr.table_column AS t2
      USING (ivoid, table_index)
      WHERE t1.ucd='phys.temperature'
      AND t2.ucd='meta.code.multip') as q
    

    If you try this, you'll see that we'd like to get the descriptions of the resources embedding the tables, too in order to get an idea what we can expect from a given data collection. And if we later want to find services exposing the tables (WIRR is nice for that – try the ivoid constraint –, but for this example all resources currently come from VizieR, so you can directly use VizieR's TAP service to interact with the tables), you want the ivoids. Easy: Just join rr.resource and pick columns from there:

    SELECT table_name, table_description, res_description, ivoid
    FROM rr.res_table
    NATURAL JOIN rr.resource
    NATURAL JOIN (
      SELECT DISTINCT ivoid, table_index
      FROM
      rr.table_column AS t1
      JOIN rr.table_column AS t2
      USING (ivoid, table_index)
      WHERE t1.ucd='phys.temperature'
      AND t2.ucd='meta.code.multip') as q
    

    If you've made it this far and know a bit of ADQL, you probably have all it really takes to solve really challenging data discovery problems – as far as Registry metadata reaches, that is, which currently does not include space-time coverage. But stay tuned, more on this soon.

    In case you're looking for a more systematic introduction into the world of the Registry and RegTAP, there are two... ouch. Can I really link to Elsevier papers? Well, here goes: 2014A&C.....7..101D (a.k.a. arXiv:1502.01186 on the Registry as such and 2015A%26C....11...91D (a.k.a. arXiv:1407.3083) mainly on RegTAP.

Page 1 / 2 »