Posts with the Tag Obscore:

  • Limits, Materialisation, and Anchor Texts: DaCHS 2.13 is out

    AI slop: Ten badgers on a grassy floor.

    With all the crazy Star Trek-sounding talk of “materialising obscore” below, I could not resist and asked stabledifffusion.com for “Thirteen badgers materialising obscore”. Well, counting badgers is hard, and I wouldn't have been sure how to visualise obscore, either. Rest assured, though, that the remainder of this post is not AI slop and is at least factually correct.

    It's been almost a year since the last release of our publication package, DaCHS, and so it's high time for DaCHS 2.13. I put it into our repository last Friday, and here is the obligatory post on the major news coming with it.

    Perhaps the biggest headline (and one that I'd ask you to act upon if you run a DaCHS system) is support for the new features in the brand-new VODataService 1.3 Working Draft. That is:

    • Column statistics. This is following my Note on Advanced Column Statistics on the way to improved blind discovery in the VO. To have them in your DaCHS, all you have to do is upgrade and run dachs limits ALL – and then make sure you run dachs limits after a dachs imp you are satisfied with (or use the new -l flag discussed below). Please do it – one can do a lot of interesting discovery in the Registry (and perhaps quite a bit more) if this is taken up broadly.

    • Product type declaration. So far, when you wanted to discover, say, spectra, you would enumerate the SSAP services in the Registry, perhaps with some additional constraints (e.g., on coverage), and then query each of those.

      Linking data types and protocols was a reasonable shortcut in the early VO. It no longer is, for a whole host of reasons, among which Obscore (which can publish any sort of observational data) ranks pretty high up. So, in the future, we need to be explicit about which of the terms from http://www.ivoa.net/rdf/product-type will come out of a service.

      Where this is immediately useful is when you publish time series through SSAP (which is not uncommon). Then, just put:

      <meta name="productType">timeseries</meta>
      

      into the root of your RD (the time series template in 2.13 already does this). If you publish cubes through SIAP, you should similarly say:

      <meta name="productType">cube</meta>
      

      For other SSAP and SIAP services, you probably don't need to bother at this point.

      For obscore, DaCHS will do the declarations for you if you have run:

      dachs limits //obscore
      

      – which is a good thing to do anyway (see above).

    • Data source declaration. For many purposes, it is really important to know whether some piece of data you found is based on actual observations or whether it comes out of some sort of simulation.

      So far, the only protocol that let you say something like that was SSAP. But there's now all kinds of other non-observational data in the VO, and so VODataService 1.3 introduces the vocabulary http://www.ivoa.net/rdf/data-source to let you say where the data you publish comes from.

      The default is going to be observational for a long while. If that's what you have, don't bother. But if you publish results from simulations (more or less: starting from random numbers), put:

      <meta name="dataSource">theory</meta>
      

      into your RD's root, and if it's data based on actual objects (simulated observations for a new instrument, say, or model spectra for concrete stars), make it:

      <meta name="dataSource">artificial</meta>
      

    To make filling in the VODataService column statistics somewhat less of a hassle, I have added an -l flag to dachs imp. This makes it run (in effect) a dachs limits after the import. I'm not doing this on every import because that would slow down the development of an RD; obtaining the statistics may take quite some time, and for certain sorts of tables you may prefer to run dachs limits with your own options.

    You could argue I should have inverted the logic, so that you would instead pass a flag saying “don't do limits” during development. You could probably convince me. But until someone protests, just remember to add the -l flag to your last import command.
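    In case you are wondering what goes into such column statistics: essentially per-column summaries like minimum, maximum, mean, and a few percentiles. Here is a plain-Python sketch of that kind of summary; this is an illustration only, not DaCHS code (dachs limits does its work against the database, and the function name here is made up):

```python
import statistics

def column_stats(values):
    """A toy version of per-column statistics: min, max, mean,
    and quartiles.  Illustration only; not how DaCHS does it."""
    ordered = sorted(values)
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.fmean(ordered),
        "quartiles": statistics.quantiles(ordered, n=4),
    }

# Exposure times in seconds, made up for the example:
print(column_stats([12, 60, 300, 600, 890, 1200, 15300]))
```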

    There are a few more prototypes for (possibly) upcoming standards in DaCHS 2.13. For one, you can now write units in ADQL queries as per my proposal at the Görlitz ADASS. That is, you can annotate literals with units in curly braces (as in 10{pc}), and you can convert values with known units into other units using a new operator @. For instance, if you were fed up with the stupid angle unit we've been forced to accept since… well, about 2000 BC, you could put the interface to saner units into your queries like this:

    SELECT TOP 20
      ra@{rad}, dec@{rad}, pmra@{rad/hyr}, pmdec@{rad/hyr}
    FROM gaia.dr3lite
    

    This is not a big advantage if you write queries just for a single catalogue. It does make a difference when you write queries that ought to work across multiple tables and services.
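    If you want to convince yourself of what the @ operator amounts to numerically, here is the degree-to-radian conversion behind ra@{rad} in plain Python (the actual conversion, of course, happens in the database when the ADQL is executed; the function name here is made up for illustration):

```python
import math

def at_rad(value_deg):
    """What column@{rad} yields for a column with unit deg:
    the value multiplied by the deg-to-rad factor pi/180."""
    return value_deg * math.pi / 180

# A declination of -38 degrees in saner units:
print(at_rad(-38))
```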

    While you should not notice the per-mode limit declarations coming from an unpublished draft of TAPRegExt 1.1 (except that the async limits TOPCAT shows will now better match what DaCHS actually enforces), you may appreciate the support for StaticFile that comes out of DocRegExt 1.0. There, it is used to register single PDF files or perhaps IPython notebooks. When you register such things[1], you can now say something like:

    <publish render="edition" sets="ivo_managed">
      <meta>
        accessURL: \internallink{\rdId/static/myfile.txt}
        accessURL.resultType: text/plain
      </meta>
    </publish>
    

    The result of this will be that DaCHS produces a doc:StaticFile interface rather than vs:WebBrowser, and it will produce a resultType element saying that what you get back is plain text (in this case). If you have other applications for having static files like that in registry records, do let me know.

    My investigation into slow obscore queries that I already reported on here led to two changes: For one, some types in the obscore table changed, and in consequence dachs val -vc ALL will complain if you have pulled the obscore columns into your own tables. Just try the val -vc and either re-import the affected resources at your leisure (it's only an aesthetic defect; things will continue to work) or change the column types as described in the blog post linked above.

    Probably more importantly, you can now materialise the obscore view (actually, in order to let you drop the contributing tables at will, it's not a materialised view but a table, but that's… immaterial here). You want to do that if you have many contributions to your obscore table, at least some queries against it have become slow, and you cannot seem to figure out why. See Materialised Obscore in the tutorial for what to do if you want to materialise your obscore table, too.

    Something perhaps worth exploring for you is that you can now publish entire RDs. I implemented this for a resource with lots of little “services” (actually, HiPSes) that share so many pieces of metadata that it just seemed wrong to give them all separate resource records (though I am in discussion with the HiPS people, who are not particularly fond of having multiple HiPSes in one resource record). Beyond that, you could have, say, a cone search for extracted sources, an image service, and a browser service for both in one RD and then say, in the RD section with top-level metadata:

    <publish sets="ivo_managed"/>
    

    – everything should then live nicely as separate capabilities within one resource record, and that without any of the publish/@service tomfoolery you had to use so far to glue together VO and browser services.

    For local publications (i.e., browser services appearing on your front page), this will result in a link to the RD info (minor DaCHS secret: <your server URL>/browse/<rd-id> gives an overview over the tables and services defined in an RD). Whether that's useful enough for you in such a case I cannot predict. But you can mix all-RD publications in ivo_managed with conventional <publish sets="local"/> elements for browser services.

    Among the more minor changes, the default web form template now employs a WebSAMP connector, which means that the SAMP button on results of the form renderer is now greyed out until a SAMP hub becomes visible on your machine.

    If you use a display hint type=url, you can now control the anchor text on the a element in HTML output by setting a property anchorText on the corresponding column. Yes, that will then be constant for all the products. If you really need more control than that, you will have to define a formatter for a custom outputField.

    So far, the fullDLURL macro could only be used when you actually had a normal, filename-based DaCHS access reference. This was unfortunate because this kind of thing is particularly convenient for “virtual” data generated on the fly. Hence, you can now pass some python code in a second fullDLURL argument that must return the accref to use. Read a bit more on the context in Datalinks as Product URLs.

    There are many other minor changes and fixes that you hopefully will only notice because some annoying behaviour of DaCHS is now a little less annoying.

    If you spot problems or miss something, feel free to report that at our new repository at Codeberg. The main VCS for DaCHS still is https://gitlab-p4n.aip.de/gavo/dachs. But we will probably migrate to Codeberg by the 2.14 release to make reporting bugs and writing pull requests simpler.

    Perhaps we will receive some from you?

    [1] Using resType: document. I notice I should really add some material on registering educational material with DaCHS to the tutorial.
  • APPLAUSE via Obscore

    A composite of two rather noisy photo plates

    Aladin showing some Bamberg Sky Patrol plates (see towards the end of the post for what this is and how I made it).

    At the Astroplate conference I blogged about recently, the people behind APPLAUSE gave a couple of talks about their Data Release 3. APPLAUSE is a fairly massive endeavour to make available data from some of the larger plate archives in Germany, and its DR3 even hit the non-Astronomy press last February.

    Already for previous APPLAUSE releases, I've wanted to bring this data (or rather, its metadata) to the VO, but it never quite happened, basically because there was always another little thing that turned out to be too tedious to work out via mail. However, working out things interactively is exactly what conferences are great for. So, the kind APPLAUSE folks (thanks, Taavi and Harry) and I used the Astroplate to map their database schema (“schema” is jargon for what boils down to the set of tables and columns with which they describe their data) to the much simpler (and, admittedly, less powerful) IVOA Obscore one.

    Sure, Obscore doesn't deal with multiple exposures (like when the target field and the north pole were exposed on one plate to help precision photometry), object-guided images, and all the other interesting techniques that astronomers applied in the pre-digital age; it also doesn't usefully cope with multiple scans of the same plate (for instance, to correct for imprecisions in the mechanics of flatbed scanners). APPLAUSE, of course, has to cope with them, since there are many reasons to preserve data of this kind.

    Obscore, on the other hand, is geared towards uniform discovery, where too funky datasets in all likelihood cause more harm than good. So, when we mapped APPLAUSE to Obscore, of the 101138 scans of 70276 plates that the full APPLAUSE holds in DR3, only 44000 plate scans made it into the Obscore table. The advantage: whatever can be sensibly mapped to Obscore can now be queried together with all the other data in the world that others have published through Obscore.
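    That is a bit under half of the scans; as trivial arithmetic:

```python
# Counts quoted above for the APPLAUSE DR3 -> Obscore mapping:
scans_total = 101138      # plate scans in the full APPLAUSE DR3
scans_in_obscore = 44000  # scans that could sensibly be mapped
fraction = scans_in_obscore / scans_total
print(f"{fraction:.1%} of the scans made it into Obscore")
```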

    You can immediately see the effect when you run the little python program doing the global discovery we gave in our plates tutorial. Here's what it prints now (values from pre-APPLAUSE-in-Obscore are in square brackets):

    Column t_exptime: 3460 values
      Min   12, Max 15300, Mean 890.24  [previous mean: 370.722]
    ---
    Column em_mean: 3801 values
      Min 1.8081e-09, Max 9.3e-07, Mean 6.40804e-07 [No change: Sigh!]
    ---
    Column t_mean: 4731 values
      Min 12564.5, Max 58126.3, Mean 49897.9 [previous mean: 51909.1]
    ---
    Column instrument_name: 4747 values
      Matches from , Petzval, [Max Wolf's residence in
      Heidelberg, Maerzgasse, Wolf's Doppelastrograph,
      Heidelberg Koenigstuhl (24), Wolf's
      Doppelastrograph,] AG-Astrograph, [Zeiss Triplet
      15 cm Potsdam-Telegrafenberg], Zeiss Triplet,
      Astrograph (four 10-cm Tessar f/6 cameras),
      [3.5m APO, ROSAT PSPCC, Heidelberg Koenigstuhl
      (24), Bruce Astrograph, Calar Alto (493),
      Schmidt], Grosser Refraktor, [ROSAT HRI,
      DK-1.54], Hamburger Schmidt-Spiegel,
      [DFOSC_FASU], ESO 1-metre Schmidt telescope,
      Great Schmidt Camera, Lippert-Astrograph, Ross-B
      3", [AZT 22], Astrograph (six 10-cm Tessar f/6
      cameras), 1m-Spiegelteleskop, [ROSAT PSPCB],
      Astrograph (ten 10-cm Tessar f/6 cameras), Zeiss
      Objective
    ---
    Column access_url: 4747 values [4067]
    

    So – for the fields selected in the tutorial, there are 15% more images in the global Obscore image pool now than there were before APPLAUSE, and their mean observation date went a bit farther into the past. I've not made any statistics, but I suspect for many other fields the gain is going to be much higher. For a strong effect, try some random region covered by the Bamberg Sky Patrol on the southern sky.

    But you have probably noticed the deep sigh in the annotations to the statistics above: Yes, we don't have the spectral band for the APPLAUSE data, which is why the stats on em_mean don't change. As a matter of fact, from the Obscore data you can't even guess whether a plate is “more red” or “rather blue”, as Obscore doesn't have an (agreed-upon) field for a qualitative bandpass indicator.

    For some other data collections, we did map known emulsion/filter combinations to rough bandpasses (e.g., the Palomar-Leiden Trojan Survey, which only had a few of them). For APPLAUSE, there are 435 combinations of filter and emulsion (that's a VOTable link that you can paste into TOPCAT's load button in order to have a look at the table). Granted, quite a few of these pairs are (more or less) spurious because of inconsistent spelling. But we still gave up on researching the bandpasses even before we started.
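    If you would like to play with that table yourself, a first step towards spotting the merely-misspelt pairs is to normalise case and whitespace before counting distinct combinations. Here is a quick sketch of that idea; the filter/emulsion strings below are invented for illustration and are not actual APPLAUSE values:

```python
def normalise(label):
    """Collapse trivial spelling variants: case and whitespace."""
    return "".join(label.split()).lower()

# Hypothetical (made-up) filter/emulsion pairs with inconsistent spelling:
pairs = [
    ("GG 11", "Kodak 103a-O"),
    ("gg11", "Kodak 103a-O"),
    ("GG 11", "kodak 103a-o"),
]
distinct = {(normalise(filt), normalise(emul)) for filt, emul in pairs}
print(len(distinct))  # the three raw pairs collapse to one
```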

    If you're a photographic plate buff: you could help us and posterity a lot if you could go through this list and, at least for some combinations, tell us what, roughly, the lower and upper limits of the corresponding bandpasses might have been (what DaCHS already knows: plate-relevant data near the bottom of the file). As usual, send mail to gavo@ari.uni-heidelberg.de if you have anything to contribute.

    Finally, here's the brief explanation of the image for this article: Well, I wanted to find some Bamberg Sky Patrol images for a single field to play with. I knew they were primarily located in the South, and were made using Tessar cameras. So, I ran:

    SELECT t_min, access_url, s_region
    FROM ivoa.obscore
    WHERE instrument_name like '%Tessar%'
    AND 1=CONTAINS(POINT(345, -38), s_region)
    

    on GAVO's TAP service. Since Aladin 10, you can do that from within the program (although some versions will reject this query because they mistakenly believe the ADQL is bad; query through TOPCAT and send the result over to Aladin if that bites you). Incidentally, when there are s_region values in Obscore tables, it's a good idea to use them as I do here, as it's quite a bit more likely that such a query will use indices than some condition on s_ra and s_dec. But then, not all services fill s_region properly, so for all-VO queries you will probably want to make do with s_ra and s_dec.

    From that result I first made the inset bar graph in the article image to show the temporal distribution of the Patrol plates. And then I grabbed two (rather randomly selected) plates and had Aladin produce a red-blue composite of them. Whatever is really red or really blue in that image may correspond to a transient event. Or, as is certainly the case with that little hair (or whatever) that shines out in blue, it may not.
