Posts with the Tag ADQL:

  • Space and Time not lost on the Registry

    Histogram: observation dates of an image service

    A histogram of times for which the Palomar-Leiden service has images: That's temporal service coverage right there.

    If you are an astronomer and you've ever tried looking for data in the Virtual Observatory Registry, chances are you have wondered “Why can't I enter my position here?” Or perhaps “So, I'm looking for images in [NIII] – where would I go?”

    Both of these are examples of the use of Space-Time Coordinates (STC) in data discovery – yes, spectral coordinates count as STC, too, and I could make an argument for it. But this post is about something else: None of this has worked in the Registry up to now.

    It's time to mend this blatant omission. To take the next steps, after a bit of discussion on some of the IVOA's mailing lists, I posted an IVOA note proposing exactly those steps last Thursday. It is, perhaps with a bit of over-confidence, called A Roadmap for Space-Time Discovery in the VO Registry. And I'd much appreciate feedback, in particular if you are a VO user and have ideas on what you'd like to do with such a facility.

    In this post, I'd like to give a very quick run-down on what is in it for (1) VO users, (2) service operators in general, and (3) service operators who happen to run DaCHS.

    First, users. We already are pretty good on spatial coverage (for about 13000 of almost 20000 resources), so it might be worth experimenting with that. For now, the corresponding table is only available on the RegTAP mirror at http://dc.g-vo.org/tap. There, you can try queries like:

    select ivoid from
    rr.table_column
    natural join rr.stc_spatial
    where
      1=contains(gavo_simbadpoint('HDF'), coverage)
      and ucd like 'phot.flux;em.radio%'
    

    to find – in this case – services that have radio fluxes in the area of the Hubble Deep Field. If these lines scare you or you don't know what to do with the stupid ivoids, check the previous post on this blog – it explains a bit more about RegTAP and why you might care.

    Similarly cool things will, hopefully, some day be possible in spectrum and time. For instance, if you were interested in SII fluxes in the Crab Nebula in the early sixties, you could, some day, write:

    SELECT ivoid FROM
    rr.stc_temporal
    NATURAL JOIN rr.stc_spectral
    NATURAL JOIN rr.stc_spatial
    WHERE
      1=CONTAINS(gavo_simbadpoint('M1'), coverage)
      AND 1=ivo_interval_overlaps(
        6.69e-7, 6.75e-7,
        wavelength_start, wavelength_end)
      AND 1=ivo_interval_overlaps(
        36900, 38800,
        time_start, time_end)
    

    As you can see, the spectral coordinate will, following (admittedly broken) VO convention, be given in meters of vacuum wavelength, and time in MJD. In particular the thing with the wavelength isn't quite settled yet – personally, I'd much rather have energy there. For one, it's independent of the embedding medium, but much more excitingly, it even remains somewhat sensible when you go to non-electromagnetic messengers.

    A pattern I'm trying to establish is the use of the user-defined function ivo_interval_overlaps, also defined in the Note. This is intended to allow robust query patterns in the presence of two intrinsically interval-valued things: The service's coverage and the part of the spectrum you're interested in, say. With the proposed pattern, either of these can degenerate to a single point and things still work. Things only break when both the service and you figure that “Aw, Hα is just 656.3 nm” and one of you omits a digit or adds one.
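
    In case you want a feel for what the function does: two (closed) intervals overlap exactly when each starts no later than the other one ends. So – very roughly, the Note has the authoritative definition including how the edges are treated – the temporal constraint from the query above could also be spelled out with plain comparisons:

    -- rough equivalent of 1=ivo_interval_overlaps(36900, 38800, time_start, time_end)
    SELECT ivoid, time_start, time_end
    FROM rr.stc_temporal
    WHERE time_start <= 38800
      AND 36900 <= time_end
    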

    But that's academic at this point, because very few resources define their coverage in time and spectrum. Try it yourself:

    SELECT COUNT(*) FROM (
      SELECT DISTINCT ivoid FROM rr.stc_temporal) AS q
    

    (the subquery with the DISTINCT is necessary because a single resource can have multiple rows for time and spectrum when there are multiple distinct intervals – think observation campaigns). If the count this returns is more than a few dozen when you read this, I strongly suspect it's no longer 2018.

    To improve this situation, the service operators need to provide the information on the coverage in their resource records. Indeed, the registry schemas already have the notion of a coverage, and the Note, in its core, simply proposes to add three elements to the coverage element of VODataService 1.1. Two of these new elements – the coverage in time and in the spectrum – are simple floating-point intervals and can be repeated in order to allow non-contiguous coverage. The third element, the spatial coverage, uses a nifty data structure called a MOC, which expands to “HEALPix Multi-Order Coverage map” and is the main reason why I claim we can now pull off STC in the Registry: MOCs let databases and other programs easily and quickly manipulate areas on the sphere. Without MOCs, that's a pain.

    So, if you have registry records somewhere, please add the elements as soon as you can – if you don't know how to make a MOC: CDS' Aladin is there to help. In the end, your coverage elements should look somewhat like this:

    <coverage>
      <spatial>3/336,338,450-451,651-652,659,662-663
        4/1816,1818-1819,1822-1823,1829,1840-1841</spatial>
      <temporal>37190 37250</temporal>
      <temporal>54776 54802</temporal>
      <spectral>3.3e-07 6.6e-07</spectral>
      <spectral>3.5e-06 2.0e-05</spectral>
      <waveband>Optical</waveband>
      <waveband>Infrared</waveband>
    </coverage>
    

    The waveband elements are leftovers from VODataService 1.1. They are still in use (prominently, for one, in SPLAT), and it's certainly still a good idea to keep giving them for the foreseeable future. You can also see how you would represent multiple observing campaigns and different spectral ranges.

    Finally, if you're running DaCHS and you're using it to generate registry records (and there's almost no excuse for not doing so), you can simply write a coverage element into your RD starting with DaCHS 1.2 (or, if you run betas, 1.1.1, which is already available). You'll find lots of examples at the usual place. As a relatively interesting example, consider the resource descriptor of plts. It has this:

    <coverage>
      <updater spaceTable="data" spectralTable="data" mocOrder="4"/>
      <spectral>3.3e-07 6.6e-07</spectral>
      <temporal>37190 37250</temporal>
      <temporal>38776 38802</temporal>
      <temporal>41022 41107</temporal>
      <temporal>41387 41409</temporal>
      <temporal>41936 41979</temporal>
      <temporal>43416 43454</temporal>
      <spatial>3/282,410 4/40,323,326,329,332,387,390,396,648-650,1083,1085,1087,1101-1103,1123,1125,1132-1134,1136,1138-1139,1144,1146-1147,1173-1175,1216-1217,1220,1223,1229,1231,1235-1236,1238,1240,1597,1599,1614,1634,1636,1728,1730,1737,1739-1740,1765-1766,1784,1786,2803,2807,2809,2812</spatial>
    </coverage>
    

    This particular service archives plate scans from the Palomar-Leiden Trojan surveys; these were looking for Trojan asteroids (of Jupiter) using the Palomar 122 cm Schmidt and were conducted in several shortish campaigns between 1960 and 1977 (incidentally, if you're looking for things near the Ecliptic, this stuff might still hold valuable insights for you). Because the fill factor for the whole time period is rather small, I manually extracted the time coverage; for that, I ran select dateobs from plts.data via TAP and made the histogram plot above. Zooming in a bit, I read off the limits in TOPCAT's coordinate display.

    The other coverages, however, were put in automatically by DaCHS. That's what the updater element does: for each axis, you can say where DaCHS should look, and it will then fill in the appropriate data from what it guesses gives the relevant coordinates – that's straightforward for standard tables like the ones behind SSAP and SIAP services (or obscore tables, for that matter), perhaps a bit more involved otherwise. To say “just do it for all axes”, give the updater a single sourceTable attribute.

    Finally, in this case I'm overriding mocOrder, the order down to which DaCHS tries to resolve spatial features. I'm doing this here because in determining the coverage of image services DaCHS right now only considers the centers of the images, and that's severely underestimating the coverage here, where the data products are the beautiful large Schmidt plates. Hence, I'm lowering the resolution from the default 6 (about one degree linearly) to still give some approximation to the actual data coverage. We'll fix the underlying deficit as soon as pgsphere, the postgres extension which is actually dealing with all the MOCs, has support for turning circles and polygons into MOCs.

    When you have defined an updater, just run dachs limits q.rd, and DaCHS will carefully (preserving your indentation) re-write the RD to contain what DaCHS has worked out from your table (but careful: it will overwrite what was previously there; so, make sure you only ask DaCHS to deal with axes you're not handling manually).

    If you feel like writing code discovering holes in the intervals, ideally already in the database: that would be great, because the tighter the intervals defined, the fewer false positives people will have in data discovery.

    The take-away for DaCHS operators is:

    1. Add STC coverage to your resources as soon as you've updated to DaCHS 1.2

    2. If you don't need the tightest coverage declaration conceivable, all you have to do is add:

      <coverage>
        <updater sourceTable="my_table"/>
      </coverage>
      

      to your RD (where my_table is the id of your service's “main” table) and then run dachs limits q.rd

    3. For special effects and further information, see Coverage Metadata in the DaCHS reference documentation

    4. If you have a nice postgres function that splits a simple coverage interval up so the filling factor of a set of new intervals increases (or know a nice, database-compatible algorithm to do so) – please let me know.

  • Say hello to RegTAP

    image: WIRR in the browser

    GAVO's WIRR registry interface in action to find resources with radio parallaxes.

    RegTAP is one of those standards that a scientist will normally not see – it works in the background and makes, for instance, TOPCAT display the Cone Search services matching some keywords. And it's behind services like WIRR, our Web Interface to the Relational Registry (“Relational Registry” being the official name for RegTAP) that lets you do some interesting data discovery beyond what current clients support. In the screenshot above, for instance (try it yourself), I'm looking for cone search services having parallaxes presumably from radio observations. You could now transmit the services you've found to, say, TOPCAT or your own pyvo-based program to start querying them.

    The key point of this query is the use of UCDs – these let services declare fairly unambiguously what kind of physics (if you take that word with a grain of salt) they are talking about. In the example, pos.parallax means, well, a parallax, and the percent character is a wildcard (coming not from UCDs, but from ADQL). That wildcard is a good idea here because without it we might miss things like pos.parallax;obs and pos.parallax;stat.fit that people might have used to distinguish “raw” and “processed” estimates.
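
    If you'd rather not click through a UI, the same sort of constraint can be written directly in RegTAP. Here's a rough sketch of what such a query could look like – the waveband constraint is just my stand-in for the “presumably radio” part, and WIRR's actual query will differ in detail:

    SELECT DISTINCT ivoid, res_title
    FROM rr.capability
      NATURAL JOIN rr.resource
      NATURAL JOIN rr.table_column
    WHERE standard_id LIKE 'ivo://ivoa.net/std/conesearch%'
      AND ucd LIKE 'pos.parallax%'
      AND waveband LIKE '%radio%'
    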

    UCDs are great for data discovery. Really.

    Sometimes, however, clicking around in menus just isn't good enough. That's when you want the full power of RegTAP and write your very own queries. The good news: If you know ADQL (and you should!), you're halfway there already.

    Here's one example of direct RegTAP use I came up with the other day. The use case was discovering data collections that give the effective temperatures of components of binary star systems.

    If you check the UCD list, that “physics” translates into data that has columns with UCDs of phys.temperature and meta.code.multip at the same time. To translate that into a RegTAP query, have a look at the tables that make up a RegTAP service: its “schema”. Section 8 of the standard lists all the tables there are, and there's an ADASS poster that has an image of the schema with the more common columns illustrated. Oh, and if you're new to RegTAP, you're probably better off briefly studying the examples first to get a feeling for how RegTAP is supposed to work.

    You will find that a pair of ivoid – the VO's global resource identifier – and a per-resource table index uniquely identifies a table within the entire registry. So, an ADQL query to pick out all tables containing temperatures and component identifiers would look like this:

    SELECT DISTINCT ivoid, table_index
    FROM
    rr.table_column AS t1
    JOIN rr.table_column AS t2
    USING (ivoid, table_index)
    WHERE t1.ucd='phys.temperature'
    AND t2.ucd='meta.code.multip'
    

    – the DISTINCT makes it so even tables that have lots of temperatures or codes only turn up once in our result set, and the somewhat odd self-join of the rr.table_column table with itself lets us say “make sure the two columns are actually in the same table”. Note that you could catch multi-table resources that define the components in one table and the temperatures in another by just joining on ivoid rather than ivoid and table_index.

    You can run this query on any RegTAP endpoint: GAVO operates a small network of mirrors behind http://reg.g-vo.org/tap, there's the ESAC one at http://registry.euro-vo.org/regtap/tap, and STScI runs one at http://vao.stsci.edu/RegTAP/TapService.aspx. Just use your usual TAP client.
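
    Spelled out, by the way, the looser variant mentioned above – joining on ivoid only, so the component codes and the temperatures may sit in different tables of the same resource – would just drop table_index from the join:

    SELECT DISTINCT ivoid
    FROM
    rr.table_column AS t1
    JOIN rr.table_column AS t2
    USING (ivoid)
    WHERE t1.ucd='phys.temperature'
    AND t2.ucd='meta.code.multip'
    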

    But granted, the result isn't terribly user-friendly: just identifiers and numbers. We'd at least like to see the names and descriptions of the tables so we know if the data is somehow relevant.

    RegTAP is designed so you can locate the columns you would like to retrieve or constrain and then just NATURAL JOIN everything together. The table_description and table_name columns are in rr.res_table, so all it takes to see them is to take the query above and join its result like this:

    SELECT table_name, table_description
    FROM rr.res_table
    NATURAL JOIN (
      SELECT DISTINCT ivoid, table_index
      FROM
      rr.table_column AS t1
      JOIN rr.table_column AS t2
      USING (ivoid, table_index)
      WHERE t1.ucd='phys.temperature'
      AND t2.ucd='meta.code.multip') as q
    

    If you try this, you'll see that we'd like to get the descriptions of the resources embedding the tables, too, in order to get an idea of what we can expect from a given data collection. And if we later want to find services exposing the tables (WIRR is nice for that – try the ivoid constraint – but for this example all resources currently come from VizieR, so you can directly use VizieR's TAP service to interact with the tables), you want the ivoids. Easy: just join rr.resource and pick columns from there:

    SELECT table_name, table_description, res_description, ivoid
    FROM rr.res_table
    NATURAL JOIN rr.resource
    NATURAL JOIN (
      SELECT DISTINCT ivoid, table_index
      FROM
      rr.table_column AS t1
      JOIN rr.table_column AS t2
      USING (ivoid, table_index)
      WHERE t1.ucd='phys.temperature'
      AND t2.ucd='meta.code.multip') as q
    

    If you've made it this far and know a bit of ADQL, you probably have all it really takes to solve really challenging data discovery problems – as far as Registry metadata reaches, that is, which currently does not include space-time coverage. But stay tuned, more on this soon.

    In case you're looking for a more systematic introduction into the world of the Registry and RegTAP, there are two papers... ouch. Can I really link to Elsevier papers? Well, here goes: 2014A&C.....7..101D (a.k.a. arXiv:1502.01186) on the Registry as such, and 2015A&C....11...91D (a.k.a. arXiv:1407.3083) mainly on RegTAP.

  • The Earth is Our Telescope

    Antares 2007-2012 neutrino coverage

    The coverage of the 2007-2012 Antares neutrinos, with positional uncertainties scaled by three.

    At our Heidelberg data center, we have already published some neutrino data, for instance the Amanda-II neutrino candidates, the IceCube-40 neutrino candidates, and the 2007-2010 Antares results.

    That latter project has now given us updated data, for the first time including timestamps, available as the Antares service.

    Now, if you look at the coverage (above), you'll notice at least two things: For one, there's no data around the north pole. That's because the instrument sits beneath the waters of the Mediterranean Sea, not far from where some of you may now enjoy your vacation. And it is using the Earth as its filter – it's measuring particles as they come “up” and discards anything that goes “down”. Yes, neutrinos are strange beasts.

    The second somewhat unusual thing is that the positional uncertainties are huge compared to what we're used to from optical catalogs: a degree is not uncommon (we've scaled the error circles by a factor of 3 in the image above, though). And that requires some extra care when working with the data.

    In our table, we have a column origin_est that actually contains circles. Hence, to find images of the “strongest” neutrinos in our obscore table, you could write:

    SELECT * FROM
    ivoa.obscore AS o
    JOIN (
      SELECT top 10 * FROM antares.data
      ORDER BY n_hits desc
    ) AS n
    ON 1=INTERSECTS(
      s_region,
      origin_est)
    

    in a query to our TAP service.

    But of course, this only gets really exciting when you can hope that perhaps that neutrino was emitted by some violent event that may have been observed serendipitously by someone else. That query then is (and we're using all the neutrinos now):

    SELECT * FROM
    ivoa.obscore AS o
    JOIN antares.data as n
    ON
       epoch_mjd between t_min-0.01 and t_max+0.01
      AND
        dataproduct_type='image'
      AND
        1=INTERSECTS(origin_est, s_region)
    

    On our data center, this doesn't yield anything at the moment (it does, though, if you do away with the spatial constraint, which frankly surprised me a bit). But what if you went and ran this query against the obscore services of active observatories? And perhaps had your computer try and figure out whether anything unusual is seen on whatever you find?

    We think that would be really nifty, and right after we've published a first version of our little pyVO course (which is a bit on the back burner, but watch this space), we'll probably work that out as a proper pyVO use case.

    And meanwhile: In case you'll be standing on the shores of the Mediterranean this summer, enjoy the view and think of the monster deep down in there waiting for neutrinos to detect – and eventually drop into our data center.

  • ADQL tricks at MPIA

    Aerial image of Heidelberg and Königstuhl

    The 2017-06-29 ADQL talk (red circle) from 30000 ft

    Today I was up on Heidelberg's signature mountain, Königstuhl, at the Max-Planck-Institute for Astronomy for a little talk on what I'd provisionally call “intermediate ADQL” – discussing some aspects of ADQL and some TAP techniques that may not be immediately obvious but still generally and straightforwardly applicable to everyday problems. Since I suspect the lecture notes for that talk may be of interest to some readers of this blog, I thought I should share them here.

    What this also contains is a very quick piece of pyVO-based python (which needs both this helper and a recent pyVO) for a use case that comes up fairly often: “Give me all proper motions (radio fluxes, distances, radial velocities, whatever) for objects in this region.”

    This uses a discovery case I've been after for quite a while now: Find services by the UCDs of tables within them. And while that's been possible for quite a while on GAVO's Registry UI WIRR, there are still too many services that don't declare their tables to the Registry, and when talking about TAP, the situation is still a bit worse (as has been mentioned in my account of the last interop). So – enjoy the code, but very frankly, you'll still see wires sticking out for several months yet.
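
    If you'd like to play with that discovery pattern in raw RegTAP rather than through pyVO, a query in this spirit – a hedged sketch; the standard_id and intf_role constraints are the usual RegTAP idiom for picking out standard TAP endpoints – might look like this for tables carrying proper motions:

    SELECT DISTINCT ivoid, access_url
    FROM rr.capability
      NATURAL JOIN rr.interface
      NATURAL JOIN rr.table_column
    WHERE standard_id LIKE 'ivo://ivoa.net/std/tap%'
      AND intf_role='std'
      AND ucd LIKE 'pos.pm%'
    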

    And if you run a TAP service yourself, please have a look at how to enable table discovery over on the IVOA wiki so we can finally get those pesky wires out of our users' eyes.

  • See Who's Kinking the Sky

    A new arrival in the GAVO Data Center is UCAC5, another of a slew of new catalogs combining pre-existing astrometry with Gaia DR1, just like the HSOY catalog we featured here a couple of weeks back.

    That's a nice opportunity to show how to use ADQL's JOIN operator for something else than the well-known CONTAINS-type crossmatch. Since both UCAC5 and HSOY reference Gaia DR1, both have, for each object, a notion of which element of the Gaia source catalog they correspond to. For HSOY, that's the gaia_id column, in UCAC5, it's just source_id. Hence, to compare results from both efforts, all you have to do is to join on source_id=gaia_id (you can save yourself the explicit table references here because the column names are unique to each table).

    So, if you want to compare proper motions, all you need to do is to point your favourite TAP client's interface to http://dc.g-vo.org/tap and run:

    SELECT
        in_unit(avg(uc.pmra-hsoy.pmra), 'mas/yr') AS pmradiff,
        in_unit(avg(uc.pmde-hsoy.pmde), 'mas/yr') AS pmdediff,
        count(*) as n,
        ivo_healpix_index (6, raj2000, dej2000) AS hpx
        FROM hsoy.main AS hsoy
        JOIN ucac5.main as uc
        ON (uc.source_id=hsoy.gaia_id)
        WHERE comp IS NULL    -- hsoy junk filter
        AND clone IS NULL     -- again, hsoy junk filter
        GROUP BY hpx
    

    (see Taylor et al's All of the Sky if you're unsure what to make of the healpix/GROUP BY magic).

    Of course, the fact that both tables are in the same service helps, but with a bit of upload magic you could do about the same analysis across TAP services.
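
    To sketch what that upload magic might look like: say hsoy.main lived on some other service. You would then pull gaia_id, pmra, pmde, raj2000 and dej2000 (with the junk filters applied) from there into a VOTable, upload that to the service holding ucac5.main – most clients expose such uploads under a name like TAP_UPLOAD.t1; the exact name depends on your client – and run roughly:

    SELECT
        in_unit(avg(uc.pmra-hsoy.pmra), 'mas/yr') AS pmradiff,
        in_unit(avg(uc.pmde-hsoy.pmde), 'mas/yr') AS pmdediff,
        count(*) AS n,
        ivo_healpix_index(6, hsoy.raj2000, hsoy.dej2000) AS hpx
        FROM ucac5.main AS uc
        JOIN TAP_UPLOAD.t1 AS hsoy
        ON (uc.source_id=hsoy.gaia_id)
        GROUP BY hpx
    

    The in_unit and ivo_healpix_index functions still come from the service you run this on, so this particular sketch stays on our TAP service; the point is just that the second table no longer has to live there.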

    Just so there's a colourful image in this post, too, here's what the original query shows for the differences in proper motion in RA:

    (equatorial coordinates, and the aux axis is a bit cropped here; try for yourself to see how things look for PM in declination or when plotted in galactic coordinates).

    What does this image mean? Well, it means that probably both UCAC5 and HSOY would still put kinks into the sky if you waited long enough.

    In the brightest and darkest points (which correspond to a proper motion difference of about 4 mas/yr), if you waited 250 years, the coordinate system induced by each catalog on the sky would be off by 1 arcsec with respect to the other (on a sphere, that means there are kinks somewhere). It may seem amazing that there's agreement to at least this level between the two catalogs – mind you, 1 arcsec is still more than 100 times smaller than you could see by eye; you'd have to go back to the Mesolithic age to have the slightest chance of spotting the disagreement without serious optical aids. But when Gaia DR2 comes around (hopefully in April 2018), our sky will be more stable even than that.

    Of course, both UCAC5 and HSOY are, indirectly, standing on the shoulders of the same giant, namely Hipparcos and Tycho, so the agreement may be less surprising, and we strongly suspect that a similar image will look a whole lot less pleasant when Gaia has straightened out the sky, in particular towards fainter stars.

    But still: do you want to bet whether UCAC5 or HSOY will turn out to be closer to a non-kinking sky? Let us know. Qualifications (“For bright stars...”) are allowed.
