Posts with the Tag ADQL:

  • LAMOST5 meets Datalink

    One of the busiest spectral survey instruments operated right now is the Large Sky Area Multi-Object Fiber Spectrograph Telescope (LAMOST). And its data is in the VO, more or less: DR2 and DR3 have been brought into the VO by our Czech colleagues, but since they currently lack the resources to update their services to the latest releases, they have kindly given me their DaCHS resource descriptor, and so I had a head start for publishing DR5 in Heidelberg.

    With some minor updates, here it is now: Over nine million medium-resolution spectra covering large parts of the northern sky – the spatial coverage is like this:

    Coverage Healpix map

    There's lots of fun to be had with this; of course, there's an SSA service, so when you point Aladin or Splat at some part of the covered sky and look for spectra, chances are you'll see LAMOST spectra, and when working on some of our tutorials (this one, for example), it happened that LAMOST actually had what I was looking for when writing them.

    But I'd like to use the opportunity to mention two other modes of accessing the data.

    Stacked spectra

    Tablesample and TOPCAT's Plot Table activation action

    Say you'd like to look at spectra of M stars and would like to have a sample from across the sky. Fire up TOPCAT, point its TAP client to the GAVO DC TAP service (http://dc.g-vo.org/tap) and run something like:

    select
      ssa_pubDID, accref, raj2000, dej2000, ssa_targsubclass
    from lamost5.data tablesample(1)
    where
      ssa_targsubclass like 'M%'
    

    This is using the TABLESAMPLE modifier in the from clause, which isn't standard ADQL yet. As mentioned in the DaCHS 1.4 announcement, DaCHS has a prototype implementation of what's been discussed on the IVOA's DAL mailing list: pick a part of a table rather than the full one. It takes a percentage as an argument, and tells the server to choose about this percentage of the table's records using a reasonable and fast heuristic. Note that this won't give you perfect statistical sampling, but if it's not “good enough” for some purpose, I'd like to learn about that purpose.

    Drawing a proper statistical sample, on the other hand, would take minutes on the GAVO database server – with tablesample, I had the roughly 6000 spectra the above query returns essentially instantaneously, and from eyeballing a sky plot of them, I'd say their distribution is close enough to that of the full DR5. So: tablesample is your friend.
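    For the curious, here is a rough model of why such sampling is fast but not statistically perfect. It assumes (an assumption, not something stated above) that DaCHS hands TABLESAMPLE to PostgreSQL's block-level SYSTEM method, which keeps or skips whole disk pages rather than individual rows. The Python below is a toy simulation with made-up table and page sizes, not DaCHS code; row-level Bernoulli sampling is shown for comparison.

```python
import random


def paginate(rows, page_size):
    """Split a list of rows into fixed-size 'disk pages' (toy model)."""
    return [rows[i:i + page_size] for i in range(0, len(rows), page_size)]


def system_sample(pages, percent, rng):
    """Block-level sampling in the spirit of PostgreSQL's TABLESAMPLE SYSTEM:
    each page is kept with probability percent/100, and then all of its rows
    come along, so unselected pages never need to be read."""
    p = percent / 100
    sample = []
    for page in pages:
        if rng.random() < p:
            sample.extend(page)
    return sample


def bernoulli_sample(rows, percent, rng):
    """Row-level sampling: every row gets its own coin flip, which means
    looking at every row of the table."""
    p = percent / 100
    return [row for row in rows if rng.random() < p]


if __name__ == "__main__":
    rows = list(range(100_000))   # hypothetical table of 100000 rows
    pages = paginate(rows, 100)   # hypothetical 100 rows per page
    rng = random.Random(42)
    fast = system_sample(pages, 1, rng)
    slow = bernoulli_sample(rows, 1, rng)
    print(len(fast), "rows from", len(fast) // 100, "pages")
    print(len(slow), "rows from individual coin flips")
```

    Because whole pages are taken, rows stored next to each other (say, spectra ingested in sky order) land in the sample together, which is one way a fast sample can drift from a proper statistical one.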

    For a quick look at the spectra themselves, in TOPCAT click Views/Activation Actions, check “Plot Table” and make sure TOPCAT proposes the accref column as “Table Location” (if you don't see these items, update your TOPCAT – it's worth it). Now click on a row or perhaps a dot on a plot and behold an M spectrum.

  • DaCHS 1.4 is out

    Dachs logo with "version 1.4" superposed

    Since the Groningen Interop is over, it's time for a DaCHS release, and so, roughly half a year after the release of DaCHS 1.3, today I've pushed DaCHS 1.4 into our Debian repository.

    As usual, you should upgrade as soon as you find time to do so, because upgrades become more difficult if they span large version gaps; and one of these days you will need some new feature or run into one of the odd bugs. Upgrading is a good opportunity to also get your DaCHS ready for buster by adding the repos mentioned there.

    The list of new features is rather short this time around. Here are some noteworthy ones:

    • There's now an XML grammar that can be used when you have to parse smallish snippets of XML as, for instance, in VOEvent.
    • You can now use TABLESAMPLE(1) after a table specification in DaCHS' ADQL to tell the database engine to just use 1% of a table for a query. While this isn't a precise way to sample tables, it's great when developing queries.
    • Also among the new features I'd like to see in ADQL, and have therefore put into DaCHS, is GENERATE_SERIES(a,b), which is what is known as a table-generating function in SQL. If you know SDSS CasJobs, you'll have seen lots of those already. GENERATE_SERIES, however, is really plain: it just spits out a table with a column of integers between a and b. For an example of why one might want to have that, check out the poster I'm linking to in my ADASS report.
    • If you have an updating data descriptor (usually, because you keep feeding data into a data collection), DaCHS will no longer automatically re-make its dependencies (like, say, views). That's because this is usually not necessary, and it's a pain if every update on an obscore-published table tears down and rebuilds the obscore view. For the rare cases when you do need to rebuild dependencies, there's now a remakeOnDataChange attribute on data.
    • At the interop, I've mentioned a few use cases for knowing which server software you're talking to, and I've said that people should set their server headers to informative values. DaCHS does that now.
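    One common reason to want a table-generating function like GENERATE_SERIES (not necessarily the one on the poster) is histogramming: joining data against a series of bin numbers makes empty bins show up with a count of zero instead of silently disappearing. The sketch below shows the idea with Python's built-in sqlite3 module. SQLite is only a stand-in here, and since its own generate_series is an optional extension, a recursive common table expression plays that role; the table and column names are made up.

```python
import sqlite3


def histogram_with_empty_bins(values, lo, hi):
    """Count values per integer bin lo..hi (inclusive), keeping empty bins.

    A recursive CTE plays the part of GENERATE_SERIES(lo, hi); the LEFT JOIN
    keeps every bin, and COUNT(obs.val) is 0 where no value matched.
    Binning truncates with CAST, so this assumes non-negative values.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE obs (val REAL)")
        conn.executemany("INSERT INTO obs VALUES (?)", [(v,) for v in values])
        return conn.execute(
            """
            WITH RECURSIVE series(bin) AS (
                SELECT ?
                UNION ALL
                SELECT bin + 1 FROM series WHERE bin < ?
            )
            SELECT series.bin, COUNT(obs.val)
            FROM series
            LEFT JOIN obs ON CAST(obs.val AS INTEGER) = series.bin
            GROUP BY series.bin
            ORDER BY series.bin
            """,
            (lo, hi),
        ).fetchall()
    finally:
        conn.close()
```

    For instance, histogram_with_empty_bins([0.5, 1.2, 1.7, 4.1], 0, 5) reports a count of zero for bins 2, 3 and 5 rather than omitting them.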

    To conclude on a low note: This is probably going to be the last release of DaCHS for python 2. Even though we will have to shed a dependency or two that simply will not be ported to python 3, and even though I'm rather unhappy with a few properties of the python 3 port of twisted, there's probably no way to escape this, given that Debian is purging out python 2 packages quickly already.

    So, when we meet again for the next release, you'll probably be looking at DaCHS 2.0, and where you have custom code in your RDs, it's rather likely that you'll see a minor amount of breakage. I promise I'll do everything I can to make the migration easy for deployers, but I can't do higher magic, so: If there's ever been a time to add regression tests to your RDs, it's now.

  • ADASS and Interop

    ADASS group photo

    ADASS XXIX is a big conference with lots of attendees. I've taken the liberty of scaling the photo so you really won't recognise me (though I am in the photo). Note that, regrettably, the interop will be a lot smaller.

    The people that create the Virtual Observatory standards, organised in the IVOA, meet twice a year: Once in spring for a five-day meeting (this year it happened in Paris), and once in autumn for a three-day meeting back-to-back with ADASS, the venerable (this year it's the 29th installment) meeting of people dealing with astronomy and computers.

    We're now on day three of ADASS, and for me, so far this has been more or less an endless hackathon, with discussing and hacking on things like mirrors for DFBS, ADQL 2.1, the evolution of IVOA vocabularies (more on this soon somewhere around here), a vocabulary of object types, getting LAMOST 5 published properly in the VO, the measurements data model, convincing more registries to push out space-time coverage for their resources (I'm showing a poster on that), and a lot more.

    So, getting to actually listen to talks during ADASS is almost something of a luxury, and a mind-widening one at that – I've just listened to a talk about effectively doubling the precision of VLBI geodesy (in this case, measuring the location of radio telescopes to a few millimeters) by a piece of clever software, and before that I could learn a bit about how complex it is to figure out how much interference something emitting radio waves will cause in some other place on earth (like, well, a radio telescope). In case you're curious: A bit more than a year from now, short papers on the topics will appear in the proceedings of ADASS XXIX, which in turn you'll find in the ADASS proceedings collections (or on arXiv before that).

    Given the experience of the last few days, I doubt I'll do anything like the live blog from Paris linked above. I still can't resist mentioning that at ADASS, I have a poster that's little more than an ad blitz for STC in the registry.

    Update (2019-10-13): Well, one week later I'm sitting in the closing session of the Interop, and I've even already given my summary of Semantics activities during the interop. Other topics I've talked about at this interop include interoperable authentication (I'm really interested in this because I'd like to enable persistent TAP uploads, where your uploaded tables are still there for you when you come back), a minor update to SimpleDALRegExt (which is overall rather technical and you probably don't want to look at), on the takeup of new Registry tech (which might come over as somewhat sad, but considering that you have to pull along many people to have changes in “the” Registry, it's not so bad at all), and on, as Mark Taylor called it, operational identification of server software (which I consider entertaining in its somewhat erratic narrative).

    And now, after 7 days of essentially nonstop discussion and brainstorming, I'm longing to slump into a chair on the train back to Heidelberg and just enjoy the landscape rolling by.

  • ADQL Traps #1: NULL

    NULL is a difficult concept. Not only in SQL

    I recently got embarrassed by ADQL NULLs, i.e., the magic value indicating that a value in a given column is missing. And since that's a common source of errors when writing ADQL queries, I'll take this as a cue for a blog post.

    The concrete background is fairly technical and registry-ish; suffice it to say that some data providers who implemented interfaces conforming to some standard didn't properly say so in their registry records. Back in RegTAP 1.0 (that's the standard that says how a client like TOPCAT talks to the VO Registry), I decided to work around that by fudging the pattern for how to discover those interfaces so they'd still be found.

    In RegTAP 1.1, which is now under review by the VO community, I wanted to do away with that workaround. But would that break anything? This question translates to “are there vs:ParamHTTP interfaces that don't have a role attribute of std”. Whatever “ParamHTTP” and “role attribute” actually mean, just appreciate that it looks like it might translate into SQL like:

    select * from rr.interface
    where
      intf_type='vs:paramhttp'
      and not intf_role='std'
    

    I ran that query, rejoiced because it didn't return anything, removed the workaround from the standard, and then was shot down when I read Mark's mail (politely) saying I was wrong and there are services still requiring the workaround. As usual: If a query returns what you expect, be doubly careful.

    What went wrong? Well, NULL semantics. You see, in SQL NULL is never equal to anything, not even itself (in that, it's like NaN in IEEE floats: try n = float('nan');print(n==n) in Python and see whether you're still cool about it). It's also not unequal. Don't take my word for it. Try:

    select * from tap_schema.schemas where NULL=NULL
    

    and:

    select * from tap_schema.schemas where NULL!=NULL
    

    – you'll get empty results in both cases.
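    If you'd rather poke at this locally, Python's sqlite3 module shows the same behaviour. SQLite stands in for the TAP server here, and the tiny table is a made-up substitute for tap_schema.schemas. The last two lines also show where the NaN analogy ends: NaN != NaN is true, whereas NULL != NULL is NULL, just like NULL = NULL.

```python
import sqlite3

# Toy stand-in for tap_schema.schemas with two rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schemas (schema_name TEXT)")
conn.executemany("INSERT INTO schemas VALUES (?)", [("ivoa",), ("tap_schema",)])


def count_where(condition):
    """Number of rows for which the WHERE condition is true (not false, not NULL)."""
    return conn.execute(f"SELECT COUNT(*) FROM schemas WHERE {condition}").fetchone()[0]


eq_rows = count_where("NULL = NULL")    # comparison yields NULL, so no row passes
ne_rows = count_where("NULL != NULL")   # also NULL, so again no row passes
is_rows = count_where("NULL IS NULL")   # IS NULL is a real test, and it is true
value_of_eq = conn.execute("SELECT NULL = NULL").fetchone()[0]  # None in Python

nan = float("nan")
nan_eq = nan == nan   # False, like NULL = NULL not being true
nan_ne = nan != nan   # True, which is where NaN and NULL part ways
```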

    What does that mean for science queries? Well, whenever there are NULLs in columns (and the only safe assumption for now is that they may hide in there; we should probably add non-null as a column property in the TAP schema and in VODataService some day), you need to be careful, in particular with inverted logic.

    Here's an example: Suppose you want to put NGC objects brighter than 10 mag in B in one bin and everything else in another. The brighter ones are simple:

    select count(*) from openngc.data where mag_b<10
    

    (try it on the TAP server at http://dc.g-vo.org/tap, it's 383 in the current release). It becomes difficult for “the rest”. If you write:

    select count(*) from openngc.data where not mag_b<10
    

    or, equivalently:

    select count(*) from openngc.data where mag_b>=10
    

    you'll get (for the current release) 10887. However, the whole catalogue has 13954 entries, so 13954-10887-383=2684 rows are missing. Your “rest” has missed everything for which mag_b isn't given. Sure enough:

    select count(*) from openngc.data where mag_b is null
    

    (and this is the only good way to compare against null) gives 2684.

    The right way to say “anything for which mag_b is not smaller than 10” thus is:

    select count(*) from openngc.data
    where
      not mag_b<10
      or mag_b is null
    

    Moral: Unless you're sure there are no missing values (i.e., NULLs) in a column you're looking at, think about what these mean for your research (or other) question: Should these rows just vanish? Then you usually don't need to do anything, and the SQL semantics magically do the right thing (which is why things are defined as they are). If, however, the corresponding rows would mean something for your question, you need to be explicit, and you must have some condition involving IS NULL or IS NOT NULL.
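    The same trap can be reproduced with a toy table in Python's sqlite3 module. The six rows below are invented stand-ins for openngc.data, two of them without a B magnitude; only the version with the explicit IS NULL gives a “rest” that, together with the bright objects, adds up to the whole table.

```python
import sqlite3

# Invented rows; None becomes NULL in the database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (name TEXT, mag_b REAL)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [("a", 8.5), ("b", 9.9), ("c", 10.0), ("d", 12.3), ("e", None), ("f", None)],
)


def count(where):
    """COUNT(*) of the rows for which the WHERE clause is true."""
    return conn.execute(f"SELECT COUNT(*) FROM data WHERE {where}").fetchone()[0]


total = conn.execute("SELECT COUNT(*) FROM data").fetchone()[0]
bright = count("mag_b < 10")
naive_rest = count("NOT mag_b < 10")             # silently drops the NULL rows
missing = count("mag_b IS NULL")
rest = count("NOT mag_b < 10 OR mag_b IS NULL")  # the actual complement of bright
```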

    The trouble, of course, is that just knowing this still isn't enough. You need to remember it at the right moment. Or you'll share my fate of suffering some public embarrassment.

  • APPLAUSE via Obscore

    A composite of two rather noisy photo plates

    Aladin showing some Bamberg Sky Patrol plates (see towards the end of the post for what this is and how I made it).

    At the Astroplate conference I blogged about recently, the people behind APPLAUSE gave a couple of talks about their Data Release 3. APPLAUSE is a fairly massive endeavour to make available data from some of the larger plate archives in Germany, and its DR3 even hit the non-Astronomy press last February.

    Already for previous APPLAUSE releases, I've wanted to bring this data (or rather, its metadata) to the VO, but it never quite happened, basically because there was always another little thing that turned out to be too tedious to work out via mail. However, working out things interactively is exactly what conferences are great for. So, the kind APPLAUSE folks (thanks, Taavi and Harry) and I used the Astroplate conference to map their database schema (“schema” is jargon for what boils down to the set of tables and columns with which they describe their data) to the much simpler (and, admittedly, less powerful) IVOA Obscore one.

    Sure, Obscore doesn't deal with multiple exposures (like when the target field and the north pole were exposed on one plate to help precision photometry), object-guided images, and all the other interesting techniques that astronomers applied in the pre-digital age; it also doesn't usefully cope with multiple scans of the same plate (for instance, to correct for imprecisions in the mechanics of flatbed scanners). APPLAUSE, of course, has to cope with them, since there are many reasons to preserve data of this kind.

    Obscore, on the other hand, is geared towards uniform discovery, where too funky datasets in all likelihood cause more harm than good. So, when we mapped APPLAUSE to Obscore, of the 101138 scans of 70276 plates that the full APPLAUSE holds in DR3, only 44000 plate scans made it into the Obscore table. The advantage: whatever can be sensibly mapped to Obscore can now be queried together with all the other data in the world that others have published through Obscore.

    You can immediately see the effect when you run the little python program doing the global discovery we gave in our plates tutorial. Here's what it prints now (values from pre-APPLAUSE-in-Obscore are in square brackets):

    Column t_exptime: 3460 values
      Min   12, Max 15300, Mean 890.24  [previous mean: 370.722]
    ---
    Column em_mean: 3801 values
      Min 1.8081e-09, Max 9.3e-07, Mean 6.40804e-07 [No change: Sigh!]
    ---
    Column t_mean: 4731 values
      Min 12564.5, Max 58126.3, Mean 49897.9 [previous mean: 51909.1]
    ---
    Column instrument_name: 4747 values
      Matches from , Petzval, [Max Wolf's residence in
      Heidelberg, Maerzgasse, Wolf's Doppelastrograph,
      Heidelberg Koenigstuhl (24), Wolf's
      Doppelastrograph,] AG-Astrograph, [Zeiss Triplet
      15 cm Potsdam-Telegrafenberg], Zeiss Triplet,
      Astrograph (four 10-cm Tessar f/6 cameras),
      [3.5m APO, ROSAT PSPCC, Heidelberg Koenigstuhl
      (24), Bruce Astrograph, Calar Alto (493),
      Schmidt], Grosser Refraktor, [ROSAT HRI,
      DK-1.54], Hamburger Schmidt-Spiegel,
      [DFOSC_FASU], ESO 1-metre Schmidt telescope,
      Great Schmidt Camera, Lippert-Astrograph, Ross-B
      3", [AZT 22], Astrograph (six 10-cm Tessar f/6
      cameras), 1m-Spiegelteleskop, [ROSAT PSPCB],
      Astrograph (ten 10-cm Tessar f/6 cameras), Zeiss
      Objective
    ---
    Column access_url: 4747 values [4067]
    

    So – for the fields selected in the tutorial, there are roughly 17% more images in the global Obscore image pool now than there were before APPLAUSE (4747 access URLs rather than 4067), and their mean observation date went a bit farther into the past. I haven't done any statistics, but I suspect that for many other fields the gain is going to be much higher. For a strong effect, try some random region covered by the Bamberg Sky Patrol on the southern sky.

    But you have probably noticed the deep sigh in the annotations to the statistics above: Yes, we don't have the spectral band for the APPLAUSE data, which is why the stats on em_mean don't change. As a matter of fact, from the Obscore data you can't even guess whether a plate is “more red” or “rather blue”, as Obscore doesn't have an (agreed-upon) field for a “qualitative bandpass indicator”.

    For some other data collections, we did map known emulsion/filter combinations to rough bandpasses (e.g., the Palomar-Leiden Trojan Survey, which only had a few of them). For APPLAUSE, there are 435 combinations of filter and emulsion (that's a VOTable link that you can paste into TOPCAT's load button in order to have a look at the table). Granted, quite a few of these pairs are (more or less) spurious because of inconsistent spelling. But we still gave up on researching the bandpasses even before we started.
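    To get a feel for how many of those pairs are just spelling variants, a crude normalisation key (lowercase, drop everything but letters and digits) goes a long way. This is a hypothetical sketch, not what we ran on APPLAUSE, and the example labels are merely illustrative; such a key will not catch genuine synonyms and might, rarely, merge labels that really differ.

```python
import re
from collections import defaultdict


def normalise(label):
    """Crude spelling key: lowercase and drop everything but letters and digits."""
    return re.sub(r"[^0-9a-z]", "", label.lower())


def group_combinations(pairs):
    """Group (emulsion, filter) pairs whose spellings differ only cosmetically.

    Returns a dict mapping the normalised key to the set of original spellings.
    """
    groups = defaultdict(set)
    for emulsion, filt in pairs:
        groups[(normalise(emulsion), normalise(filt))].add((emulsion, filt))
    return dict(groups)
```

    With illustrative input like [("103a-O", "GG 13"), ("103a-o", "GG13"), ("103aO", "GG-13"), ("IIa-O", "none")], the first three pairs collapse into one group, leaving two combinations to research.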

    If you're a photographic plate buff: You could help us and posterity a lot if you could go through this list and, at least for some combinations, tell us what, roughly, the lower and upper limits of the corresponding bandpasses might have been (to see what DaCHS already knows, look at the plate-relevant data near the bottom of the file). As usual, send mail to gavo@ari.uni-heidelberg.de if you have anything to contribute.

    Finally, here's the brief explanation of the image for this article: Well, I wanted to find some Bamberg Sky Patrol images for a single field to play with. I knew they were primarily located in the South, and were made using Tessar cameras. So, I ran:

    SELECT t_min, access_url, s_region
    FROM ivoa.obscore
    WHERE instrument_name like '%Tessar%'
    AND 1=CONTAINS(POINT(345, -38), s_region)
    

    on GAVO's TAP service. Since Aladin 10, you can do that from within the program (although some versions will reject this query because they mistakenly believe the ADQL is bad. Query through TOPCAT and send the result over to Aladin if that bites you). Incidentally, when there are s_region values in Obscore tables, it's a good idea to use them as I do here, as it's quite a bit more likely that this query will use indices than some condition on s_ra and s_dec. But then not all services fill s_region properly, so for all-VO queries you will probably want to make do with s_ra and s_dec.
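    What CONTAINS(POINT(345, -38), s_region) asks is whether a plate's footprint covers the given position, which is not the same as its centre (s_ra, s_dec) being close by. As a rough illustration only, here is the classic even-odd ray-casting test in Python. It treats RA and Dec as flat coordinates, which is tolerable for small fields away from the poles and the RA=0/360 seam, whereas ADQL geometry works on the sphere; the plate outline is hypothetical.

```python
def contains(polygon, ra, dec):
    """Even-odd ray casting in the (ra, dec) plane (flat-sky approximation).

    polygon is a list of (ra, dec) vertices in degrees; the result says
    whether the point (ra, dec) lies inside it.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > dec) != (y2 > dec):
            # RA at which this edge crosses the line of constant dec;
            # y1 != y2 is guaranteed here, so the division is safe.
            x_cross = x1 + (dec - y1) * (x2 - x1) / (y2 - y1)
            if ra < x_cross:
                inside = not inside
    return inside


# Hypothetical 10x10 degree plate footprint centred on (345, -38).
plate = [(340, -43), (350, -43), (350, -33), (340, -33)]
```

    Here contains(plate, 345, -38) is true, while a point such as (355, -38) just outside the footprint is not.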

    From that result I first made the inset bar graph in the article image to show the temporal distribution of the Patrol plates. And then I grabbed two (rather randomly selected) plates and had Aladin produce a red-blue composite of them. Whatever is really red or really blue in that image may correspond to a transient event. Or, as is certainly the case with that little hair (or whatever) that shines out in blue, it may not.
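    A bar graph like the inset needs little more than converting the t_min values (MJD in Obscore) to calendar years and counting per decade. The Python below is a sketch of that step, assuming you already have the t_min column as plain numbers (e.g., exported from TOPCAT); it is not necessarily how the inset was made.

```python
import math
from collections import Counter
from datetime import date, timedelta

# MJD 0 corresponds to 1858-11-17, 0h UT.
MJD_EPOCH = date(1858, 11, 17)


def mjd_to_date(mjd):
    """Calendar date (UT) for a modified Julian date; fractional days are dropped."""
    return MJD_EPOCH + timedelta(days=math.floor(mjd))


def plates_per_decade(t_min_values):
    """Map the first year of each decade (e.g. 1930) to the number of observations in it."""
    decades = Counter((mjd_to_date(mjd).year // 10) * 10 for mjd in t_min_values)
    return dict(sorted(decades.items()))
```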
