LAMOST5 meets Datalink

One of the busiest spectral survey instruments operated right now is the Large Sky Area Multi-Object Fiber Spectrograph Telescope (LAMOST). And its data in the VO, more or less: DR2 and DR3 have been brought into the VO by our Czech colleagues, but since they currently lack resources to update their services to the latest releases, they have kindly given me their DaCHS resource descriptor, and so I had a head start for publishing DR5 in Heidelberg.

With some minor updates, here it is now: Over nine million medium-resolution spectra covering large parts of the northen sky – the spatial coverage is like this:

[Coverage Healpix map]

There’s lots of fun to be had with this; of course, there’s an SSA service, so when you point Aladin or Splat at some part of the covered sky and look for spectra, chances are you’ll see LAMOST spectra, and when working on some of our tutorials (this one, for example), it happened that LAMOST actually had what I was looking for when writing them.

But I’d like to use the opportunity to mention two other modes of accessing the data.

Tablesample and TOPCAT’s Plot Table activation action

Say you’d like to look at spectra of M stars and would like to have some sample from across the sky, fire up TOPCAT, point its TAP client the GAVO DC TAP service (http://dc.g-vo.org/tap) and run something like

select 
  ssa_pubDID, accref, raj2000, dej2000, ssa_targsubclass
from lamost5.data tablesample(1)
where 
  ssa_targsubclass like 'M%' 

Image: stacked spectra

This is using the TABLESAMPLE modifier in the from clause, which isn’t standard ADQL yet. As mentioned in the DaCHS 1.4 announcement, DaCHS has a prototype implementation of what’s been discussed on the IVOA’s DAL mailing list: pick a part of a table rather than the full one. It takes a percentage as an argument, and tells the server to choose about this percentage of the table’s records using a reasonable and fast heuristic. Note that this won’t give you perfect statistical sampling, but if it’s not “good enough” for some purpose, I’d like to learn about that purpose.

Drawing a proper statistical sample, on the other hand, would take minutes on the GAVO database server – with tablesample, I had the roughly 6000 spectra the above query returns essentially instantaneously, and from eyeballing a sky plot of them, I’d say their distribution is close enough to that of the full DR5. So: tablesample is your friend.

For a quick look at the spectra themselves, in TOPCAT click Views/Activation Actions, check “Plot Table” and make sure TOPCAT proposes the accref column as “Table Location” (if you don’t see these items, update your TOPCAT – it’s worth it). Now click on a row or perhaps a dot on a plot and behold an M spectrum.

Cutouts via Datalink

LAMOST releases spectra in FITS format pretty much like the ones you may know from SDSS. The trick above works because we instead hand out proper, IVOA Spectral Data Model-compliant spectra through SSA and TAP. However, if you need to go back to the original files, you can, using Datalink. If you’re unsure what this Datalink thing is: call me vain, but I still like my 2015 ADASS poster explaining that. In TOPCAT, you’d be using the “Invoke Service” activation action to get to the datalinks.

If you have actual work to do, offloading repetetive work to the computer is what you want, and fortunately, pyVO knows about datalink, too. I give you this is hard to discover so far, and the interface is… a tiny bit clunky. Until some kind soul cleans up the pyVO datalink act, a poster Stefan and I showed at the 2017 ADASS might give you an idea which buttons to press. Or read on and see how things work for LAMOST5.

The shortest way to datalinks is a TAP query that at least retrieves the ssa_pubdid column (that’s a must; Datalink can’t work without it) and, on the result, run the iter_datalinks method. This returns an object in which you can find the associated data items (in this case, a preview and the original FITS with the #progenitor semantics), plus the cutout service.

Hence, a minimal example for pulling the legacy FITS links out of the first three items in lamost5.data would look like this:

import pyvo

svc = pyvo.dal.TAPService("http://dc.g-vo.org/tap")
for dl in svc.run_sync("select top 3 ssa_pubdid"
        " from lamost5.data").iter_datalinks():
    print(next(dl.bysemantics("#progenitor")
        )["access_url"].decode("ascii"))

This is a bit different from listing 2 in the poster linked above because it’s python3, so getting the first element from iterator an iterator looks a bit different, and (curse astropy.votable for returning VOTable chars as bytes rather than strings!) you’ll want to turn the URL into a proper string manually.

Another, actually more interesting, thing you can do with Datalink is cut out regions of interest. The LAMOST spectra are fairly long (though of course still small by image standards), so if you’re only interested in a single line, you can save a bit of storage and bandwidth over blindly pulling the whole thing.

For instance, if you wanted to pull the vicinity of the H and K Fraunhofer lines from the matches in the loop in the snippet above, you could say:

from astropy import units as u
proc = next(dl.iter_procs())
cutout = proc.processed(band=(392*u.nm,398*u.nm))

And this is what I’ve done for the decorative left border above: it’s the H and K line profiles for 0.1% of the stars LAMOST has classified as G8. Building the image didn’t take more than a few seconds (where I’d like the cutouts to be faster by a factor of 10; I guess that’s about an afternoon of work for me, so if it’d save you more than that afternoon, poke me to do it).

What’s coming back is tables. By the time python has digested these, they’re numpy record arrays. Thus, you can immediately bring in your beloved scipy (or whatever). For instance, if for some reason you’re convinced that the H and K lines should be fit by identical Gaussians in the boring case and would like find objects for which that’s patently untrue and that hence could be un-boring, here’s how you could do that:

def spectral_model(wl, c1, c2, depth, width):
    return (1
        -depth*numpy.exp(-numpy.square(wl-c1)
            /numpy.square(width))
        -depth*numpy.exp(-numpy.square(wl-c2)
            /numpy.square(width)))

for pubdid, prof in get_profiles(
        "G8", (392*u.nm,398*u.nm), 0.01, 4):
    prof["flux"] /= max(prof["flux"])
    popt, pcov = curve_fit(
        spectral_model, prof["spectral"], prof["flux"],
        sigma=prof["flux_error"],
        p0=[3968, 3934, 1, 1])
    if pcov[3][3]>1:
        break

– where get_profiles is essentially doing the TAP plus datalink routine above, except I’m swallowing spectra with too much noise and I have the function transform the spectral coordinate into the objects’ rest frames. If you’re curious how I’m doing this just based on the IVOA Spectral Data Model, check the source linked at the bottom of this post.

I’ve just run this, and the first spectrum that the machinery flagged as suspicious was this:

Image: A fairly boring late G spectrum

– which doesn’t look like I’ve made a discovery just yet. But that doesn’t mean there’s not a lot to find within LAMOST5’s lines…

To get you up to speed quickly: here’s the actual python3 code I ran for the “analysis” and the plot.

DaCHS 1.2 is out

Today, I have released DaCHS 1.2 – somewhat belatedly perhaps, because I managed to break my collarbone, but here it is. If you’ve been following this blog, you already know about the headline news: the dachs start command, ADQL 2.1, and early support for STC in the registry.

If you’re not yet on DaCHS 1.1, please have a quick look at the corresponding release article. While the upgrade itself should work fine in one go even from older versions, the release notes of course apply cumulatively, and you may still have to do the dist-upgrade to 1.1.

As usual, the generic upgrading instructions are available in the operator’s guide (in short: do a dachs val ALL; apt update; apt upgrade). Since I’ve still encountered DaCHS installations with wrong sources.lists last April: Note again that our repository names have changed in August 2016 – we now have release and beta rather than Debian release names. So, make sure you have something like

deb http://vo.ari.uni-heidelberg.de/debian release main

in your /etc/apt/sources.list, not something containing “stable” or the like.

That said, here’s the commented changes for 1.2:

  • New dachs start command to produce structured templates for certain service types. See Horror Vacui Begone on this blog for the full story.
  • Support for ADQL 2.1 (actually, its current proposed recommendation), including almost all of the optional parts (see Speak out on ADQL 2.1 on this blog). While not strictly necessary, it’s a good idea to run dachs imp //adql after the upgrade; this will give you some nice new UDFs, in particular gavo_histogram.
  • New coverage element (with updaters) to build and declare the space-time-spectral coverage of a resource. It would be great if you could add coverage elements to your resources where it makes sense and re-publish them. This blog post tells you how to do it (you’ll have to scroll down a bit).
  • There is now odbcGrammar to feed an import from another database. Essentially, you put an ODBC connection string into a file, point your sources element there, and you’ll get one rawdict per tuple in a foreign database table. This might be a nice way to publish moderate-size non-postgres tables via DaCHS.
  • You can now declare associated datalink services for tables using the _associatedDatalinkSvc meta item. In particular, if you had a datalink property on SSAP services, you should migrate at some point. One advantage: Users will get the datalinks even when querying the tables through TAP. See “Integrating Datalink Services” in the reference documentation for the full story.
  • We now force matplotlib to read its configuration from /var/gavo/etc/matplotlibrc; to get a default, just run dachs init again. This is mainly to avoid uncontrolled imports of matplotlibrcs when DaCHS is run under a uid that does other things now and then.
  • DaCHS now supports VOSI 1.1; in particular, DaCHS now understands the detail hints and has per-table endpoints, so clients like TOPCAT could avoid reading the full table metadata in one go. Realistically, at least TOPCAT doesn’t yet, so this is perhaps less cool than it may sound.
  • The indices generated by the ssa mixins are now a bit more sensible considering typical query modes. You probably want to run dachs imp -I on the RDs for your ssap data collections when convenient. If you have larger spectral collections, chances are many queries will be a lot faster.
  • ssapCore no longer wantonly adds preview columns. If you have previews with spectra, you probably want to add <property name="previews">auto</property> to your ssapCores. If you don’t, the preview column will not be added to SSA responses (right now, few clients evaluate it, but that will hopefully change in the future).
  • You can now add a statisticsTarget property to columns; you will want this on largish tables with non-uniformly distributed values to aid the query planner; something like <property key=" statisticsTarget">10000</property> within the corresponding column element can go a long way to improve query planning (you need to run gavo imp -m on the RD after the change).
  • DaCHS’s log now by default does not contain IP addresses, user agents, and referrers any more, which should mostly keep you from processing personal data and thus from having to muck around with the EU GDPR. To get back the previous behaviour, set [web]logFormat in /etc/gavo.rc to combined.
  • I fixed some utypes for obscore 1.1. These utypes are useless, so there’s nothing you have to do. But then stilts taplint complains about them, and so you may want to run dachs imp -m //obscore.
  • As usual, there are many minor bug fixes and improvements (e.g., memmapping FITSes for cutout again, delimited table references in ADQL, new-style tutorial resource records, correct obscore standardId, much saner nD-arrays in VOTables).

Well – enjoy the release, and if something goes wrong with it, be sure to let us know, preferably on the DaCHS-suppport mailing list.