Posts with the Tag TAP:

  • LAMOST5 meets Datalink

    One of the busiest spectral survey instruments operated right now is the Large Sky Area Multi-Object Fiber Spectrograph Telescope (LAMOST). And its data in the VO, more or less: DR2 and DR3 have been brought into the VO by our Czech colleagues, but since they currently lack resources to update their services to the latest releases, they have kindly given me their DaCHS resource descriptor, and so I had a head start for publishing DR5 in Heidelberg.

    With some minor updates, here it is now: Over nine million medium-resolution spectra covering large parts of the northen sky – the spatial coverage is like this:

    Coverage Healpix map

    There's lots of fun to be had with this; of course, there's an SSA service, so when you point Aladin or Splat at some part of the covered sky and look for spectra, chances are you'll see LAMOST spectra, and when working on some of our tutorials (this one, for example), it happened that LAMOST actually had what I was looking for when writing them.

    But I'd like to use the opportunity to mention two other modes of accessing the data.

    Stacked spectra

    Tablesample and TOPCAT's Plot Table activation action

    Say you'd like to look at spectra of M stars and would like to have some sample from across the sky, fire up TOPCAT, point its TAP client the GAVO DC TAP service (http://dc.g-vo.org/tap) and run something like:

    select
      ssa_pubDID, accref, raj2000, dej2000, ssa_targsubclass
    from lamost5.data tablesample(1)
    where
      ssa_targsubclass like 'M%'
    

    This is using the TABLESAMPLE modifier in the from clause, which isn't standard ADQL yet. As mentioned in the DaCHS 1.4 announcement, DaCHS has a prototype implementation of what's been discussed on the IVOA's DAL mailing list: pick a part of a table rather than the full one. It takes a percentage as an argument, and tells the server to choose about this percentage of the table's records using a reasonable and fast heuristic. Note that this won't give you perfect statistical sampling, but if it's not “good enough” for some purpose, I'd like to learn about that purpose.

    Drawing a proper statistical sample, on the other hand, would take minutes on the GAVO database server – with tablesample, I had the roughly 6000 spectra the above query returns essentially instantaneously, and from eyeballing a sky plot of them, I'd say their distribution is close enough to that of the full DR5. So: tablesample is your friend.

    For a quick look at the spectra themselves, in TOPCAT click Views/Activation Actions, check “Plot Table” and make sure TOPCAT proposes the accref column as “Table Location” (if you don't see these items, update your TOPCAT – it's worth it). Now click on a row or perhaps a dot on a plot and behold an M spectrum.

  • Deredden using TAP

    An animated color-magnitude diagram

    Raw and dereddened CMD for a region in Cygnus.

    Today I published a nice new set of tables on our TAP service: The Bayestar17 3D dust map derived from Pan-STARRS 1 by Greg Green et al. I mention in passing that this was made particularly enjoyable because Greg and friends put an explicit license on their data (in this case, CC-BY-SA).

    This dust map is probably a fascinating resource by itself, but the really nifty thing is that you can use it to correct all kinds of photometric data for extinction – at least to some extent. On the Bayestar web page, the authors give some examples for usage – and with our new service, you can use TAP as well to correct photometry for extinction.

    To see how, first have a look at the table metadata for the prdust.map_union table; this is what casual users probably should look at. More specifically, at the coverage, best_fit, and grdiagnostic columns.

    coverage here is an interval of 10-healpixes. It has to be an interval because the orginal data comes on wildly different levels; depending on the density of stars, sometimes it takes the area of a 6-healpix (about a square degree) to get enough signal, whereas in the galactic plane a 10-healpix (a thousandth of a square degree) already has enough stars. To make the whole thing conveniently queriable without exploding a 6-healpix row into 1000 identical rows, larger healpixes translate into intervals of 10-helpixes. Don't panic, though, I'll show how to conveniently query this below.

    best_fit and grdiagnostic are arrays (remember the light cuves in Gaia DR2?). In bins of 0.5 in distance modulus (which is, in case you feel a bit uncertain as to the algebraic signs, 5 log10(dist)-5 for a distance in parsec), starting with a distance modulus of 4 and ending with 19. This means that for a distance modulus of 4.2 you should check the array index 0, whereas 4.3 already would be covered by array index 1. With this, best_fit[ind] gives E(B-V) = (B-V) - (B-V)0 in the direction of coverage in a distance modulus bin of 2*ind+4. For each best_fit[ind], grdiagnostic[ind] contains a quality measure for that value. You probably shouldn't touch the E(B-V) if that measure is larger than 1.2.

    So, how does one use this?

    To try things, let's pull some Gaia data with distances; in order to have interesting extinctions, I'm using a patch in Cygnus (RA 288.5, Dec 2.3). If you live on the northern hemisphere and step out tonight, you could see dust clouds there with the naked eye (provided electricity fails all around, that is). Full disclosure: I tried the Coal Sack first but after checking the coverage of the dataset – which essentially is the sky north of -30 degrees – I noticed that wouldn't fly. But stories like these are one reason why I'm making such a fuss about having standard STC coverage representations.

    We want distances, and to dodge all the intricacies involved when naively turning parallaxes to distances discussed at length in a paper by Xavier Luri et al (and elsewhere), I'm using precomputed distances from Bailer-Jones et al. (2018AJ....156...58B); you'll find them on the "ARI Gaia" service; in TOPCAT's TAP dialog simply search for “Gaia” – that'll give you the GAVO DC TAP search, too, and that we'll need in a second.

    The pre-computed distances are in the gaiadr2_complements.geometric_distance table, which can be joined to the main Gaia object catalog using the source_id column. So, here's a query to produce a little photometric catalog around our spot in Cygnus (we're discarding objects with excessive parallax errors while we're at it):

    SELECT
    r_est, 5*log10(r_est)-5 as dist_mod,
    phot_g_mean_mag, phot_bp_mean_mag, phot_rp_mean_mag,
    ra, dec
    FROM
    gaiadr2.gaia_source
    JOIN gaiadr2_complements.geometric_distance
    USING (source_id)
    WHERE
    parallax_over_error>1
    AND 1=CONTAINS(POINT('ICRS', ra, dec), CIRCLE('ICRS', 288.5, 2.3, 0.5 ))
    

    The color-magnitude diagram resulting from this is the red point cloud in the animated GIF at the top. To reproduce it, just plot phot_bp_mean_mag-phot_rp_mean_mag against phot_g_mean_mag-dist_mod (and invert the y axis).

    De-reddening this needs a few minor technicalities. The most important one is how to match against the odd intervals of healpixes in the prdust.map_union table. A secondary one is that we have only pulled equatorial coordinates, and the healpixes in prdust are in galactic coordinates.

    Computing the healpix requires the ivo_healpix_index ADQL user defined function (UDF) that you may have met before, and since we have to go from ICRS to Galactic it requires a fairly new UDF I've recently defined to finally get the discussion on having a “standard library” of astrometric functions in ADQL going: gavo_transform. Here's how to get a 10-healpix as required for map_union from ra and dec:

    CAST(ivo_healpix_index(10,
      gavo_transform('ICRS', 'GALACTIC', POINT(ra, dec))) AS INTEGER)
    

    The CAST call is a pure technicality – ivo_healpix_index returns a 64-bit integer, which I can't use in my interval logic.

    The comparison against the intervals you could do yourself, but as argued in Registry-STC article this is one of the trivial things that are easy to get wrong. So, let's use the ivo_interval_overlaps UDF; it goes in the join condition to properly match prdust healpixes to catalog positions. Then our total query – that, I hope, should be reasonably easy to adapt to similar problems – is:

    WITH sources AS (
      SELECT phot_g_mean_mag,
        phot_bp_mean_mag,
        phot_rp_mean_mag,
        dist_mod,
        CAST(ivo_healpix_index(10,
          gavo_transform('ICRS', 'GALACTIC', POINT(ra, dec))) AS INTEGER) AS hpx,
        ROUND((dist_mod-4)*2)+1 AS dist_mod_bin
      FROM TAP_UPLOAD.T1)
    
    SELECT
      phot_bp_mean_mag-phot_rp_mean_mag-dust.best_fit[dist_mod_bin] AS color,
      phot_g_mean_mag-dist_mod+
        dust.best_fit[dist_mod_bin]*3.384 AS abs_mag,
      dust.grdiagnostic[dist_mod_bin] as qual
    FROM sources
    JOIN prdust.map_union AS dust
    ON (1=ivo_interval_has(hpx, coverage))
    

    (If you're following along: you have to switch to the GAVO DC TAP to run this, and you will probably have to change the index after TAP_UPLOAD).

    Ok, in the photometry department there's a bit of cheating going on here – I'm correcting Gaia B-R with B-V, and I'm using the factor for Johnson V to estimate the extinction in Gaia G (if you're curious where that comes from: See the footnote on best_fit and the MC extinction service docs should get you started), so this is far from physically correct. But, as you can see from the green cloud in the plot above, it already helps a bit. And if you find out better factors, by all means let me know so I can add an update... right here:

    Update (2018-09-11): The original data creator, Gregory Green points out that the thing with having a better factor for Gaia G isn't that simple, because, as he says “Gaia G is very broad, [and] the extinction coefficients are much more dependent on stellar type, and extinction is also more nonlinear with dust column (extinction is only linear with dust column and independent of stellar type for an infinitely narrow passband)”. So – when de-reddening, prefer narrow passbands. But whether narrow or wide: TAP helps you.

  • Gaia DR2: A light version and light curves

    screenshot: topcat and matplotlib

    Topcat is doing datalink, and our little python script has plotted a two-color time series of RMC 18 (or so I think).

    If anyone ever writes a history of the VO, the second data release of Gaia on April 25, 2018 will probably mark its coming-of-age – at least if you, like me, consider the Registry the central element of the VO. It was spectacular to view the spike of tens of Registry queries per second right around 12:00 CEST, the moment the various TAP services handing out the data made it public (with great aplomb, of course).

    In GAVO's Data Center we also carry Gaia DR2 data. Our host institute, the Zentrum für Astronomie in Heidelberg, also has a dedicated Gaia server. This gives relieves us from having to be a true mirror of the upstream data release. And since the source catalog has lots and lots of columns that most users will not be using most of the time, we figured a “light” version of the source catalog might fill an interesting ecological niche: Behold gaia.dr2light on the GAVO DC TAP service, containing essentially just the basic astrometric parameters and the diagonal of the covariance matrix.

    That has two advantages: Result sets with SELECT * are a lot less unwieldy (but: just don't do this with Gaia DR2), and, more importantly, a lighter table puts less load on the server. You see, conventional databases read entire rows when processing data, and having just 30% of the columns means we will be 3 times faster on I/O-bound tasks (assuming the same hardware, of course). Hence, and contrary to several other DR2-carrying sites, you can perform full sequential scans before timing out on our TAP service on gaia.dr2light. If, on the other hand, you need to do debugging or full-covariance-matrix error calculations: The full DR2 gaia_source table is available in many places in the VO. Just use the Registry.

    Photometry via TAP

    A piece of Gaia DR2 that's not available in this form anywhere else is the lightcurves; that's per-transit photometry in the G, BP, and RP band for about 0.5 million objects that the reduction system classified as variable. ESAC publishes these through datalink from within their gaia_source table, and what you get back is a VOTable that has the photometry in the three bands interleaved.

    I figured it might be useful if that data were available in a TAP-queriable table with lightcurves in the database. And that's how gaia.dr2epochflux came into being. In there, you have three triples of arrays: the epochs (g_transit_time, bp_obs_time, and rp_obs_time), the fluxes (g_transit_flux, bp_flux, and rp_flux), and their errors (you can probably guess their names). So, to retrieve G lightcurves where available together with a gaia_source query of your liking, you could write something like:

    SELECT g.*, g_transit_time, g_transit_flux
    FROM gaia.dr2light AS g
    LEFT OUTER JOIN gaia.dr2epochflux
    USING (source_id)
    WHERE ...whatever...
    

    – the LEFT OUTER JOIN arranges things such that the g_transit_time and g_transit_flux columns simply are NULL when there are no lightcurves; with a normal (“inner”) join, rows without lightcurves would not be returned in such a query.

    To give you an idea of what you can do with this, suppose you would like to discover new variable blue supergiants in the Gaia data (who knows – you might discover the precursor of the next nearby supernova!). You could start with establishing color cuts and train your favourite machine learning device on light curves of variable blue supergiants. Here's how to get (and, for simplicity, plot) time series of stars classified as blue supergiants by Simbad for which Gaia DR2 lightcurves are available, using pyvo and a little async trick:

    from matplotlib import pyplot as plt
    import pyvo
    
    def main():
      simbad = pyvo.dal.TAPService(
        "http://simbad.u-strasbg.fr:80/simbad/sim-tap")
      gavodc = pyvo.dal.TAPService("http://dc.g-vo.org/tap")
    
      # Get blue supergiants from Simbad
      simjob = simbad.submit_job("""
        select main_id, ra, dec
        from basic
        where otype='BlueSG*'""")
      simjob.run()
    
      # Get lightcurves from Gaia
      try:
        simjob.wait()
        time_series = gavodc.run_sync("""
          SELECT b.*, bp_obs_time, bp_flux, rp_obs_time, rp_flux
          FROM (SELECT
             main_id, source_id, g.ra, g.dec
             FROM
            gaia.dr2light as g
             JOIN TAP_UPLOAD.t1 AS tc
             ON (0.002>DISTANCE(tc.ra, tc.dec, g.ra, g.dec))
          OFFSET 0) AS b
          JOIN gaia.dr2epochflux
          USING (source_id)
          """,
          uploads={"t1": simjob.result_uri})
      finally:
        simjob.delete()
    
      # Now plot one after the other
      for row in time_series.table:
        plt.plot(row["bp_obs_time"], row["bp_flux"])
        plt.plot(row["rp_obs_time"], row["rp_flux"])
        plt.show(block=False)
        raw_input("{}; press return for next...".format(row["main_id"]))
        plt.cla()
    
    if __name__=="__main__":
      main()
    

    If you bother to read the code, you'll notice that we transfer the Simbad result directly to the GAVO data center without first downloading it. That's fairly boring in this case, where the table is small. But if you have a narrow pipe for one reason or another and some 105 rows, passing around async result URLs is a useful trick.

    In this particular case the whole thing returns just four stars, so perhaps that's not a terribly useful target for your learning machine. But this piece of code should get you started to where there's more data.

    You should read the column descriptions and footnotes in the query results (or from the reference URL) – this tells you how to interpret the times and how to make magnitudes from the fluxes if you must. You probably can't hear it any more, but just in case: If you can, process fluxes rather than magnitudes from Gaia, because the errors are painful to interpret in magnitudes when the fluxes are small (try it!).

    Note how the photometry data is stored in arrays in the database, and that VOTables can just transport these. The bad news is that support for manipulating arrays in ADQL is pretty much zero at this point; this means that, when you have trained your ML device, you'll probably have to still download lots and lots of light curves rather than write some elegant ADQL to do the filtering server-side. However, I'd be highly interested to work out how some tastefully chosen user defined functions might enable offloading at least a good deal of that analysis to the database. So – if you know what you'd like to do, by all means let me know. Perhaps there's something I can do for you.

    Incidentally, I'll talk a bit more about ADQL arrays in a blog post coming up in a few weeks (I think). Don't miss it, subscribe to our feed).

    SSAP and Obscore

    If you're fed up with bleeding-edge tech, the light curves are also available through good old SSAP and Obscore. To use that, just get Splat (or another SSA client, preferably with a bit of time series support). Look for a Gaia DR2 time series service (you may have to update the service list before you find it), enter (in keeping with our LBV theme) S Dor as position and hit “Lookup” followed by “Send Query”. Just click on any result to just view the time series – and then apply Splat's rich tool set to it.

    Update (8.5.2018): Clusters

    Here's another quick application – how about looking for variable stars in clusters? This piece of ADQL should get you started:

    SELECT TOP 100
      source_id, ra, dec, parallax, g.pmra, g.pmdec,
      m.name, m.pmra AS c_pmra, m.pmde AS c_pmde,
      m.e_pm AS c_e_pm,
      1/dist AS cluster_parallax
    FROM
      gaia.dr2epochflux
      JOIN gaia.dr2light AS g USING (source_id)
      JOIN mwsc.main AS m
      ON (1=CONTAINS(
        POINT(g.ra, g.dec),
        CIRCLE(m.raj2000, m.dej2000, rcluster)))
    WHERE IN_UNIT(pmdec, 'deg/yr') BETWEEN m.pmde-m.e_pm*3 AND m.pmde+m.e_pm*3
    

    – yes, you'll want to constrain pmra, too, and the distance, and properly deal with error and all. But you get simple lightcurves for free. Just add them in the SELECT clause!

  • DaCHS 1.1 released

    Today, I have released DaCHS 1.1, with the main selling point that DaCHS should now speak TAP 1.1 (as defined in the current draft).

    First off, if you're not yet on DaCHS 1.0, please read the corresponding release article before upgrading.

    As usual, the general upgrading instructions are available in the operator's guide (in short: do a dachs val ALL before the Debian upgrade). This time, I'd recommend to use the opportunity to upgrade your underlying server to stretch if you haven't done so already. If you do that, please have a look at hints on postgres upgrades. Stretch comes with postgres 9.6 (jessie: 9.4). Postgres upgrades are generally safe, but please take a dump before migrating anyway.

    So, with this out of the way, here's a short list of the major changes from DaCHS 1.0 to DaCHS 1.1:

    • DaCHS now officially requires python 2.7. If this really is a problem for you, please shout – if wouldn't be hard to maintain 2.6 compatibility, but by now we feel there's no reason to bother any more.
    • Now supporting TAP 1.1; in particular, TOP n doesn't trump MAXREC any more, and it doesn't affect OVERFLOW indication, which may break things that used TOP to override DaCHS' default TAP match limit of 2000. Also, TAP_SCHEMA is updated (this happens as a side effect of dachs upgrade).
    • Now serialising spoint, scircle, and friends to DALI 1.1 xtypes (timestamp, point, polygon, circle). Fields explicitly marked with adql:POINT or adql:REGION will still be serialised to STC-S. Do this only if you have no choice (DaCHS has this for obscore and epntap s_region right now).
    • The output column selection is sanitised. This may make for slight changes in service responses, in particular in VOTable formats. See Output Tables in the reference documentation for details if you think this might hit you.
    • DaCHS no longer comes with an outdated version pyparsing and instead uses what's installed on the system. The Debian package further re-uses additional system resources if available (rjsmin, jquery).
    • DaCHS now tries a bit harder to come up with sensible names for SODA result files.
    • map/@source is no longer limited to identifier-like strings; any key that's in your source is fair game.
    • For incremental imports with data that's updated now and then, there's now ignoreSources/@fromdbUpdating.
    • Relative imports from custom code ("import foo" in a custom core, for instance, getting res/foo.py) no longer work. See Importing Modules in the reference documentation for details.
    • This release fixes a severe bug in the creation of obscore metadata from SSAP tables. If you use //obscore#publishSSAPHCD or //obscore#publishSSAPMIXC mixins, update the obscore definitions by running dachs imp -m <rdid>, followed by dachs imp //obscore (the latter is only necessary once at the end).
    • You can now define a footer.html template that's added at the foot of the main page content – with a bit of CSS magic, this lets you overwrite almost anything on DaCHS HTML pages.

    As always, please complain early if something breaks for you; our regression tests can only cover so much. In particular, our support list is there for you.

    Update (2017-12-06): In particular on jessie, you may see that all DaCHS packages are being held back. To resolve this situation, manually say apt-get install python-gavoutils python-gavostc.

  • Updated Proper Motion Tutorial

    At the risk of turning this into a blog on nice TAP tricks (which it's not supposed to be): Our classic short tutorial on adding proper motions to almost arbitrary object lists has just gotten a facelift today.

    And there's new content, too – I now show what to do when you don't even have positions but just object names. In order to keep this sufficiently geeky, here's the query as a spoiler:

    SELECT col1, ra, dec
    FROM TAP_UPLOAD.t1
    LEFT OUTER JOIN ident
    ON (id=normId(col1))
    LEFT OUTER JOIN basic
    ON (oidref=oid)
    

    But to close on a non-TAP topic: Registry! There's an experimental facility to have this kind of thing in the Registry; the PM tutorial is in, for instance, with the ivoid ivo://edu.gavo.org/hd/gavo_addpms). One thing you can do with this is generate a list of registred documents that essentially updates itself from the registry.

    Another is figure out where the source code of the document is (if the authors choose to share it, which is of course a very smart thing to do); in our example it's in Volute, the IVOA's semi-official version control system. So, if you find a bug (defined as “superset of typo”) in the linked document, you're most welcome to supply patches as diffs or just directly fix things if you have commit privileges in Volute.

« Page 2 / 3 »