Posts with the Tag TAP:

  • Gaia DR2: A light version and light curves

    [Screenshot: Topcat and matplotlib] Topcat is doing datalink, and our little python script has plotted a two-color time series of RMC 18 (or so I think).

    If anyone ever writes a history of the VO, the second data release of Gaia on April 25, 2018 will probably mark its coming-of-age – at least if you, like me, consider the Registry the central element of the VO. It was spectacular to view the spike of tens of Registry queries per second right around 12:00 CEST, the moment the various TAP services handing out the data made it public (with great aplomb, of course).

    In GAVO's Data Center we also carry Gaia DR2 data. Our host institute, the Zentrum für Astronomie in Heidelberg, also has a dedicated Gaia server. This relieves us from having to be a true mirror of the upstream data release. And since the source catalog has lots and lots of columns that most users will not need most of the time, we figured a “light” version of the source catalog might fill an interesting ecological niche: behold gaia.dr2light on the GAVO DC TAP service, containing essentially just the basic astrometric parameters and the diagonal of the covariance matrix.

    That has two advantages: result sets with SELECT * are a lot less unwieldy (but: just don't do this with Gaia DR2), and, more importantly, a lighter table puts less load on the server. You see, conventional databases read entire rows when processing data, and having just 30% of the columns means we will be about 3 times faster on I/O-bound tasks (assuming the same hardware, of course). Hence, and contrary to several other DR2-carrying sites, you can run full sequential scans of gaia.dr2light on our TAP service without timing out. If, on the other hand, you need to do debugging or full-covariance-matrix error calculations: the full DR2 gaia_source table is available in many places in the VO. Just use the Registry.
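
    If you want a feel for what such a scan can do, here is a minimal pyvo sketch of an all-sky aggregate query run asynchronously; the column phot_g_mean_mag is an assumption of mine, so check the table metadata (and adapt the aggregate) before running this:

    import pyvo

    # Hedged sketch: an all-sky aggregate that makes the server scan
    # gaia.dr2light.  phot_g_mean_mag is an assumed column name; check
    # the table metadata.  Full scans take a while, hence async.
    svc = pyvo.dal.TAPService("http://dc.g-vo.org/tap")
    job = svc.submit_job("""
      SELECT ROUND(phot_g_mean_mag) AS mag_bin, COUNT(*) AS n
      FROM gaia.dr2light
      GROUP BY mag_bin""")
    job.run()
    job.wait()
    print(job.fetch_result().to_table())
    job.delete()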

    Photometry via TAP

    A piece of Gaia DR2 that's not available in this form anywhere else is the lightcurves; that's per-transit photometry in the G, BP, and RP band for about 0.5 million objects that the reduction system classified as variable. ESAC publishes these through datalink from within their gaia_source table, and what you get back is a VOTable that has the photometry in the three bands interleaved.

    I figured it might be useful to have that data in a TAP-queriable table, with the lightcurves stored right in the database. And that's how gaia.dr2epochflux came into being. In there, you have three triples of arrays: the epochs (g_transit_time, bp_obs_time, and rp_obs_time), the fluxes (g_transit_flux, bp_flux, and rp_flux), and their errors (you can probably guess their names). So, to retrieve G lightcurves where available together with a gaia_source query of your liking, you could write something like:

    SELECT g.*, g_transit_time, g_transit_flux
    FROM gaia.dr2light AS g
    LEFT OUTER JOIN gaia.dr2epochflux
    USING (source_id)
    WHERE ...whatever...
    

    – the LEFT OUTER JOIN arranges things such that the g_transit_time and g_transit_flux columns simply are NULL when there are no lightcurves; with a normal (“inner”) join, rows without lightcurves would not be returned in such a query.

    To give you an idea of what you can do with this, suppose you would like to discover new variable blue supergiants in the Gaia data (who knows – you might discover the precursor of the next nearby supernova!). You could start with establishing color cuts and train your favourite machine learning device on light curves of variable blue supergiants. Here's how to get (and, for simplicity, plot) time series of stars classified as blue supergiants by Simbad for which Gaia DR2 lightcurves are available, using pyvo and a little async trick:

    from matplotlib import pyplot as plt
    import pyvo
    
    def main():
      simbad = pyvo.dal.TAPService(
        "http://simbad.u-strasbg.fr:80/simbad/sim-tap")
      gavodc = pyvo.dal.TAPService("http://dc.g-vo.org/tap")
    
      # Get blue supergiants from Simbad; async, so we can later pass
      # the result URL on without downloading the table ourselves
      simjob = simbad.submit_job("""
        select main_id, ra, dec
        from basic
        where otype='BlueSG*'""")
      simjob.run()
    
      # Get lightcurves from Gaia for positional matches of those stars
      # (the OFFSET 0 below keeps the database's query planner from
      # flattening the subquery, a common postgres trick)
      try:
        simjob.wait()
        time_series = gavodc.run_sync("""
          SELECT b.*, bp_obs_time, bp_flux, rp_obs_time, rp_flux
          FROM (
            SELECT main_id, source_id, g.ra, g.dec
            FROM gaia.dr2light AS g
            JOIN TAP_UPLOAD.t1 AS tc
            ON (0.002>DISTANCE(tc.ra, tc.dec, g.ra, g.dec))
            OFFSET 0) AS b
          JOIN gaia.dr2epochflux
          USING (source_id)
          """,
          uploads={"t1": simjob.result_uri})
      finally:
        simjob.delete()
    
      # Now plot one light curve after the other
      for row in time_series.table:
        plt.plot(row["bp_obs_time"], row["bp_flux"])
        plt.plot(row["rp_obs_time"], row["rp_flux"])
        plt.show(block=False)
        # raw_input is python 2; use input() on python 3
        raw_input("{}; press return for next...".format(row["main_id"]))
        plt.cla()
    
    if __name__=="__main__":
      main()
    

    If you bother to read the code, you'll notice that we transfer the Simbad result directly to the GAVO data center without first downloading it. That's fairly boring in this case, where the table is small. But if you have a narrow pipe for one reason or another and some 10⁵ rows, passing around async result URLs is a useful trick.

    In this particular case the whole thing returns just four stars, so perhaps that's not a terribly useful target for your learning machine. But this piece of code should get you started on object classes for which there's more data.

    You should read the column descriptions and footnotes in the query results (or from the reference URL) – they tell you how to interpret the times and how to make magnitudes from the fluxes if you must. You probably can't hear it any more, but just in case: if you can, process fluxes rather than magnitudes from Gaia, because the errors are painful to interpret in magnitudes when the fluxes are small (try it!).
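
    If you must convert anyway, here's a small numpy sketch; the G-band zero point below is an assumed placeholder you should look up in the Gaia DR2 documentation, while the 2.5/ln 10 error factor is plain first-order propagation:

    import numpy

    def g_mag_from_flux(flux, flux_error, zero_point=25.69):
      # zero_point is an assumed placeholder -- look up the proper
      # DR2 G-band zero point before using this in anger.
      flux = numpy.asarray(flux, dtype=float)
      mag = zero_point - 2.5*numpy.log10(flux)
      # sigma_m = 2.5/ln(10) * sigma_f/f; this blows up as the flux
      # goes to zero, which is exactly the pain mentioned above.
      mag_error = 2.5/numpy.log(10)*numpy.asarray(flux_error)/flux
      return mag, mag_error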

    Note how the photometry data is stored in arrays in the database, and that VOTables can transport these just fine. The bad news is that support for manipulating arrays in ADQL is pretty much zero at this point; this means that, when you have trained your ML device, you'll probably still have to download lots and lots of light curves rather than write some elegant ADQL to do the filtering server-side. However, I'd be highly interested to work out how some tastefully chosen user defined functions might enable offloading at least a good deal of that analysis to the database. So – if you know what you'd like to do, by all means let me know. Perhaps there's something I can do for you.
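
    Until then, here's a minimal sketch of such client-side filtering, working on the time_series result from the script above; the amplitude criterion is, of course, made up for illustration:

    import numpy

    # Crude client-side variability filter on the array-valued columns
    # that came back with time_series above; the 0.5 cut is made up.
    for row in time_series.table:
      flux = numpy.asarray(row["bp_flux"], dtype=float)
      flux = flux[numpy.isfinite(flux)]
      if len(flux)<5:
        continue
      # relative peak-to-peak amplitude as a cheap variability measure
      amplitude = (flux.max()-flux.min())/numpy.median(flux)
      if amplitude>0.5:
        print(row["main_id"], amplitude)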

    Incidentally, I'll talk a bit more about ADQL arrays in a blog post coming up in a few weeks (I think). Don't miss it: subscribe to our feed.

    SSAP and Obscore

    If you're fed up with bleeding-edge tech, the light curves are also available through good old SSAP and Obscore. To use that, just get Splat (or another SSA client, preferably with a bit of time series support). Look for a Gaia DR2 time series service (you may have to update the service list before you find it), enter (in keeping with our LBV theme) S Dor as position and hit “Lookup” followed by “Send Query”. Click on any result to view the time series – and then apply Splat's rich tool set to it.
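
    If you'd rather script this than click through Splat, the same data should be reachable through pyvo's SSA interface as well. The following sketch finds candidate services through the Registry; the keywords are guesses on my part, so inspect the resources you get back before trusting the result:

    import pyvo
    from astropy.coordinates import SkyCoord

    # Locate Gaia DR2 time series SSA services via the Registry; the
    # keywords are guesses -- inspect what comes back.
    pos = SkyCoord.from_name("S Dor")  # resolves the name via Sesame
    for resource in pyvo.registry.search(
        keywords=["gaia", "time series"], servicetype="ssa"):
      print("Trying", resource.res_title)
      for match in resource.service.search(pos=pos, diameter=0.05):
        # each match is one time series; this is where you'd fetch it
        print(match.getdataurl())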

    Update (8.5.2018): Clusters

    Here's another quick application – how about looking for variable stars in clusters? This piece of ADQL should get you started:

    SELECT TOP 100
      source_id, ra, dec, parallax, g.pmra, g.pmdec,
      m.name, m.pmra AS c_pmra, m.pmde AS c_pmde,
      m.e_pm AS c_e_pm,
      1/dist AS cluster_parallax
    FROM
      gaia.dr2epochflux
      JOIN gaia.dr2light AS g USING (source_id)
      JOIN mwsc.main AS m
      ON (1=CONTAINS(
        POINT(g.ra, g.dec),
        CIRCLE(m.raj2000, m.dej2000, rcluster)))
    WHERE IN_UNIT(pmdec, 'deg/yr') BETWEEN m.pmde-m.e_pm*3 AND m.pmde+m.e_pm*3
    

    – yes, you'll want to constrain pmra, too, and the distance, and properly deal with the errors and all that. But you get simple lightcurves for free. Just add them in the SELECT clause!

  • DaCHS 1.1 released

    Today, I have released DaCHS 1.1, with the main selling point that DaCHS should now speak TAP 1.1 (as defined in the current draft).

    First off, if you're not yet on DaCHS 1.0, please read the corresponding release article before upgrading.

    As usual, the general upgrading instructions are available in the operator's guide (in short: do a dachs val ALL before the Debian upgrade). This time, I'd recommend using the opportunity to upgrade your underlying server to stretch if you haven't done so already. If you do that, please have a look at our hints on postgres upgrades. Stretch comes with postgres 9.6 (jessie: 9.4). Postgres upgrades are generally safe, but please take a dump before migrating anyway.

    So, with this out of the way, here's a short list of the major changes from DaCHS 1.0 to DaCHS 1.1:

    • DaCHS now officially requires python 2.7. If this really is a problem for you, please shout – it wouldn't be hard to maintain 2.6 compatibility, but by now we feel there's no reason to bother any more.
    • Now supporting TAP 1.1; in particular, TOP n doesn't trump MAXREC any more, and it doesn't affect OVERFLOW indication, which may break things that used TOP to override DaCHS' default TAP match limit of 2000. Also, TAP_SCHEMA is updated (this happens as a side effect of dachs upgrade).
    • Now serialising spoint, scircle, and friends to DALI 1.1 xtypes (timestamp, point, polygon, circle). Fields explicitly marked with adql:POINT or adql:REGION will still be serialised to STC-S; mark fields like that only if you have no choice (DaCHS does this for obscore and epntap s_region right now).
    • The output column selection is sanitised. This may make for slight changes in service responses, in particular in VOTable formats. See Output Tables in the reference documentation for details if you think this might hit you.
    • DaCHS no longer comes with an outdated version of pyparsing and instead uses what's installed on the system. The Debian package further re-uses additional system resources if available (rjsmin, jquery).
    • DaCHS now tries a bit harder to come up with sensible names for SODA result files.
    • map/@source is no longer limited to identifier-like strings; any key that's in your source is fair game.
    • For incremental imports with data that's updated now and then, there's now ignoreSources/@fromdbUpdating.
    • Relative imports from custom code ("import foo" in a custom core, for instance, getting res/foo.py) no longer work. See Importing Modules in the reference documentation for details.
    • This release fixes a severe bug in the creation of obscore metadata from SSAP tables. If you use //obscore#publishSSAPHCD or //obscore#publishSSAPMIXC mixins, update the obscore definitions by running dachs imp -m <rdid>, followed by dachs imp //obscore (the latter is only necessary once at the end).
    • You can now define a footer.html template that's added at the foot of the main page content – with a bit of CSS magic, this lets you overwrite almost anything on DaCHS HTML pages.

    As always, please complain early if something breaks for you; our regression tests can only cover so much. In particular, our support list is there for you.

    Update (2017-12-06): In particular on jessie, you may see that all DaCHS packages are being held back. To resolve this situation, manually say apt-get install python-gavoutils python-gavostc.

  • Updated Proper Motion Tutorial

    At the risk of turning this into a blog on nice TAP tricks (which it's not supposed to be): Our classic short tutorial on adding proper motions to almost arbitrary object lists has just gotten a facelift today.

    And there's new content, too – I now show what to do when you don't even have positions but just object names. In order to keep this sufficiently geeky, here's the query as a spoiler:

    SELECT col1, ra, dec
    FROM TAP_UPLOAD.t1
    LEFT OUTER JOIN ident
    ON (id=normId(col1))
    LEFT OUTER JOIN basic
    ON (oidref=oid)
    
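    If you'd rather run this from a script than from TOPCAT, here's a hedged pyvo sketch of the same query with an inline table upload; the object names are just examples of mine:

    import pyvo
    from astropy.table import Table

    # Resolve a local list of object names against Simbad; the names
    # in col1 are made-up examples.
    names = Table({"col1": ["S Dor", "eta Car", "P Cyg"]})
    simbad = pyvo.dal.TAPService(
      "http://simbad.u-strasbg.fr/simbad/sim-tap")
    result = simbad.run_sync("""
      SELECT col1, ra, dec
      FROM TAP_UPLOAD.t1
      LEFT OUTER JOIN ident ON (id=normId(col1))
      LEFT OUTER JOIN basic ON (oidref=oid)""",
      uploads={"t1": names})
    print(result.to_table())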

    But to close on a non-TAP topic: Registry! There's an experimental facility to have this kind of thing in the Registry; the PM tutorial is in, for instance, with the ivoid ivo://edu.gavo.org/hd/gavo_addpms. One thing you can do with this is generate a list of registered documents that essentially updates itself from the Registry.

    Another is to figure out where the source code of the document is (if the authors choose to share it, which is of course a very smart thing to do); in our example it's in Volute, the IVOA's semi-official version control system. So, if you find a bug (defined as “superset of typo”) in the linked document, you're most welcome to supply patches as diffs or just directly fix things if you have commit privileges in Volute.

  • Automating TAP queries

    TOPCAT is a great tool – in particular, for prototyping and ad-hoc analyses, it's hard to beat. But say you've come up with this great TAP query, and you want to preserve it, perhaps generate plots, then maybe fix them when you (or referees) have spotted problems.

    Then it's time to get acquainted with TOPCAT's command line sister STILTS. This tool lets you do most of what TOPCAT can, but without user intervention. Since the command lines usually take a bit of twiddling, I wrap stilts calls in GNU make, so I just need to run something like make fig3.pdf or make fig3.png and make figures out what to do, potentially starting with a query. Call it workflow preservation if you want.

    How does it work? Well, of course with a makefile. In that, you'll first want to define some variables that allow more concise rules later. Here's how I usually start one:

    STILTS?=stilts
    
    # VOTables are the results of remote queries, so don't wantonly throw
    # them away
    .PRECIOUS: %.vot
    
    # in this particular application, it helps to have this global
    HEALPIX_ORDER=6
    
    # A macro that contains common stuff for stilts TAP query -- essentially,
    # just add adql=
    TAPQUERY=$(STILTS) tapquery \
      tapurl='http://dc.g-vo.org/tap' \
      maxrec=200000000 \
      omode=out \
      out=$@ \
      ofmt=vot \
      executionduration=14400
    
    # A sample plot macro.  Here, we do a healpix plot of some order. Also
    # add value_1=<column to plot>
    HEALPIXPLOT=$(STILTS) plot2sky \
      auxmap=inferno \
      auxlabel='$*' \
      auxvisible=true \
      legend=false \
      omode=out \
      out=$@ \
      projection=aitoff \
      sex=false \
      layer_1=healpix \
        datalevel_1=$(HEALPIX_ORDER) \
        datasys_1=equatorial \
        viewsys_1=equatorial \
        degrade_1=0 \
        transparency_1=0 \
        healpix_1=hpx \
        in_1=$< \
        ifmt_1=votable \
        istream_1=true
    

    For the somewhat scary STILTS command lines, see the STILTS documentation or just use your intuition (which mostly should give you the right idea of what something is for).

    The next step is to define some pattern rules; these are a make feature (in the GNU make syntax used here) that lets you say “to make a file matching the destination pattern when you have one with the source pattern, use the following commands”. You can use a number of variables in the rules, in particular $@ (the current target) and $< (the first prerequisite). These are already used in the definitions of TAPQUERY and HEALPIXPLOT above, so they don't necessarily turn up here:

    # healpix plots from VOTables; these will plot obs
    %.png: %.vot
          $(HEALPIXPLOT) \
                  value_1=obs \
                  ofmt=png \
                  xpix=600 ypix=380

    %.pdf: %.vot
          $(HEALPIXPLOT) \
                  value_1=obs \
                  ofmt=pdf

    # turn SQL statements into VOTables using TAP
    %.vot: %.sql
          $(TAPQUERY) \
                  adql="`cat $<`"
    

    Careful with cut and paste: The leading whitespace here must be a Tab in rules, not just some blanks (this is probably the single most annoying feature of make. You'll get used to it.)

    What can you do with it? Well, for instance you can write an ADQL query into a file density.sql; say:

    SELECT
      count(*) as obs,
      -- "6" in the next line must match HEALPIX_ORDER in the Makefile
      ivo_healpix_index (6, alphaFloat, deltaFloat) AS hpx
    FROM ppmx.data
    GROUP BY hpx
    

    And with this, you can say:

    make density.pdf
    

    and get a nice PDF with the plot resulting from that query. Had you said make density.vot instead, make would only have executed the query and left the VOTable, e.g., for investigation with TOPCAT; and if you then typed make density.png, you'd get a nice PNG without querying the service again. Like this:

    [Figure: a density plot]

    Unless of course you changed the SQL in the meantime, in which case make would figure out it had to go back to the service.

    In particular for the plots, you'll often have to override the defaults; you can do that with explicit rules, which make is smart enough to prefer over the pattern rules. For instance, you could have two files like this:

    $ cat pm_histogram.sql
    SELECT
      round(pmtot/10)*10 as bin, count(*) as n
    FROM (
      SELECT sqrt(pmra*pmra+pmde*pmde)*3.6e6 as pmtot
      FROM hsoy.main) AS q
    group by bin
    $ cat pm_histogram_cleaned.sql
    SELECT
      round(pmtot/10)*10 as bin,
      count(*) as n
      FROM (
        SELECT sqrt(pmra*pmra+pmde*pmde)*3.6e6 as pmtot
        FROM hsoy.main
        WHERE no_sc IS NULL) AS q
      group by bin
    

    (these were used to analyse the overall proper motion distributions in HSOY properties; note that each of these will run about 30 minutes or so, so better adapt them to what's actually interesting to you before trying this).

    No special handling in terms of queries is necessary for these, but the plot needs to be hand-crafted:

    pm_histograms.png: pm_histogram.vot pm_histogram_cleaned.vot
          $(STILTS) plot2plane legend=false \
                  omode=out ofmt=png out=$@ \
                  title="All-sky" \
                  xpix=800 ypix=600 \
                  ylog=True xlog=True \
                  xlabel="PM bin [mas/yr, bin size=10]" \
                  xmax=4000 \
                  layer1=mark \
                          color1=blue \
                          in1=pm_histogram.vot \
                          x1=bin \
                          y1=n \
                  layer2=mark \
                          in2=pm_histogram_cleaned.vot \
                          x2=bin \
                          y2=n
    

    – that way, even if you go back to the stuff six months later, you can still figure out what you queried (the files are still there) and what you did then.

    A makefile to play with (and safe from cut-and-paste problems) is available from Makefile_tapsample (rename to Makefile to reproduce the examples).

  • PPMXL+Gaia DR1=HSOY in the Heidelberg Data Center

    The stunning precision of Gaia's astrometry is already apparent in the first release of the data obtained by the satellite, available since last September. However, apart from the small TGAS subset (objects already observed by the HIPPARCOS astrometry satellite in the 1990s) there is no information on the objects' proper motions in DR1.

    Until Gaia-quality proper motions become available with DR2, the HSOY catalog – described in Altmann et al's paper Hot Stuff for One Year (HSOY), freshly up on arXiv and online at http://dc.g-vo.org/hsoy – can help if you can live with somewhat lesser-quality kinematics.

    It derives proper motions for roughly half a billion stars from PPMXL and Gaia DR1, which already gives an unprecedented source for 4D astrometry around J2015. And you can start working with it right now. The catalog is available in GAVO's Heidelberg data center (TAP access URL: http://dc.g-vo.org/tap; there's also an SCS service). Fire up your favourite TAP or SCS client (our preference: TOPCAT) and search for HSOY.
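
    For a quick first look without TOPCAT, a pyvo sketch like the following will do; pmra, pmde, and no_sc are the column names used in the proper motion histograms further up this page (in deg/yr, hence the conversion factor), but do check the table metadata:

    import pyvo

    # A first peek at HSOY: ten proper motions in mas/yr.  The factor
    # 3.6e6 converts from deg/yr to mas/yr, as in the makefile post.
    svc = pyvo.dal.TAPService("http://dc.g-vo.org/tap")
    result = svc.run_sync("""
      SELECT TOP 10
        pmra, pmde, SQRT(pmra*pmra+pmde*pmde)*3.6e6 AS pmtot_mas
      FROM hsoy.main
      WHERE no_sc IS NULL""")
    print(result.to_table())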

    [Figure: an all-sky heatmap showing much larger errors south of -30 deg.]

    Oh, and in case you're new to the whole TAP/ADQL game: There's our ADQL introduction, and if you're at a German astronomical institution, we'd be happy to hold one of our VO Days at your institute – just drop us a mail.
