Posts with the Tag TAP:

  • DaCHS 2.11: Persistent TAP Uploads

    The DaCHS logo, a badger's head and the text "VO Data Publishing"

    The traditional autumn release of GAVO's server package DaCHS is somewhat late this year, but not so late that could not still claim it comes after the interop. So, here it is: DaCHS 2.11 and the traditional what's new post.

    But first, while I may have DaCHS operators' attention: If you have always wondered why things in DaCHS are as they are, you will probably enjoy the article Declarative Data Publication with DaCHS, which one day will be in the proceedings of ADASS XXXIV (and before that probably on arXiv). You can read it in a pre-preprint version already now at https://docs.g-vo.org/I301.pdf, and feedback is most welcome.

    Persistent TAP Uploads

    The potentially most important new feature of DaCHS 2.11 (in my opinion) will not be news to regular readers of this blog: Persistent TAP Uploads.

    At this point, no client supports this, and presumably when clients do support it, it will look somewhat different, but if you like the bleeding edge and have users that don't mind an occasional curl or requests call, you would be more than welcome to help try the persistent uploads. As an operator, it should be sufficient to type:

    dachs imp //tap_user
    

    To make this more useful, you probably want to hand out proper credentials (make them with dachs adm adduser) to people who want to play with this, and point the interested users to the demo jupyter notebook.

    I am of course grateful for any feedback, in particular on how people find ways to use these features to give operators a headache. For instance, I really would like to avoid writing a quota system. But I strongly suspect will have to…

    On-loaded Execute-s

    DaCHS has a built-in cron-type mechanism, the execute Element. So far, you could tell it to run jobs every x seconds or at certain times of the day. That is fine for what this was made for: updates of “living” data. For instance, the RegTAP RD (which is what's behind the Registry service you are probably using if you are reading this) has something like this:

    <execute title="harvest RofR" every="40000">
      <job><code>
          execDef.spawnPython("bin/harvestRofR.py")
      </code></job>
    </execute>
    

    This will pull in new publishing registries from the Registry of Registries, though that is tangential; the main thing is that some code will run every 40 kiloseconds (or about 12 hours).

    Against using plain cron, the advantage is that DaCHS knows context (for instance, the RD's resdir is not necessary in the example call), that you can sync with DaCHS' own facilities, and most of all that everything is in once place and can be moved together. By the way, it is surprisingly simple to run a RegTAP service of your own if you already run DaCHS. Feel free to inquire if you are interested.

    In DaCHS 2.11, I extended this facility to include “events” in the life of an RD. The use case seems rather remote from living data: Sometimes you have code you want to share between, say, a datalink service and some ingestion code. This is too resource-bound for keeping it in the local namespace, and that would again violate RD locality on top.

    So, the functions somehow need to sit on the RD, and something needs to stick them there. To do that, I recommended a rather hacky technique with a LOOP with codeItems in the respective howDoI section. But that was clearly rather odious – and fragile on top because the RD you manipulated was just being parsed (but scroll down in the howDoI and you will still see it).

    Now, you can instead tell DaCHS to run your code when the RD has finished loading and everything should be in place. In a recent example I used this to have common functions to fetch photometric points. In an abridged version:

    <execute on="loaded" title="define functions"><job>
      <setup imports="h5py, numpy"/>
      <code>
      def get_photpoints(field, quadrant, quadrant_id):
        """returns the photometry points for the specified time series
        from the HDF5 as a numpy array.
    
        [...]
        """
        dest_path = "data/ROME-FIELD-{:02d}_quad{:d}_photometry.hdf5".format(
          field, quadrant)
        srchdf = h5py.File(rd.getAbsPath(dest_path))
        _, arr = next(iter(srchdf.items()))
    
        photpoints = arr[quadrant_id-1]
        photpoints = numpy.array(photpoints)
        photpoints[photpoints==0] = numpy.nan
        photpoints[photpoints==-9999.99] = numpy.nan
    
        return photpoints
    
    
      def get_photpoints_for_rome_id(rome_id):
        """as get_photpoints, but taking an integer rome_id.
        """
        field = rome_id//10000000
        quadrant = (rome_id//1000000)%10
        quadrant_id = (rome_id%1000000)
        base.ui.notifyInfo(f"{field} {quadrant} {quadrant_id}")
        return get_photpoints(field, quadrant, quadrant_id)
    
      rd.get_photpoints = get_photpoints
      rd.get_photpoints_for_rome_id = get_photpoints_for_rome_id
    </code></job></execute>
    

    (full version; if this is asking you to log in, tell your browser not to wantonly switch to https). What is done here in detail again is not terribly relevant: it's the usual messing around with identifiers and paths and more or less broken null values that is a data publisher's everyday lot. The important thing is that with the last two statements, you will see these functions whereever you see the RD, which in RD-near Python code is just about everywhere.

    dachs start taptable

    Since 2018, DaCHS has supported kickstarting the authoring of RDs, which is, I claim, the fun part of a data publisher's tasks, through a set of templates mildly customised by the dachs start command. Nobody should start a data publication with an empty editor window any more. Just pass the sort of data you would like to publish and start answering sensible questions. Well, “sort of data” within reason:

    $ dachs start list
    epntap -- Solar system data via EPN-TAP 2.0
    siap -- Image collections via SIAP2 and TAP
    scs -- Catalogs via SCS and TAP
    ssap+datalink -- Spectra via SSAP and TAP, going through datalink
    taptable -- Any sort of data via a plain TAP table
    

    There is a new entry in this list in 2.11: taptable. In both my own work and watching other DaCHS operators, I have noticed that my advice “if you want to TAP-publish any old material, just take the SCS template and remove everything that has scs in it” was not a good one. It is not as simple as that. I hope taptable fits better.

    A plan for 2.12 would be to make the ssap+datalink template less of a nightmare. So far, you basically have to fill out the whole thing before you can start experimenting, and that is not right. Being able to work incrementally is a big morale booster.

    VOTable 1.5

    VOTable 1.5 (at this point still a proposed recommendation) is a rather minor, cleanup-type update to the VO's main table format. Still, DaCHS has to say it is what it is if we want to be able to declare refposition in COOSYS (which we do). Operators should not notice much of this, but it is good to be aware of the change in case there are overeager VOTable parsers out there or in case you have played with DaCHS MIVOT generator; in 2.10, you could ask it to do its spiel by requesting the format application/x-votable+xml;version=1.5. In 2.11, it's application/x-votable+xml;version=1.6. If you have no idea what I was just saying, relax. If this becomes important, you will meet it somewhere else.

    Minor Changes

    That's almost it for the more noteworthy news; as usual, there are a plethora of minor improvements, bug fixes and the like. Let me briefly mention a few of these:

    • The ADQL form interface's registry record now includes the site name. In case you are in this list, please say dachs pub //adql after upgrading.
    • More visible legal info, temporal, and spatial coverage in table and service infos; one more reason to regularly run dachs limits!
    • VOUnit's % is now known to DaCHS (it should have been since about 2.9)
    • More vocabulary validation for VOResource generation; so, dachs pub might now complain to you when it previously did not. It is now right and was wrong before.
    • If you annotate a column as meta.bib.bibcode, it will be rendered as ADS links
    • The RD info links to resrecs (non-DaCHS resources, essentially), too.

    Upgrade As Convenient

    As usual, if you have the GAVO repository enabled, the upgrade will happen as part of your normal Debian apt upgrade. Still, if you have not done so recently, have a quick look at upgrading in the tutorial. If, on the other hand, you use the Debian-distributed DaCHS package and you do not need any of the new features, you can let things sit and enjoy the new features after your next dist-upgrade.

  • A Proposal for Persistent TAP Uploads

    From its beginning, the IVOA's Table Access Protocol TAP has let users upload their own tables into the services' databases, which is an important element of TAP's power (cf. our upload crossmatch use case for a minimal example). But these uploads only exist for the duration of the request. Having more persistent user-uploaded tables, however, has quite a few interesting applications.

    Inspired by Pat Dowler's 2018 Interop talk on youcat I have therefore written a simple implementation for persistent tables in GAVO's server package DaCHS. This post discusses what is implemented, what is clearly still missing, and how you can play with it.

    If all you care about is using this from Python, you can jump directly to a Jupyter notebook showing off the features; it by and large explains the same things as this blogpost, but using Python instead of curl and TOPCAT. Since pyVO does not know about the proposed extensions, the code necessarily is still a bit clunky in places, but if something like this will become more standard, working with persistent uploads will look a lot less like black art.

    Before I dive in: This is certainly not what will eventually become a standard in every detail. Do not do large implementations against what is discussed here unless you are prepared to throw away significant parts of what you write.

    Creating and Deleting Uploads

    Where Pat's 2018 proposal re-used the VOSI tables endpoint that every TAP service has, I have provisionally created a sibling resource user_tables – and I found that usual VOSI tables and the persistent uploads share virtually no server-side code, so for now this seems a smart thing to do. Let's see what client implementors think about it.

    What this means is that for a service with a base URL of http://dc.g-vo.org/tap[1], you would talk to (children of) http://dc.g-vo.org/tap/user_tables to operate the persistent tables.

    As with Pat's proposal, to create a persistent table, you do an http PUT to a suitably named child of user_tables:

    $ curl -o tmp.vot https://docs.g-vo.org/upload_for_regressiontest.vot
    $ curl -H "content-type: application/x-votable+xml" -T tmp.vot \
      http://dc.g-vo.org/tap/user_tables/my_upload
    Query this table as tap_user.my_upload
    

    The actual upload at this point returns a reasonably informative plain-text string, which feels a bit ad-hoc. Better ideas are welcome, in particular after careful research of the rules for 30x responses to PUT requests.

    Trying to create tables with names that will not work as ADQL regular table identifiers will fail with a DALI-style error. Try something like:

    $ curl -H "content-type: application/x-votable+xml" -T tmp.vot
      http://dc.g-vo.org/tap/user_tables/join
    ... <INFO name="QUERY_STATUS" value="ERROR">'join' cannot be used as an
      upload table name (which must be regular ADQL identifiers, in
      particular not ADQL reserved words).</INFO> ...
    

    After a successful upload, you can query the VOTable's content as tap_user.my_upload:

    A TOPCAT screenshot with a query 'select avg("3.6mag") as blue, avg("5.8mag") as red from tap_user.my_upload' that has a few red warnings, and a result window showing values for blue and red.

    TOPCAT (which is what painted these pixels) does not find the table metadata for tap_user tables (yet), as I do not include them in the “public“ VOSI tables. This is why you see the reddish syntax complaints here.

    I happen to believe there are many good reasons for why the volatile and quickly-changing user table metadata should not be mixed up with the public VOSI tables, which can be several 10s of megabytes (in the case of VizieR). You do not want to have to re-read that (or discard caches) just because of a table upload.

    If you have the table URL of a persistent upload, however, you inspect its metadata by GET-ting the table URL:

    $ curl http://dc.g-vo.org/tap/user_tables/my_upload | xmlstarlet fo
    <vtm:table [...]>
      <name>tap_user.my_upload</name>
      <column>
        <name>"_r"</name>
        <description>Distance from center (RAJ2000=274.465528, DEJ2000=-15.903352)</description>
        <unit>arcmin</unit>
        <ucd>pos.angDistance</ucd>
        <dataType xsi:type="vs:VOTableType">float</dataType>
        <flag>nullable</flag>
      </column>
      ...
    

    – this is a response as from VOSI tables for a single table. Once you are authenticated (see below), you can also retrieve a full list of tables from user_tables itself as a VOSI tableset. Enabling that for anonymous uploads did not seem wise to me.

    When done, you can remove the persistent table, which again follows Pat's proposal:

    $ curl -X DELETE http://dc.g-vo.org/tap/user_tables/my_upload
    Dropped user table my_upload
    

    And again, the text/plain response seems somewhat ad hoc, but in this case it is somewhat harder to imagine something less awkward than in the upload case.

    If you do not delete yourself, the server will garbage-collect the upload at some point. On my server, that's after seven days. DaCHS operators can configure that grace period on their services with the [ivoa]userTableDays setting.

    Authenticated Use

    Of course, as long as you do not authenticate, anyone can drop or overwrite your uploads. That may be acceptable in some situations, in particular given that anonymous users cannot browse their uploaded tables. But obviously, all this is intended to be used by authenticated users. DaCHS at this point can only do HTTP basic authentication with locally created accounts. If you want one in Heidelberg, let me know (and otherwise push for some sort of federated VO-wide authentication, but please do not push me).

    To just play around, you can use uptest as both username and password on my service. For instance:

      $ curl -H "content-type: application/x-votable+xml" -T tmp.vot \
      --user uptest:uptest \
      http://dc.g-vo.org/tap/user_tables/privtab
    Query this table as tap_user.privtab
    

    In recent TOPCATs, you would enter the credentials once you hit the Log In/Out button in the TAP client window. Then you can query your own private copy of the uploaded table:

    A TOPCAT screenshot with a query 'select avg("3.6mag") as blue, avg("5.8mag") as red from tap_user.my_upload' that has a few red warnings, and a result window showing values for blue and red; there is now a prominent Log In/Out-button showing we are logged in.

    There is a second way to create persistent tables (that would also work for anonymous): run a query and prepend it with CREATE TABLE. For instance:

    A TOPCAT screenshot with a query 'create table tap_user.smallgaia AS SELECT * FROM gaia.dr3lite TABLESAMPLE(0.001)'. Again, TOPCAT flags the create as an error, and there is a dialog "Table contained no rows".

    The “error message” about the empty table here is to be expected; since this is a TAP query, it stands to reason that some sort of table should come back for a successful request. Sending the entire newly created table back without solicitation seems a waste of resources, and so for now I am returning a “stub” VOTable without rows.

    As an authenticated user, you can also retrieve a full tableset for what user-uploaded tables you have:

    $ curl --user uptest:uptest http://dc.g-vo.org/tap/user_tables | xmlstarlet fo
    <vtm:tableset ...>
      <schema>
        <name>tap_user</name>
        <description>A schema containing users' uploads. ...  </description>
        <table>
          <name>tap_user.privtab</name>
          <column>
            <name>"_r"</name>
            <description>Distance from center (RAJ2000=274.465528, DEJ2000=-15.903352)</description>
            <unit>arcmin</unit>
            <ucd>pos.angDistance</ucd>
            <dataType xsi:type="vs:VOTableType">float</dataType>
            <flag>nullable</flag>
          </column>
          ...
        </table>
        <table>
          <name>tap_user.my_upload</name>
          <column>
            <name>"_r"</name>
            <description>Distance from center (RAJ2000=274.465528, DEJ2000=-15.903352)</description>
            <unit>arcmin</unit>
            <ucd>pos.angDistance</ucd>
            <dataType xsi:type="vs:VOTableType">float</dataType>
            <flag>nullable</flag>
          </column>
          ...
        </table>
      </schema>
    </vtm:tableset>
    

    Open Questions

    Apart from the obvious question whether any of this will gain community traction, there are a few obvious open points:

    1. Indexing. For tables of non-trivial sizes, one would like to give users an interface to say something like “create an index over ra and dec interpreted as spherical coordinates and cluster the table according to it”. Because this kind of thing can change runtimes by many orders of magnitude, enabling it is not just some optional embellishment.

      On the other hand, what I just wrote already suggests that even expressing the users' requests in a sufficiently flexible cross-platform way is going to be hard. Also, indexing can be a fairly slow operation, which means it will probably need some sort of UWS interface.

    2. Other people's tables. It is conceivable that people might want to share their persistent tables with other users. If we want to enable that, one would need some interface on which to define who should be able to read (write?) what table, some other interface on which users can find what tables have been shared with them, and finally some way to let query writers reference these tables (tap_user.<username>.<tablename> seems tricky since with federated auth, user names may be just about anything).

      Given all this, for now I doubt that this is a use case sufficiently important to make all the tough nuts delay a first version of user uploads.

    3. Deferring destruction. Right now, you can delete your table early, but you cannot tell my server that you would like to keep it for longer. I suppose POST-ing to a destruction child of the table resource in UWS style would be straightforward enough. But I'd rather wait whether the other lacunae require a completely different pattern before I will touch this; for now, I don't believe many persistent tables will remain in use beyond a few hours after their creation.

    4. Scaling. Right now, I am not streaming the upload, and several other implementation details limit the size of realistic user tables. Making things more robust (and perhaps scalable) hence will certainly be an issue. Until then I hope that the sort of table that worked for in-request uploads will be fine for persistent uploads, too.

    Implemented in DaCHS

    If you run a DaCHS-based data centre, you can let your users play with the stuff I have shown here already. Just upgrade to the 2.10.2 beta (you will need to enable the beta repo for that to happen) and then type the magic words:

    dachs imp //tap_user
    

    It is my intention that users cannot create tables in your DaCHS database server unless you say these words. And once you say dachs drop --system //tap_user, you are safe from their huge tables again. I would consider any other behaviour a bug – of which there are probably still quite a few. Which is why I am particularly grateful to all DaCHS operators that try persistent uploads now.

    [1]As already said in the notebook, if http bothers you, you can write https, too; but then it's much harder to watch what's going on using ngrep or friends.
  • DASCH is now in the VO

    Black dots on a white-ish background.  In the middle, some diffuse greyish stuff around a relatively large black dot.

    This frame would show comet 2P/Encke during its proximity to Earth in 1941 – if it went deep enough. But never mind practicalities: If you want to learn about matching ephemeris against the DASCH plate collection (or, really, any sort of obscore-like table), read on.

    For about a century – that is, into the 1980s –, being an observational astronomer meant taking photographic plates and doing tricks with them (unless you were a radio astronomer or one of the very few astronomers peeking beyond radio and optical in those days, of course). This actually is somewhat fortunate for archivists, because unlike many of the early CCD observations that by now are lost with our ability to read the tapes they were stored on, the plates are still there.

    Why Bother?

    However, to make them usable, the plates need to be digitised. In the GAVO data centre, we keep the results of several scan campaigns large and small, such as HDAP, the various data collections joined in the historical photographic plate image archive HPPA, or the delightfully quirky Münster Flare Plates.

    I personally care a lot about these data collections. This is partly because they are indispensible for understanding the history of astronomy. But more importantly, they are the next best thing we have to a time machine; if we have a way of knowing how the sky looked like seventy years ago, it is these plate collections. Having such a time machine is important for all kinds of scientific efforts, including figuring out whether there are aliens (i.e., 2016ApJ...822L..34S) on Tabby's Star.

    Somewhat to my chagrin, the cited paper 2016ApJ...822L..34S did not use the VO to obtain the plate images but went straight to DASCH's web interface. DASCH, in case you have not heard of it before, is probably the most ambitious project concerned with plate digitisation at the moment – or perhaps: “was”, because they just finished scanning the core part of Harvard's plate collections, which was their primary goal.

    I can understand why Bradley Schaefer, the paper's author, did not bother with a VO search In 2016. For starters, working with halfway homogeneous data from instruments you are somewhat familiar saves a substantial amount of work and thought, in particular if you are, in addition, up against the usual lack of machine-readable metadata. Also, at that time DASCH probably had about as many digitised plates as all the VO's contemporary plate collections taken together.

    DASCH: The Harvard Plates

    Given such stats, I have always wanted to have at least the metadata from DASCH's plates in the VO. Thanks to a recent update to DASCH's publication system, this is now a reality. Since 2024-04-29, I am publishing the metadata of the DASCH plates via Obscore and and SIAP2.

    Followup (2024-05-03)

    This is now DASCH news, and one of my two main contacts on the DASCH side, Peter Williams, has written an insightful post on this, too. Let me use this opportunity to thank him for the delightful cooperation, and extend these thanks to Ben Sabath, who is primarily responsible for the update to the DASCH publication system I mentioned above.

    Matching plates are returned as datalink documents, pointing to a preview, photos of the plate and its jacket, and links to the science data, once downsampled by a factor of 16, once in the original size (example). For now, #this points to the downsampled version, as Amazon charges DASCH about three cents per full-scale plate at the moment, and that can quickly add up by accident (there's nothing wrong with consciously downloading full-scale FITS-es if you need them, of course).

    This is a bit fishy in that the size of the image in the obscore/SIAP2 fields s_xel1 and s_xel2 refers to the unscaled image, and thus I should be returning the full-scale image as datalink #this. I hope I will not cause much confusion with this design.

    In case you look at the links in the datalink documents, let me include a disclaimer: Although they point into the GAVO data centre, the data is served courtesy of the DASCH project. The links only go to us because we need to sign links for you. I mention this because you can save the datalink documents and the links within them; the URLs you are redirected to from there, however, will expire fast. Just do not look at them.

    Plates in Global Discovery

    So – what can you do with DASCH in the VO that you could not do before?

    Most importantly, you will discover DASCH in registry interfaces and its datasets in global queries (in particular the global dataset queries I have discussed a few weeks ago). For instance, DASCH is now in Aladin's discovery tree:

    A screen shot with many selected points, highlighted in green, on the right side.  On the left side, an tree display with many branches folded in.  On a folded-out branch, there is “DASCH SIAP2“ highlighted.  On the right side, there is a large rectangle overplotted in red.

    You can now find DASCH in Aladin and do the usual “in view“ searches. However, currently this yields many matches that are, in practical terms, spurious, as they come from extremely wide-angle instruments. The red rectangle is the footprint of one of these images; note that the view here is a full two pi sky. We will probably do something about this “noise“.

    The addition of DASCH to the VO has a strong effect in some use cases. For instance, at the end of the GAVO plates tutorial, we do an all-VO obscore query that, at the time of the last update of the tutorial in 2019, yielded 4067 datasets (of course, including modern and/or non-optical observations) potentially showing some strongly lensed quasar. With DASCH – and, admittedly, a few more collections that came into the VO since 2019 –, that number is now 10'489; the range of observation dates grew from MJD 12550…52000 to MJD 9800…58600, with the mean decreasing from 51'909 to 30'603. That the mean observation date moves that much back in time is a certain sign that a major part of the expansion is due to DASCH (well, and certainly to APPLAUSE, too).

    Followup (2024-05-03)

    As discussed in my DASCH update, I have taken out the large-coverage plates from my obscore table, which changes the stats (but not the conclusions) quite a bit. They is now 10'098 plates and mean observation date 36'396

    TAP, Uploads, and pyVO on DASCH

    But this is not just about bringing astronomical heritage to the VO. It is also about exposing DASCH through the powerful ADQL/TAP interface. As an example of how this may be useful, consider the comet P2/Encke, which, according to JPL's Small-Body Database was relatively close to Earth (about half an AU) in May 1941. It would have had about 14.5 mag at that point and hence was safely within reach of several of the instruments archived in DASCH. Perhaps we can find serendipitous or even targeted observations of the comet in the collection?

    The plan to find that out is: compute an ephemeris (we are lazy and use an external service, Miriade ephemcc) and then for each day see whether there are DASCH observations in the vicinity of the sky location obtained in this way.

    As usual, it's never that easy because the call to the ephemeris webservice (paste the link into TOPCAT to have a look) returns cursed sexagesimal coordinates. We need to fix them before doing anything serious with the table, and while we are at it, we also repair the date, which is simpler to consume if it is MJD to begin with. Getting the ephemeris thus takes quite a few lines:

    from astropy import table
    from astropy import units as u
    from astropy.coordinates import SkyCoord
    from astropy.time import Time
    
    ephem = table.Table.read(
      "https://vo.imcce.fr/webservices/miriade/ephemcc.php?-from=vespa"
      "&-name=c:p/encke&-ep=1941-04-01&-nbd=90&-step=1d&-observer=500"
      &-mime=votable")
    
    parsed = SkyCoord(ephem["ra"], ephem["dec"], unit=(u.hourangle, u.deg))
    ephem["ra"] = parsed.ra.degree
    ephem["dec"] = parsed.dec.degree
    
    parsed = Time(ephem["epoch"])
    ephem["epoch"] = parsed.mjd
    

    Compared to that, the actual matching against DASCH is almost trivial if you are somewhat familiar with crossmatching in ADQL and the Obscore schema:

    svc = pyvo.dal.TAPService("http://dc.g-vo.org/tap")
    res = svc.run_sync("""
        SELECT *
        FROM
            dasch.plates
            JOIN tap_upload.orbit
            ON (1=CONTAINS(POINT(ra, dec), s_region))
        WHERE
            t_min<epoch
            AND t_max>epoch""",
        uploads={"orbit": ephem})
    

    Followup (2024-05-03)

    You would probably query the dasch.narrow_plates table in actual operations; querying dasch.plates is probably more for people interested in the history of astronomy or DASCH itself.

    Inspect the query for a moment: This is a normal upload join, except we are constructing an ADQL POINT on the fly to be able to see whether we are in the spatial region covered by a DASCH dataset (given in obscore's s_region column). We could have put the temporal condition into the join's ON; but I think the intention is somewhat clearer with the WHERE constraint, and the database engine will probably go through identical motions for both queries – the beauty of having a query planner in the loop is that you do not need to think about such details most of the time.

    Actually, in this case there is one last complication: As said above, we have put a datalink service between you and the downloads to discourage accidental large downloads. We hence use pyVO's (suboptimally documented) datalink interface (iter_datalinks):

    with pyvo.samp.connection() as conn:
        for dl in res.iter_datalinks():
            link = next(dl.bysemantics("#preview-image"))
            pyvo.samp.send_image_to(
                conn,
                link.access_url,
                client_name="Aladin")
    

    Among the artefacts available we pick the scaled jpegs in this fragment (#preview-image), since these are almost free even on the Amazon cloud. Change that #preview-image to #this in the to get scaled calibrated FITS-es, which are still fairly small. This would, for instance, let you overplot the ephemeris in Aladin, which you cannot do with the jpegs as they lack astrometric calibration (for now). But even with #preview-image, we can use Aladin as a glorified image viewer by SAMP-sending the images there, which is why we do the minor magic with functions from pyvo.samp.

    If you want to try this yourself or mangle the program to do something else that requires querying against a reasonable number positions in time and space, just get encke.py and hack away. Make sure to start Aladin before running the program so it has something to send the images to.

    Disclaimers

    This is a contrived example, and it is likely that this particular use case is astronomically wrong in several ways. Let me enumerate a few things that would need looking into before this approaches proper science:

    • We compute the ephemeris for the center of the Earth. At half an AU distance, the resulting parallax will not shift the position enough to hide a plate we should know about, but at least for anything closer, you should try to do a bit better; admittedly, for a resource like DASCH – that contains plates from observatories all over the place – you will have to compromise.
    • The ephemeris is probably wrong; comet's orbits change over time, and I have no idea if the ephemeris service actually uses 2P/Encke's 1941 orbit to compute the positions.
    • The coordinate metadata may be wrong. Ephemcc's documentation says something that sounds a lot as if they were sometimes returning RA and Dec for the equator of the time rather than for J2000 (i.e., ICRS for all intents and purposes), but of course our obscore coverages are for the ICRS. Regrettably, the VOTable returned by the service does not contain a COOSYS element yet, and so there is no easy way to tell.
    • If you look at the table with DASCH matches, you will see they all were observed with an extremely wide-angle instrument sporting an aperture of a mere three inches. Even at the whopping exposure times (two hours), there is probably no way you would see a diffuse object of 14th mag on a plate with a 1940s-era photographic emulsion with that kind of optics (well: feel free to prove me wrong).
    • It would of course be a huge waste of bandwidth to pull the entire plates if we already had a good idea of where we would expect the comet (i.e., had a reliable ephemeris). Hence, a cutout service that would let you retrieve more or less exactly the pixels you would like to use for your research and not the cruft around it would be a nifty supplement. It's in the works, and I'd say you can almost hold your breath. The cutout will simply appear as a SODA service in the datalink documents. See 2020ASPC..522..295D for how you would operate such a service.
  • Computing Residuals of an Astrometric Calibration

    Two plots, left a fairly good correlation, right a cloudy wave

    The kind of plot you can make following the recipe given here: Left, a comparison of the photometry, right, a positional residuals, not taking into account the SIP plate solution, when comparing the HDAP plate B3261a against Gaia DR3. Note that the cut-off a 4 arcsec is because of the match radius when obtaining the calibrator stars.

    I recently had to assess the quality of the astrometric calibration of a photographic plate. What I am going to show you in this post will of course work just as well for CCD frames, and if these have a sufficiently large field of view, this may be an issue for them as well. However, the sort of data that needs this assessment most typically are scans of plates, as these tend to have a “wobble”, systematic offsets in the scan direction resulting from imperfections in the mechanics.

    Prerequisites: An astronomical frame with a calibration in ICRS (or some frame not very far from it), called my-image.fits in the following, SExtractor (in Debian and derivatives: apt install source-extractor – long live Debian Astro; since it's called source-extractor in Debian, that's what I'll use here, too), and of course TOPCAT.

    Step 1: Extract Sources. Source extraction is of course a high science, and if you know better than me, by all means do it the way you think is appropriate. Meanwhile, the following might very well work for you sufficiently well.

    Create a working directory and enter it. Then, to create a file telling source-extractor what columns you would like to see, write the following to a file default.param:

    ALPHA_SKY
    DELTA_SKY
    X_IMAGE
    Y_IMAGE
    MAG_ISO
    FLUX_AUTO
    ELONGATION
    

    Next, give a few parameters to source-extractor; depending on the sort of image you have, you may want to play around with DETECT_MINAREA (how many pixels need to show a signal to register as a source) and DETECT_THRESH (how many sigmas a pixel has to be above the background to register as a candidate for belonging to a source). Meanwhile, write the following into a file default.control:

    CATALOG_TYPE     FITS_1.0
    CATALOG_NAME     img.axy
    PARAMETERS_NAME  default.param
    FILTER           N
    DETECT_MINAREA   30
    DETECT_THRESH    4
    SEEING_FWHM      1.2
    

    – but if the following call gives you a few hundred sources, that ought to work for the present purpose.

    Then run:

    source-extractor -c default.control my-image.fits
    

    This will give you a catalogue of extracted objects in the file img.axy.

    Step 2: Fix source-extractor's output. Load that img.axy into TOPCAT. Regrettably, source-extractor does not add any useful metadata to the columns of its output table. To add the absolute bare minimum, in TOPCAT go to ViewsColumn Info. In that window, check UCD in the Display menu, and then put pos.eq.ra and pos.eq.dec into the UCD fields of the ALPHA_SKY and DELTA_SKY columns, respectively; double click to change fields in TOPCAT.

    To see if you have done the annotation right, in TOPCAT's main window, click GraphicsSky Plot. If the objects show up, you have just provided enough annotation to let TOPCAT figure out the position for each row.

    Step 3: Get calibrators. We will now try to add counterparts for Gaia DR3 to the extracted sources. To do that, click VOTable Access Protocol, and in the window popping up double click the entry for the GAVO DC TAP.

    In the Find box, type dr3lite to look for this site's version of the Gaia DR3 source catalogue. Click on gaia.dr3lite to select that table, and then select the Columns pane. This should show some of the Gaia DR3 columns.

    Now ExamplesUpload Join will generate a query that will cross-match your extracted sources with the Gaia sources. You should edit it a bit, only selecting the columns you will actually need, removing the TOP 1000 (at least on large images with more than 1000 sources), and reducing the match radius a bit when the calibration is not actually completely off and your epoch is sufficiently close to J2000.

    Hint: you can control-click in the Columns pane and then use the Cols button to insert all the column names in one go[1]. For me, the resulting query would be:

    SELECT
       source_id, ra, dec, phot_bp_mean_mag,
       tc.*
       FROM gaia.dr3lite AS db
       JOIN TAP_UPLOAD.t1 AS tc
       ON 1=CONTAINS(POINT('ICRS', db.ra, db.dec),
                     CIRCLE('ICRS', tc.ALPHA_SKY, tc.DELTA_SKY, 4./3600.))
    

    This should result in about as many matches as your extraction had – a few more is ok, because you will have some spurious matches, a few less is ok, too, as there are always some outliers and artefacts, but you should clearly not pull a magnitude more or less objects here than you put in; fiddle with the match radius as necessary.

    See if there is a rough correlation between the Gaia calibrators and your extracted sources by plotting phot_bp_mean_mag against MAG_ISO. Absent more information, MAG_ISO, source-extractor's guess for the magnitude of the extracted object, will be just some crazy number, but it should have some discernable correlation with the actual magnitude. Do not expect too much here, in particular with old plates, for which good photometry is a science of their own.

    In my example, this looked like this:

    Plot: a rough correlation in red with a green tail

    The green points certainly are spurious matches; this observation did not reach beyond 14th magnitude or so, and there are many weak stars on the sky, so a few of them will show up in just about any cross match. See the opening picture for an example with a better correlation.

    Step 4: Do the correlation plot. Do GraphicsPlane Plot and then plot ra-alpha_sky or dec-delta_sky against X_IMAGE or Y_IMAGE. You could get something like this:

    Plot: A single wavy thing

    This rather certainly reflects some optical distortion; source-extractor regrettably does not take into account SIP corrections yet, so it is likely that a large part of this would be taken care of by the polynomials of the plate solution (the github issue I am linking to tells you how to be sure).

    But it can also look like this:

    Plot: Multiple wobbles

    This certainly is not the result of a lens or anything optical at all. It's the scanner's gears that you are looking at here. With an amplitude of perhaps three arcseconds this is rather excessive here; but something like this you will rather likely see even on good scanners – though it may essentially be invisible, as of the Heidelberg scanner we used for HDAP:

    Plot: A vertical cloud with no discernible structure.
    [1]I'm using the BP magnitude in the query below as most historical plates tend to be “blue sensitive“ (in some sense). Hence, BP magnitudes should be a bit closer to what source-extractor has extracted.
  • Gaia DR3 XP Spectra: All Sampled

    Lots of blue crosses and a few red squares plotted over a sky photograph of a star cluster

    Around this time of the year on the northern hemisphere, you can spot the h and χ Persei double star cluster with the naked eye. One part of it, NGC 884 is shown here with LAMOST DR6 low resolution spectra (red squares) and Gaia DR3 XP spectra (blue crosses) overplotted. Given that LAMOST has already been one of the largest collections of spectra on the planet, you can see that there is really a lot of those XP spectra.

    When Gaia DR3 was released in June, I was somewhat disappointed when I realised what it is that they delivered as the BP/RP (or XP for short) spectra. You see, I had expected to see something rather similar to what I have in DFBS: structurally, arrays of a few dozen spectral points, mapping wavelengths to some sort of measure of the flux.

    What really came were, mainly, “continuous spectra“, that is coefficients of Gauss-Hermite polynomials. You can fetch them from the gaiadr3.xp_continuous_mean_spectrum table at the ARI-Gaia TAP service; the blue part of the spectrum of the star DR3 4295806720 looks like this in there:

    102.93398893929992, -12.336921213781045, -2.668856168170544, -0.12631176306793765, -0.9347021092539146, 0.05636787290132809, [...]

    No common spectral client can plot this. The Gaia DPAC has helpfully provided a Python library called GaiaXPy to turn these into “proper” spectra. Shortly after the data release, my plan has thus been to turn all these spectra into their “sampled” form using GaiaXPy and then re-publish them, both through SSAP for ad-hoc discovery and through TAP for (potentially) global analysis.

    Alas, for objects too faint to make it into DR3's xp_sampled_mean_spectrum table (that's 35 million spectra already turned to wavelength-flux pairs by DPAC), the spectra generated in this way looked fairly awful, with lots of very artificial-looking wiggles (“ringing”, if you will). After a bit of deliberation, I realised that when the errors are given on the Hermite coefficients, once you compute the samples, these errors will be liberally distributed among the output samples. In other words, the error on the samples will be grossly correlated over arbitrary distances; at least I am fairly helpless when trying to separate signal from artefact in these beasts.

    Bummer. Well, fortunately, Rene Andrae from “up the mountain” (i.e., the MPI for Astronomy) has worked out a reasonably elegant way to get more conventional spectra understandable to mere humans. Basically, you compute n distinct “realisations” of the error model given by the table of the continuous spectra and average over them. The more samples you take, the less correlated your spectral points and their errors will be and the less confusing the signal will be. The service docs for gaia/s3 give the math.

    Doing this on more than 200 million spectra is quite an effort, though, and so after some experimentation I decided to settle on 10 realisations per spectrum and have relatively wide bins (10 nm) over just the optical part of the spectrum (400 through 800 nm). The BP and RP bandpaths are a bit wider, and there is probably signal blotted out by the wide bins; I will probably be addressing this for DR4, except if these spectra become the smash hit they deserve to be.

    The result of this procedure is now available through an SSAP service that should show up in the VO Registry by the time the first of you read this; the Aladin image above gives you an impression of the density of results here – and don't forget: the spectra with the blue crosses are all reasonably well flux-calibrated.

    The data is also available on the TAP service http://dc.g-vo.org/tap, which opens up many interesting possibilities. Let me mention two here.

    Comparison with LAMOST

    I was rather nervous whether what I had done resulted in anything that bore even a fleeting resemblance to reality, and so about the first thing I tried was to compare my new data with what LAMOST has.

    That is a nice exercise for TAP and ADQL. Let's first match spectra from the two surveys, which luckily are on the same server, saving us some cross-server uploads. I am selecting a minimum of data, just the position and the two access URLs, and I let DaCHS' MAXREC kick in so I'm just retrieving 20000 of the millions of result records:

    SELECT a.ssa_location, a.accref, b.accref
    FROM
      gdr3spec.ssameta AS a
      JOIN lamost6.ssa_lrs AS b
      ON DISTANCE(a.ssa_location, b.ssa_location)<0.001
    

    (this is using the DISTANCE(.,.)<radius idiom that we will be migrating towards in ADQL 2.1 instead of the dreaded 1=CONTAINS(POINT, CIRCLE) thing everyone has loathed in ADQL 2.0).

    Using the nifty activation actions, you can now tell TOPCAT to open the two spectra next to each other when you click on a row or a point in a sky plot. To reproduce,

    1. Make a sky plot. TOPCAT doesn't yet pick up the POINT in ssa_location, so you have to configure the Lon and Lat fields yourself to ssa_location[0] and ssa_location[1].
    2. Open the activation actions, either from the button bar or from the Views menu.
    3. In there, select Plot Table, make sure it says accref in Table Location and then check Plot Table in the Actions pane. When you now click on a point in the sky plot, you should see a spectrum pop up, except it is plotted with dots, which most people consider inappropriate for spectra. Use the Form tab in the plot window to style it a bit more spectrum-like (I recommend looking into Line and XYError).
    4. But how do you now add the LAMOST plot? I don't think TOPCAT's activation actions let you plot right into the plane plot you just configured. But you can add a second Plot Table action from the Actions menu in the window with the activation actions. As before, configure this new item, except this one needs to plot accref_ (which is what DaCHS has called the access reference for LAMOST to keep the names unique).
    5. As for Gaia, configure to plot to look good as a spectrum. In order to make the two spectra optically comparable, under Axes set the range to 4000 to 8000 Angstrom manually here.

    You can now click on points in your sky plot and, after a second or so, see the corresponding spectra next to each other (if you place the two plot windows that way).

    If you try this, you will (hopefully) see that major features of spectra are nicely reproduced, such as with these, I guess, molecular bands:

    Two line plots next to each other, the right one showing more features.  the left one roughly follows the major wiggles, though.

    As you probably have guessed, the extremely low-resolution Gaia XP spectrum is left, LAMOST's (somewhat higher-resolution) low-resolution spectrum is right:

    This also works with absorption in the blue, as in this example:

    Two line plots next to each other, the right one showing a lot of relatively sharp absoprtion lines, which the left one does not have.  A few major bumps are present in both, and the general shape conincides nicely, expect perhaps at the blue edge.

    In case of doubt, I have to say I'd probably trust Gaia's calibration around 400 nm better than LAMOST's. But that's mere guesswork.

    For fainter objects, you will see remnants of the systematic wiggles from the Hermite polynomials:

    Two line plots next to each other.  Both are relatively noisy, in particular on the blue edge.  The left one also seems to have a rather regular oscillation at the blue edge.

    Anyway, if you keep an eye on the errors, you can probably even work with spectra from the fainter objects:

    Two line plots next to each other.  The left one has fairly strong ringing which is not present in the right one, but it mainly stays within the error bars.  The total flux of this star is at least a factor of 10 less than for the prettier examples above.

    Mass Retrieval of Spectra

    One nice thing about the short spectra is that you can fetch many of them in one go and in very little time. For instance, to retrieve particularly red objects from the Gaia catalogue of Nearby Stars (also on the GAVO server) with spectra, say:

    SELECT
      source_id, ra, dec, parallax, phot_g_mean_mag,
      phot_bp_mean_mag, phot_rp_mean_mag, ruwe, adoptedrv,
      flux, flux_error
    FROM gcns.main
    JOIN gdr3spec.spectra
    USING (source_id)
    WHERE phot_rp_mean_mag<phot_bp_mean_mag-4
    

    [in case you wonder how I quickly got the column names enumerated here: do control-clicks into the Columns pane in TOCPAT's TAP window and then use the Cols button]. For when you do not have Gaia DR3 source_id-s in your source table, there is also gdr3spec.withpos against which you can do more conventional positional crossmatches.

    Within a few seconds, you can retrieve more than 4000 spectra in this way. You can now do whatever analysis you want on these spectra. Or, well, just plot them. The following procedure for that later task uses TOPCAT features only available in the next release, due before mid-October[1].

    First, make a colour-magnitude diagram (CMD) from this table as usual (e.g., BP-RP vs G). Then, open another plane plot and

    1. LayersAdd XYArray Control
    2. Configure the XYArray to plot from the table you just fetched, have nothing in X Values[2] and flux in Y Values.
    3. Under Axes, configure Y Log in order to better show the 4253 spectra at one time.
    4. Throw away or at least uncheck all other layers in the plot.
    5. In order to let TOPCAT highlight the spectrum of the activated source, in the Subsets pane check the Activated subset (that's the bleeding-edge functionality you will not have in older TOPCATs) and give it a sufficiently bright colour.

    With that, you can now click around in your CMD and immediately see that source's spectrum in the context of all the others, like this:

    An animation of someone selecting various points in a CMD and have simulataneous spectra plotted.

    These spectra have also inspired me to design and implement a vector extension for ADQL, which lets you do even more interesting things with these spectra. More on this… soon.

    [1]The Activated subset is only available in TOPCAT versions later than 4.8-7 (released in October 2022).
    [2]These should be the spectral points; DaCHS does not deliver them with this query because I am a coward. I think I will find my courage relatively soon and then fix this. Once that has happened, you can select param$spectral as X values. [Update: Mark Taylor remarks that by writing sequence(41, 400, 10) in bleeding-edge TOPCATs and add(multiply(10,sequence(41)),400) before that, you can add a proper spectral axis until then]

Page 1 / 3 »