  • News From the VO Via ActivityPub

    Screenshot of a browser showing the Mastodon rendering of GAVO's ActivityPub feed

    If you ask us: Get a proper client to join the Fediverse. But as shown here, in a pinch a web browser will do, too.

    When Twitter was still fairly young, we had an account there that would tweet out when new data collections appeared in the VO. Even back then, I was rather doubtful whether using a proprietary platform to disseminate open data was a good idea, but as long as the content was also available through standard protocols (RSS in this case), I thought it might be worth a try. Well: it never really took off, and after Twitter broke the whole thing a couple of times through incompatible API changes, I finally let it go around 2017.

    Given the recent mass exodus from the smouldering remains of Twitter into the open and standard Fediverse, I thought reviving our little missives there might actually be a worthwhile effort. Specifically, joining Mastodon – which speaks the ActivityPub protocol and hence is part of the Fediverse – has become really straightforward.

    So, if the VO Fresh RSS Feed is not for you (perhaps because you do not have an RSS aggregator, which would be a shame), maybe following our new Mastodon account @gavo@botsin.space would be for you?

    Oh, and yes, I grant you that the previews the Mastodon web client produces for VizieR resources are not overly pretty yet (curse JavaScript templating!). But then, if I were you, I'd disable URL previews anyway; really, they are little more than a privacy annoyance.

  • DaCHS 2.9 is out

    Our VO server package DaCHS almost always sees two releases per year, each time roughly after the Interops[1]. So, with the Tucson Interop over, it's time for DaCHS 2.9, and this is the traditional what's new post.

    Data Origin – the big headline for this release could be “curation”, in that three upcoming standardoid entities in that field are prototyped in 2.9. One is Data Origin, which is a note on how to embed some very basic provenance information into VOTables.

    This is going to help your users figure out how they came up with a VOTable when the referee has clever questions about the paper they submitted half a year earlier. The good news is: if you defined your metadata in your RD with sufficient care, with DaCHS 2.9 you will automatically do Data Origin.
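
    To give you an idea of what this means concretely: the mechanism is a handful of INFO elements in the VOTable's RESOURCE, roughly of the following shape. Consult the note for the authoritative names; the values here are made up:

    <RESOURCE type="results">
      <INFO name="publisher" value="The GAVO DC team"/>
      <INFO name="server_software" value="DaCHS/2.9"/>
      <INFO name="request_date" value="2023-11-27T10:13:00Z"/>
      <!-- ...plus whatever else DaCHS can glean from your RD metadata -->
    </RESOURCE>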

    Feed your D links – another curation-related new thing in DaCHS is an implementation of what will hopefully be known as BibVO in the future. At this point, it is an unpublished note on GitHub. In essence, the purpose is to feed bibliographic services – and in particular the ADS – “D links”, i.e., links from publications to data. A part of this works automatically (the source metadatum), but the more advanced biblinks need a bit of manual intervention.

    If you even have, say, an observatory bibliography consisting of pairs of papers and data used by these papers, you will probably have to write a bit of code. See biblinks in the reference documentation for details if any of this sounds as if it could apply to you. In this context, I have also enabled passing multiple accrefs to the /get endpoint. Users will then receive a tar file of the referenced data products.

    altIdentifiers in relationships – still in the bibliographic realm, VOResource 1.2 will (almost certainly) let you set altIdentifiers, in particular DOIs, when you declare relationships to other resources. That is probably of interest in particular when you want to declare relationships to things outside of the VO to services like b2find that themselves do not understand ivoids. In that situation, you would write something like:

    Cites: Some external thing
    Cites.altIdentifier: doi:10.fake/123412349876
    

    in a <meta> tag in your RD.

    json columns – PostgreSQL has the very tempting and apparently all-powerful json type; it lets you stick complex structures into database columns and thus apparently relieves you of all the tedious tasks of designing database tables and documenting metadata.

    Written like this, you probably notice it's a slippery slope at best. Still, there are some non-hazardous uses for such columns, and thus you can now say type="json" or (probably preferably) type="jsonb" in column definitions. You can feed these columns with dicts, lists, or JSON literals in strings. Clients will receive the values as JSON string literals in char[*] FIELDs with an xtype of json. Neither astropy nor TOPCAT do anything with that xtype yet, but I expect that will change soon.
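
    To illustrate, such a column definition could look like this in your RD (name and description made up, of course):

    <column name="extras" type="jsonb"
      description="Non-tabular, per-row extras (a JSON object)"/>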

    Copy coverage – sometimes two resources have the same spatial (and potentially temporal and spectral) coverage. Since obtaining the coverage is an expensive operation, it would be nice to be able to say “aw, look at that other resource and take its coverage.” The classic example in DaCHS is the system-wide SIAP2 service that really is just a parametric wrapper around obscore. In such cases, you can now say something like:

    <coverage fallbackTo="__system__/obscore"/>
    

    – and //siap2 already does just that. That's one more reason to occasionally run dachs limits //obscore if you offer an obscore table.

    First VOTable row in tests – if you have calls to getFirstVOTableRow in regression tests (you have regression tests, right?) that return multiple rows, these will now fail until you also pass rejectExtras=False to that call. I've had regressions that were hidden by the function's liberal acceptance of extra rows, and it's too easy to produce unstable tests (that magically succeed or fail depending on the current state of the database) with the old behaviour. I hence hope for your sympathy and understanding in case I broke one of your tests.
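
    If one of your tests legitimately receives several rows and you really just want the first one, the fix is a single keyword argument; schematically:

    # rejectExtras=False restores the old, lenient behaviour of
    # silently using the first of several result rows
    row = self.getFirstVOTableRow(rejectExtras=False)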

    ADQL extensions – there is now arr_count to complement the array extension added in 2.7. Also, our custom UDFs transform, normal_random, to_jd, to_mjd, and simbadpoint now have a prefix of ivo_ rather than the previous gavo_. In order not to break existing queries, DaCHS will still accept the gavo_-prefixed names for the foreseeable future, but it will no longer advertise them.
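
    For illustration, a query using the new spellings might look like this (the table and its columns are made up):

    -- note the ivo_ prefix on the renamed UDFs; arr_count is new in 2.9
    SELECT ivo_to_mjd(obs_date) AS mjd,
        arr_count(fluxes) AS n_points
    FROM mydata.timeseries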

    Minor fixes – as usual, there are many minor bug fixes and improvements, the most visible of which is probably that DaCHS now correctly handles literal + chars in multipart-encoded (“uploads”) requests again; that was broken in 2.8 after the removal of the dependency on Python's cgi module. Also, MOC-valued columns can now be serialised into non-VOTable formats like JSON or CSV.

    If you have been using DaCHS' built-in HTTPS support, certain clients may have rejected its certificates. That was because we were pulling an expired intermediate certificate from Let's Encrypt. If you don't understand what I was just saying, don't worry. If you do understand that and know a good way to avoid this kind of calamity in the future, I'm grateful for advice.

    VCS move – when DaCHS was born, using the venerable subversion for version control was considered reputable. These days, fewer and fewer people can still deal with that, and thus I have moved the DaCHS source code into a git repository: https://gitlab-p4n.aip.de/gavo/dachs/.

    I hear you moan “why not github?” Well: don't get me started unless you are prepared to listen to a large helping of proselytising. Suffice it to say that we in academia invented the internet (for all intents and purposes), and it's a shame that we now rely so much on commercial entities to provide our basic services (usually without even paying them, which is always a dangerous proposition where commercial entities are concerned).

    Anyway: Feel free to use that service's bug tracker; we try to find ways to let you log in there without undue hardship, too.

    At this point, I customarily urge: don't wait, upgrade. If you have our Debian repository enabled, apt update && apt upgrade should do the trick – unless you missed our announcement on dachs-users that our repository key has changed. If you have not updated the key, please have a look at our repo page to see what needs to be done. Sorry about this, but our old 1024D key was being frowned upon, so we had to do something.

    Unless you are an old hand and have upgraded many times before, let me recommend a quick glance at our upgrading guide before doing the actual upgrade.

    [1]The reason we wait for the Interops is that we generally promise to put something into DaCHS at or around these conferences. This time, the preliminary support for json-typed database columns is an example of that.

  • GAVO at the Fall 2023 Interop in Tucson

    The Virtual Observatory, in practical terms, is the set of standards created and maintained by the IVOA. The IVOA, in turn, is a community almost defined by the two conferences it holds every year, the Interops (previously on this blog). The most recent Interop has just ended: The 2023 Tucson Fall Interop. Here are a few notes on what went on there from my (and to some extent GAVO's) perspective.

    An almost-orange orange hanging in a tree.

    This fall's IVOA Interop was hosted by Steward Observatory, where they had ripening oranges in the backyard. They were edible!

    For at least a decade and a half, the autumn Interops have been back-to-back with the ADASS conferences. ADASS, short for Astronomical Data Analysis Software and Systems, is a venerable conference series, created well back in the last century (this year: ADASS XXXIII) to have a forum for people who work in the magic triangle of astronomy, instrumentation, and data processing. Clearly, such a forum is very well suited to spread the word about the miracles we are working in the VO.

    To that end, I was involved in the creation of three posters: One on the use of MOCs in TAP – a somewhat extended version of something you saw on this blog first –, then one on data discovery in pyVO by Renaud Savalle (Paris) et al – a topic again familiar to readers of this blog – and finally one on improving the description of ADQL to enable more reliable machine validation of its grammar by Grégory Mantelet (Strasbourg) et al.

    As the conference at large goes, I was really delighted to see how basically everyone talking about data publication at all was stressing they are “doing VO”, which was a very welcome change from, perhaps, 10 years ago when this kind of talk was typically extolling the virtues of one particular web or javascript framework. One of the great things about standards in general and the VO in particular is that they tend to be a lot more durable than all those frameworks.

    The following Interop was a “short” one, lasting from Friday morning until Sunday noon, which meant that I was far too busy to do anything like a live blog while it went on. Let me hence just briefly point out the main talks related to GAVO's current activities and DaCHS.

    In Data Curation and Preservation on Saturday morning, Baptiste Cecconi (Paris) gave a nice overview of – among other things – what our bridge between the Registry and b2find (in particular, using the VOResource to DataCite mapper) enables in the context of the EOSC, and he briefly touched the question of how to properly make landing pages for VO resources (for which I am currently using another piece of XSLT).

    In the Radio session later that morning, Ixaka Labadie (Granada) gave a talk on how he is using DaCHS to deliver 3D visualisations for fairly impressive (prototype) SKA data. I particularly liked his illustrations of how DaCHS does Datalink and SODA. See his slide 12:

    Boxes and arrows illustrating how SIAP and Datalink are described in DaCHS resource descriptors

    In the afternoon, there was the Registry session, which featured me talking about the harvest trigger service I have been running for a while to help people across the anticlimactic moment when you have published your new resource but it won't show up in TOPCAT or pyVO for a day or so.

    The bulk of this session, however, was used for a discussion about various shortcomings of the Registry or its interfaces that I found pleasantly productive – incidentally, just like the discussion on word lists in EPN-TAP on Friday afternoon's Solar System Session that I had the pleasure to chair.

    In the DAL session on that afternoon, I had two talks: One was on the proposed new interoperable user-defined functions already implemented in DaCHS' ADQL and now coming up in several other services, too. Note to self: Some of these would probably be rather suitable blog post material.

    The second talk was a sort of brief show-and-tell pitch, in which I pointed out that hierarchical TAP examples using the elegant examples:continued property now actually work in both pyVO and TOPCAT:

    A three-level popup menu Service Provided -> Local UDFs -> using ivo_histogram

    Finally, in Sunday morning's Apps session, I talked about global image discovery in pyVO. This was about an early promise of the VO: just say where in space, time, and spectrum you need an image (or spectrum, or time series, or whatever), and some apparatus will find and query all the services that could have pertinent data. It would then present the metadata of the datasets it found in some useful form that would let you make informed decisions which to fetch.

    This was not too difficult in the olden days, but by now the VO is so big and complicated that a pyVO module with fairly involved logic is required. If you don't want to read the notes here, don't worry: I can safely predict that you'll read more about that topic on this blog.
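
    To give you a rough idea of where this is heading – and everything about this is preliminary, down to the names – the interface looks somewhat like this:

    # a very preliminary sketch; the module was a prototype at the time
    # of writing, and names and return values may well have changed
    from pyvo import discover

    # look for images of a small region (RA, Dec, radius in degrees)
    # around 500 nm
    datasets, log = discover.images_globally(
        space=(274.7, -13.8, 0.1),
        spectrum=5e-7)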

    This is nowhere near done yet; so, it is one more piece of homework that I am taking home with me.

  • GAVO at the AG-Tagung in Berlin

    A booth with a large screen, quite a few papers, a roll-up, all behind a glass wall with a sign UNI_VERSUM TUB Exhibition Space.

    It's time again for the annual meeting of the German astronomical society, the Astronomische Gesellschaft. Since we have been reaching out to the community at these meetings since 2007, there is even a tag for our contributions on this blog: AG-Tagung.

    Due to fire codes, our traditional booth would almost have ended up in a remote location on the third floor of TU Berlin's main building, and I had already printed desperate pleas to come and try to find us. But in a last-minute stunt, the local organisers housed us in an almost perfect place (thanks!): we're sitting right near the entrance, where we can rope in passers-by and then convince them they're missing out if they're not “doing VO”.

    One opportunity for them to realise how they're missing out is our puzzler, this year about a lonely O star:

    An overexposed star in a PanSTARRS field with an arrow plotted over it.

    Since this star must have formed very (by astronomical standards) recently, it should still be in its nursery, something like a nebula – but it clearly is not. It's a runaway. But from what?

    Unlike last year, we will not accept remote entries, sorry – but you're welcome to still try your hand even if you are not in Berlin. Also, if you like the format, there are quite a few puzzlers from previous years to play with.

    I have just (11:30) revealed the first hint towards our sample solution:

    We recommend solving this puzzler using Aladin. There, you can look for services serving, e.g., the Gaia DR3 data in the little “select” box in the lower left corner. Shameless plug: Try dr3lite.

    If you are on-site: drop by our booth. If not: we will post updates – in particular on the puzzler – here.

    Followup (2023-09-13)

    At yesterday's afternoon coffee break, we gave the following additional hint:

    To plot proper motions for catalogue objects in Aladin, try the Create a filter… entry in the Catalog menu.

    And this morning, we added:

    If you found Gaia DR3, you can also find editions of the NGC catalog (shameless plug: openngc). These are small enough for a plain SELECT * FROM….

    Followup (2023-09-14)

    The last puzzler hint is:

    Aladin's dist tool comes in handy when you want to do quick measurements on the sky. If you are in Berlin, you still have until 16:00 today to hand in your solution.

    However, the puzzler should not prevent you from attending our splinter meeting on e-science and the Virtual Observatory, where I will give an overview of the state of arrays in ADQL. Regular readers of this blog will remember my previous treatment of the topic, but this time the queries will be about time series.

    Followup (2023-09-14)

    Well, the prize has been drawn. This time, it went to a team from Marburg:

    Two persons holding a large towel with an astronomical image printed on it, in the background a big screen with the Aladin VO client on it.

    As promised, here's our solution using Aladin. But one of the nice things about the VO is that you get to choose your tools. One participant was kind enough to let us publish their solution using pyVO, too: puzzler2023-solution.py. Thanks to everyone who participated!

  • Making Custom Indexes for astrometry.net

    When you have an image or a scan of a photographic plate, you usually only have a vague idea of what position the telescope actually was pointed at. Furnishing the image with (more or less) precise information about what pixel corresponds to what sky position is called astrometric calibration. For a while now, arguably the simplest option to do astrometric calibration has been a package called astrometry.net. The eponymous web page has been experiencing… um… operational problems lately, but thanks to the Debian astronomy team, there is a nice package for it in Debian.

    However, just running apt install astrometry.net will not give you a working setup. Astrometry.net in addition needs an “index”: files that map star patterns (“quads”, in astrometry.net jargon) to positions. Debian comes with two pre-made sets of indexes at the moment (see apt search astrometry-data): those based on the Tycho 2 catalogue, and those based on 2MASS.

    For the index based on Tycho 2, you will find packages astrometry-data-tycho2-10-19, astrometry-data-tycho2-09, astrometry-data-tycho2-08, astrometry-data-tycho2-07[1]. The numbers in there (“scale numbers”) define the size of images the index is good for: 19 means “a major part of the sky”, 10 is “about a degree”, 8 “about half a degree”. Indexes for large images only have a few bright stars and hence are rather compact, which is why 10 through 19 fit into one package, whereas astrometry-data-tycho2-07-littleendian weighs in at 141 MB, and indexes at scale number 0 (suitable for images of a few arcminutes) take dozens of gigabytes if they are for the whole sky.

    So, when you do astrometric calibration, consider the size of your images first and then decide which scale number is sensible for you. It is usually a good idea to try the neighbouring scale numbers, too.

    You can then feed these to your calibration routine. If you are running DaCHS, you will probably want to use the AnetHeaderProcessor, where you give the names of the indexes in the sp_indices; you also have to say where to find the indexes, as in:

    from gavo import api
    
    class MyObsCalibrator(api.AnetHeaderProcessor):
      indexPath = "/usr/share/astrometry"
      sp_indices = ["index-tycho2-09*.fits",
        "index-tycho2-10*.fits",
        "index-tycho2-11*.fits",]
    

    This would be suitable for images that cover about a degree on the sky.

    Custom Indexes for Targeted Observations

    The Tycho catalogue starts becoming severely incomplete below mV ≈ 11, and since astrometry.net needs a few stars on an image to be able to calibrate it, you cannot use it to calibrate images smaller than a few tens of arcminutes (depending on where you look, of course). If you have smaller images, there are the 2MASS-based indexes; but the bluer your images are, the worse 2MASS as an infrared survey will do, and in addition, having the giant indexes is a big waste of storage and compute resources when you know your images are on a rather small part of the sky.

    In such a situation, you will save a lot of CPU and possibly even improve your astrometry if you create a custom index for your specific data. For instance, assume you have images sized about 10 arcminutes, and the observation programme covers a reasonably small set of objects (as long as it's of order a few hundred, a custom index certainly will be a good deal). You could then make your index based on Gaia positions and photometry like this:

    """
    Create an index for astrometry.net and a few small fields based on Gaia.
    
    Be sure to adapt this for your use case; for instance, if what you are
    calibrating will be from only a part of the sky, pick specific healpixes
    (perhaps on a different level; below, we're using level 5).  Also consider
    changing the target epoch, the photometry, or the magnitude limit.
    
    This script takes the sample positions from a text file; have
    space-separated pairs of ra and dec in targets.txt.
    """
    
    import os
    import subprocess
    
    from astropy.table import Table
    import pyvo
    
    # 0 is for images of about two arcminutes, 10 for about a degree, 12 for two
    # degrees, etc.
    SIZE_PRESET = 1
    
    # The typical radius of your images in degrees (this is the size of our cone
    # searches, so cut some slack); this needs to be changed in unison with
    # SIZE_PRESET
    IMAGE_RADIUS = 1/10.
    
    
    def get_target_table():
        """must return an astropy table with columns ra and dec in degrees.
    
        (of course, if you have your data in a proper format with actual metadata,
        you don't need any of the ugly magic).
        """
        targets = Table.read("targets.txt", format="ascii")
        targets["col1"].name, targets["col2"].name = "ra", "dec"
        targets["ra"].meta = {"ucd": "pos.eq.ra;meta.main"}
        targets["dec"].meta = {"ucd": "pos.eq.dec;meta.main"}
        return targets
    
    
    def main():
        tap_service = pyvo.dal.TAPService("http://dc.g-vo.org/tap")
        res = tap_service.run_async(f"""
            SELECT g.ra as RA, g.dec as DEC, phot_g_mean_mag as MAG
            FROM gaia.dr3lite AS g
            JOIN TAP_UPLOAD.t1 as mine
                ON DISTANCE(mine.ra, mine.dec, g.ra, g.dec)<{IMAGE_RADIUS}""",
          uploads={"t1": get_target_table()})
    
        cat_file = "basic-cat.fits"
        res.to_table().write(cat_file, format="fits", overwrite=True)
    
        try:
            subprocess.run(["build-astrometry-index", "-i", cat_file,
                "-o", f"./index-custom-{SIZE_PRESET:02d}.fits",
                "-P", str(SIZE_PRESET), "-S", "MAG"])
        finally:
            os.unlink(cat_file)
    
    
    if __name__=="__main__":
        main()
    

    This writes a single file, index-custom-01.fits (in this case).

    If you read your positions from something other than the simple ASCII file I'm assuming here: Be sure to annotate the columns containing RA and Dec with the proper UCDs as shown here. That makes DaCHS (and perhaps other TAP services, too) create the right hints for the database, speeding up things tremendously.

    You can of course change the ADQL query; it might, for instance, help to replace the G magnitudes with RP or BP ones, or you could use a different catalogue than Gaia. Just make sure the FITS table that is written to basic-cat.fits has exactly the columns RA, DEC, and MAG.
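
    For instance, to have RP rather than G magnitudes, you would only touch the select list (note how the aliases keep the required column names):

    SELECT g.ra AS RA, g.dec AS DEC, phot_rp_mean_mag AS MAG
    FROM gaia.dr3lite AS g
    JOIN TAP_UPLOAD.t1 AS mine
        ON DISTANCE(mine.ra, mine.dec, g.ra, g.dec)<{IMAGE_RADIUS}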

    In DaCHS, I tend to keep scripts like the one above in a subdirectory of the resdir called custom-index, and then in the calibration script I write:

    from gavo import api
    
    RD = api.getRD("myres/q")
    
    class MyObsCalibrator(api.AnetHeaderProcessor):
      indexPath = RD.resdir
      sp_indices = ["custom-index/index-custom-01.fits"]
    

    Custom Indexes for Ancient Observations

    On the other hand, if you have oldish images not going terribly deep, you may want to tailor an index for about the epoch the images were taken at. Many bright stars have a proper motion large enough to matter over a century, and so doing epoch propagation (in this case with the ivo_epoch_prop user-defined function, which is not available everywhere) is probably a good idea. The following script computes three full-sky indexes with quads around the desired size; note how you can set the limiting magnitude and the size preset:

    """
    Create a full-sky index for bright stars and astrometry.net based on Gaia.
    
    This only works for rather bright stars because the Gaia service will refuse
    to serve more than ~1e7 objects.
    
    Make sure to adapt SIZE_PRESET to your use case (19 means 30 deg,
    10 about a degree, two preset steps are about a factor two in scale).
    """
    
    import os
    import subprocess
    
    import pyvo
    
    # see the module docstring
    SIZE_PRESET = 12
    
    # ignore stars fainter than this; you can't go below 14 all-sky with Gaia
    # and the GAVO DC server
    MAX_MAG = 12
    
    # Epoch to transform the stars to
    TARGET_EPOCH = 1910
    
    
    def main():
        tap_service = pyvo.dal.TAPService("http://dc.g-vo.org/tap")
        res = tap_service.run_async(f"""
            SELECT pos[1] as RA, pos[2] as DEC, mag as MAG
            FROM (
                SELECT phot_bp_mean_mag AS mag,
                    ivo_epoch_prop(ra, dec, parallax,
                        pmra, pmdec, radial_velocity, 2016, {TARGET_EPOCH}) as pos
                FROM gaia.dr3lite
              WHERE phot_bp_mean_mag<{MAX_MAG}) AS q""")
    
        cat_file = "current.fits"
        res.to_table().write(cat_file, format="fits", overwrite=True)
    
        try:
            for size_preset in range(SIZE_PRESET-1, SIZE_PRESET+2):
                subprocess.run(["build-astrometry-index", "-i", cat_file,
                    "-o", f"./index-custom-{size_preset:02d}.fits",
                    "-P", str(size_preset), "-S", "MAG"])
        finally:
            os.unlink(cat_file)
    
    
    if __name__=="__main__":
        main()
    

    With this and my custom-index directory, your DaCHS header processor could say:

    from gavo import api
    
    RD = api.getRD("myres/q")
    
    class MyObsCalibrator(api.AnetHeaderProcessor):
      indexPath = RD.resdir
      sp_indices = ["custom-index/index-custom-*.fits"]
    

    Custom Indexes: Full-sky and Deep

    I have covered the cases “deep and spotty” and “shallow and full-sky”. The case “deep and full-sky” is a bit more involved because it still lies in the realm of big data, which always requires extra tricks. In this case, that would be retrieving the basic catalogue in parts – for instance, by HEALPix – and at the same time splitting the index up between HEALPixes, too. This does not require great magic, but it does require a bit of non-trivial bookkeeping, and hence I will only write about it if someone actually needs it – if that's you, please write in.
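
    That said, here is a rough, untested sketch of the basic pattern. It assumes the ivo_healpix_index UDF (available on the GAVO DC TAP service) and exploits the fact that build-astrometry-index can write healpix-partitioned indexes through its -H and -s options; do check that astrometry.net's healpix numbering convention matches what you retrieve before relying on it:

    """
    Untested sketch: a "deep and full-sky" index split into the 48
    healpixes of nside 2 (healpix order 1).  Fetching the catalogue
    piecewise keeps each query below the server's row limit, which is
    what lets you go deeper than in the all-sky script above.
    """

    import subprocess

    import pyvo

    SIZE_PRESET = 2
    MAX_MAG = 16


    def main():
        tap_service = pyvo.dal.TAPService("http://dc.g-vo.org/tap")
        for hpx in range(48):
            res = tap_service.run_async(f"""
                SELECT ra AS RA, dec AS DEC, phot_g_mean_mag AS MAG
                FROM gaia.dr3lite
                WHERE ivo_healpix_index(1, ra, dec) = {hpx}
                    AND phot_g_mean_mag < {MAX_MAG}""")
            cat_file = f"part-{hpx:02d}.fits"
            res.to_table().write(cat_file, format="fits", overwrite=True)
            # -H/-s: build an index restricted to this healpix
            # (nside 2 corresponds to healpix order 1 above)
            subprocess.run(["build-astrometry-index", "-i", cat_file,
                "-o", f"./index-custom-{SIZE_PRESET:02d}-{hpx:02d}.fits",
                "-P", str(SIZE_PRESET), "-S", "MAG",
                "-H", str(hpx), "-s", "2"])


    if __name__=="__main__":
        main()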

    [1]You will also find that each of these exists in littleendian and bigendian flavours; ignore those, your machine will pick what it needs when you install the packages without tags.
