Posts with the Tag TOPCAT:

  • A Proposal for Persistent TAP Uploads

    From its beginning, the IVOA's Table Access Protocol TAP has let users upload their own tables into the services' databases, which is an important element of TAP's power (cf. our upload crossmatch use case for a minimal example). But these uploads only exist for the duration of the request. Having more persistent user-uploaded tables, however, has quite a few interesting applications.

    Inspired by Pat Dowler's 2018 Interop talk on youcat I have therefore written a simple implementation for persistent tables in GAVO's server package DaCHS. This post discusses what is implemented, what is clearly still missing, and how you can play with it.

    If all you care about is using this from Python, you can jump directly to a Jupyter notebook showing off the features; it by and large explains the same things as this blogpost, but using Python instead of curl and TOPCAT. Since pyVO does not know about the proposed extensions, the code necessarily is still a bit clunky in places, but if something like this will become more standard, working with persistent uploads will look a lot less like black art.

    Before I dive in: This is certainly not what will eventually become a standard in every detail. Do not do large implementations against what is discussed here unless you are prepared to throw away significant parts of what you write.

    Creating and Deleting Uploads

    Where Pat's 2018 proposal re-used the VOSI tables endpoint that every TAP service has, I have provisionally created a sibling resource user_tables – and I found that usual VOSI tables and the persistent uploads share virtually no server-side code, so for now this seems a smart thing to do. Let's see what client implementors think about it.

    What this means is that for a service with a base URL of http://dc.g-vo.org/tap[1], you would talk to (children of) http://dc.g-vo.org/tap/user_tables to operate the persistent tables.

    As with Pat's proposal, to create a persistent table, you do an http PUT to a suitably named child of user_tables:

    $ curl -o tmp.vot https://docs.g-vo.org/upload_for_regressiontest.vot
    $ curl -H "content-type: application/x-votable+xml" -T tmp.vot \
      http://dc.g-vo.org/tap/user_tables/my_upload
    Query this table as tap_user.my_upload
    

    The actual upload at this point returns a reasonably informative plain-text string, which feels a bit ad-hoc. Better ideas are welcome, in particular after careful research of the rules for 30x responses to PUT requests.

    Trying to create tables with names that will not work as ADQL regular table identifiers will fail with a DALI-style error. Try something like:

    $ curl -H "content-type: application/x-votable+xml" -T tmp.vot
      http://dc.g-vo.org/tap/user_tables/join
    ... <INFO name="QUERY_STATUS" value="ERROR">'join' cannot be used as an
      upload table name (which must be regular ADQL identifiers, in
      particular not ADQL reserved words).</INFO> ...
    

    After a successful upload, you can query the VOTable's content as tap_user.my_upload:

    A TOPCAT screenshot with a query 'select avg("3.6mag") as blue, avg("5.8mag") as red from tap_user.my_upload' that has a few red warnings, and a result window showing values for blue and red.

    TOPCAT (which is what painted these pixels) does not find the table metadata for tap_user tables (yet), as I do not include them in the “public“ VOSI tables. This is why you see the reddish syntax complaints here.

    I happen to believe there are many good reasons for why the volatile and quickly-changing user table metadata should not be mixed up with the public VOSI tables, which can be several 10s of megabytes (in the case of VizieR). You do not want to have to re-read that (or discard caches) just because of a table upload.

    If you have the table URL of a persistent upload, however, you inspect its metadata by GET-ting the table URL:

    $ curl http://dc.g-vo.org/tap/user_tables/my_upload | xmlstarlet fo
    <vtm:table [...]>
      <name>tap_user.my_upload</name>
      <column>
        <name>"_r"</name>
        <description>Distance from center (RAJ2000=274.465528, DEJ2000=-15.903352)</description>
        <unit>arcmin</unit>
        <ucd>pos.angDistance</ucd>
        <dataType xsi:type="vs:VOTableType">float</dataType>
        <flag>nullable</flag>
      </column>
      ...
    

    – this is a response as from VOSI tables for a single table. Once you are authenticated (see below), you can also retrieve a full list of tables from user_tables itself as a VOSI tableset. Enabling that for anonymous uploads did not seem wise to me.

    When done, you can remove the persistent table, which again follows Pat's proposal:

    $ curl -X DELETE http://dc.g-vo.org/tap/user_tables/my_upload
    Dropped user table my_upload
    

    And again, the text/plain response seems somewhat ad hoc, but in this case it is somewhat harder to imagine something less awkward than in the upload case.

    If you do not delete yourself, the server will garbage-collect the upload at some point. On my server, that's after seven days. DaCHS operators can configure that grace period on their services with the [ivoa]userTableDays setting.

    Authenticated Use

    Of course, as long as you do not authenticate, anyone can drop or overwrite your uploads. That may be acceptable in some situations, in particular given that anonymous users cannot browse their uploaded tables. But obviously, all this is intended to be used by authenticated users. DaCHS at this point can only do HTTP basic authentication with locally created accounts. If you want one in Heidelberg, let me know (and otherwise push for some sort of federated VO-wide authentication, but please do not push me).

    To just play around, you can use uptest as both username and password on my service. For instance:

      $ curl -H "content-type: application/x-votable+xml" -T tmp.vot \
      --user uptest:uptest \
      http://dc.g-vo.org/tap/user_tables/privtab
    Query this table as tap_user.privtab
    

    In recent TOPCATs, you would enter the credentials once you hit the Log In/Out button in the TAP client window. Then you can query your own private copy of the uploaded table:

    A TOPCAT screenshot with a query 'select avg("3.6mag") as blue, avg("5.8mag") as red from tap_user.my_upload' that has a few red warnings, and a result window showing values for blue and red; there is now a prominent Log In/Out-button showing we are logged in.

    There is a second way to create persistent tables (that would also work for anonymous): run a query and prepend it with CREATE TABLE. For instance:

    A TOPCAT screenshot with a query 'create table tap_user.smallgaia AS SELECT * FROM gaia.dr3lite TABLESAMPLE(0.001)'. Again, TOPCAT flags the create as an error, and there is a dialog "Table contained no rows".

    The “error message” about the empty table here is to be expected; since this is a TAP query, it stands to reason that some sort of table should come back for a successful request. Sending the entire newly created table back without solicitation seems a waste of resources, and so for now I am returning a “stub” VOTable without rows.

    As an authenticated user, you can also retrieve a full tableset for what user-uploaded tables you have:

    $ curl --user uptest:uptest http://dc.g-vo.org/tap/user_tables | xmlstarlet fo
    <vtm:tableset ...>
      <schema>
        <name>tap_user</name>
        <description>A schema containing users' uploads. ...  </description>
        <table>
          <name>tap_user.privtab</name>
          <column>
            <name>"_r"</name>
            <description>Distance from center (RAJ2000=274.465528, DEJ2000=-15.903352)</description>
            <unit>arcmin</unit>
            <ucd>pos.angDistance</ucd>
            <dataType xsi:type="vs:VOTableType">float</dataType>
            <flag>nullable</flag>
          </column>
          ...
        </table>
        <table>
          <name>tap_user.my_upload</name>
          <column>
            <name>"_r"</name>
            <description>Distance from center (RAJ2000=274.465528, DEJ2000=-15.903352)</description>
            <unit>arcmin</unit>
            <ucd>pos.angDistance</ucd>
            <dataType xsi:type="vs:VOTableType">float</dataType>
            <flag>nullable</flag>
          </column>
          ...
        </table>
      </schema>
    </vtm:tableset>
    

    Open Questions

    Apart from the obvious question whether any of this will gain community traction, there are a few obvious open points:

    1. Indexing. For tables of non-trivial sizes, one would like to give users an interface to say something like “create an index over ra and dec interpreted as spherical coordinates and cluster the table according to it”. Because this kind of thing can change runtimes by many orders of magnitude, enabling it is not just some optional embellishment.

      On the other hand, what I just wrote already suggests that even expressing the users' requests in a sufficiently flexible cross-platform way is going to be hard. Also, indexing can be a fairly slow operation, which means it will probably need some sort of UWS interface.

    2. Other people's tables. It is conceivable that people might want to share their persistent tables with other users. If we want to enable that, one would need some interface on which to define who should be able to read (write?) what table, some other interface on which users can find what tables have been shared with them, and finally some way to let query writers reference these tables (tap_user.<username>.<tablename> seems tricky since with federated auth, user names may be just about anything).

      Given all this, for now I doubt that this is a use case sufficiently important to make all the tough nuts delay a first version of user uploads.

    3. Deferring destruction. Right now, you can delete your table early, but you cannot tell my server that you would like to keep it for longer. I suppose POST-ing to a destruction child of the table resource in UWS style would be straightforward enough. But I'd rather wait whether the other lacunae require a completely different pattern before I will touch this; for now, I don't believe many persistent tables will remain in use beyond a few hours after their creation.

    4. Scaling. Right now, I am not streaming the upload, and several other implementation details limit the size of realistic user tables. Making things more robust (and perhaps scalable) hence will certainly be an issue. Until then I hope that the sort of table that worked for in-request uploads will be fine for persistent uploads, too.

    Implemented in DaCHS

    If you run a DaCHS-based data centre, you can let your users play with the stuff I have shown here already. Just upgrade to the 2.10.2 beta (you will need to enable the beta repo for that to happen) and then type the magic words:

    dachs imp //tap_user
    

    It is my intention that users cannot create tables in your DaCHS database server unless you say these words. And once you say dachs drop --system //tap_user, you are safe from their huge tables again. I would consider any other behaviour a bug – of which there are probably still quite a few. Which is why I am particularly grateful to all DaCHS operators that try persistent uploads now.

    [1]As already said in the notebook, if http bothers you, you can write https, too; but then it's much harder to watch what's going on using ngrep or friends.
  • A Data Publisher's Diary: Wide Images in DASCH

    An Aladin screenshot with many green squares overplotted on a DSS image sized 20×15 degrees.

    This is the new resonse when you query the DASCH SIAP service for Aladin's default view on the horsehead nebula. As you can see, at least the returned images no longer are distributed over half of the sky (note the size of the view).

    The first reaction I got when the new DASCH in the VO service hit Aladin was: “your SIAP service is broken, it just dumps all images it has at me rather than honouring my positional constraint.”

    I have to admit I was intially confused as well when an in-view search from Aladin came back with images with centres on almost half the sky as shown in my DASCH-in-Aladin illustration. But no, the computer did the right thing. The matching images in fact did have pixels in the field of view. They were just really wide field exposures, made to “patrol” large parts of the sky or to count meteors.

    DASCH's own web interface keeps these plates out of the casual users' views, too. I am following this example now by having two tables, dasch.narrow_plates (the “narrow” here follows DASCH's nomenclature; of course, most plates in there would still count as wide-field in most other contexts) and dasch.wide_plates. And because the wide plates are probably not very helpful to modern mainstream astronomers, only the narrow plates are searched by the SIAP2 service, and only they are included with obscore.

    In addition to giving you a little glimpse into the decisions one has to make when running a data centre, I wrote this post because making a provisional (in the end, I will follow DASCH's classification, of course) split betwenn “wide” and “narrow” plates involved a bit of simple ADQL that may still be not totally obvious and hence may merit a few words.

    My first realisation was that the problem is less one of pixel scale (it might also be) but of the large coverage. How do we figure out the coverage of the various instruments? Well, to be robust against errors in the astrometric calibration (these happen), let us average; and average over the area of the polygon we have in s_region, for which there is a convenient ADQL function. That is:

    SELECT instrument_name, avg(area(s_region)) as meanarea
    FROM dasch.plates
    GROUP BY instrument_name
    

    It is the power of ADQL aggregate function that for this characterisation of the data, you only need to download a few kilobytes, the equivalent of the following histogram and table:

    A histogram with a peak of about 20 at zero, with groups of bars going all the way beyond 4000.  The abscissa is marked “meanarea/deg**2”.
    Instrument Name mean size [sqdeg]
    Eastman Aero-Ektar K-24 Lens on a K-1...  
    Cerro Tololo 4 meter  
    Logbook Only. Pages without plates.  
    Roe 6-inch  
    Palomar Sky Survey (POSS)  
    1.5 inch Ross (short focus) 4284.199799877725
    Patrol cameras 4220.802442888225
    1.5-inch Ross-Xpress 4198.678060743206
    2.8-inch Kodak Aero-Ektar 3520.3257323233624
    KE Camera with Installed Rough Focus 3387.5206396388453
    Eastman Aero-Ektar K-24 Lens on a K-1... 3370.5283986677637
    Eastman Aero-Ektar K-24 Lens on a K-1... 3365.539790633015
    3 inch Perkin-Zeiss Lens 1966.1600884072298
    3 inch Ross-Tessar Lens 1529.7113188540836
    2.6-inch Zeiss-Tessar 1516.7996790591587
    Air Force Camera 1420.6928219265849
    K-19 Air Force Camera 1414.074101143854
    1.5 in Cooke "Long Focus" 1220.3028263587332
    1 in Cook Lens #832 Series renamed fr... 1215.1434235932702
    1-inch 1209.8102811770807
    1.5-inch Cooke Lenses 1209.7721123964636
    2.5 inch Cooke Lens 1160.1641223648048
    2.5-inch Ross Portrait Lens 1137.0908812243645
    Damons South Yellow 1106.5016573891376
    Damons South Red 1103.327982978934
    Damons North Red 1101.8455616455205
    Damons North Blue 1093.8380971825375
    Damons North Yellow 1092.9407550755682
    New Cooke Lens 1087.918570304363
    Damons South Blue 1081.7800084709982
    2.5 inch Voigtlander (Little Bache or... 548.7147592220762
    NULL 534.9269386355818
    3-inch Ross Fecker 529.9219051692568
    3-inch Ross 506.6278856912204
    3-inch Elmer Ross 503.7932693652602
    4-inch Ross Lundin 310.7279860552893
    4-inch Cooke (1-327) 132.690621660727
    4-inch Cooke Lens 129.39637516917298
    8-inch Bache Doublet 113.96821604869973
    10-inch Metcalf Triplet 99.24964308212328
    4-inch Voightlander Lens 98.07368690379751
    8-inch Draper Doublet 94.57937153909593
    8-inch Ross Lundin 94.5685388440282
    8-inch Brashear Lens 37.40061588712761
    16-inch Metcalf Doublet (Refigured af... 33.61565584978583
    24-33 in Jewett Schmidt 32.95324914757339
    Asiago Observatory 92/67 cm Schmidt 32.71623733985344
    12-inch Metcalf Doublet 31.35112644688316
    24-inch Bruce Doublet 22.10390937657793
    7.5-inch Cooke/Clark Refractor at Mar... 14.625992810622787
    Positives 12.600189007151709
    YSO Double Astrograph 10.770798601877804
    32-36 inch BakerSchmidt 10 1/2 inch r... 10.675406541122827
    13-inch Boyden Refractor 6.409447066606171
    11-inch Draper Refractor 5.134521254785461
    24-inch Clark Reflector 3.191361603405415
    Lowel 40 inch reflector 1.213284257086087
    200 inch Hale Telescope 0.18792105301170514

    For the instruments with an empty mean size, no astrometric calibrations have been created yet. To get a feeling for what these numbers mean, recall that the celestial sphere has an area of 4 π rad², that is, 4⋅180²/π or 42'000 square degrees. So, some instruments here indeed covered 20% of the night sky in one go.

    I was undecided between cutting at 150 (there is a fairly pronounced gap there) or at 50 (the gap there is even more pronounced) square degrees and provisionally went for 150 (note that this might still change in the coming days), mainly because of the distribution of the plates.

    You see, the histogram above is about instruments. To assess the consequences of choosing one cut or the other, I would like to know how many images a given cut will remove from our SIAP and ObsTAP services. Well, aggregate functions to the rescue again:

    SELECT ROUND(AREA(s_region)/100)*100 AS platebin, count(*) AS ct
    FROM dasch.plates
    GROUP BY platebin
    

    To plot such a pre-computed histogram in TOPCAT, tell the histogram plot window to use ct as the weight, and you will see something like this:

    A wide histogram with a high peak at about 50, rising to 1.2e5. Another noticeable concentration is around 1250, and there is signifiant weight also approaching 450 from the left.

    It was this histogram that made me pick 150 deg² as the cutoff point for what should be discoverable in all-VO queries: I simply wanted to retain the plates in the second bar from left.

  • HEALPix Maps: In General and in Gaia

    blue and reddish pixels drawing a bar on the sky.

    A map of average Gaia colours in HEALPixes 2/83 and 2/86 (Orion south-east). This post tells you how to (relatively) quickly produce such maps.

    This year's puzzler for the AG Tagung turned out to be a valuable source of interesting ADQL queries. I have already written about finding dusty spots on the sky, and in the puzzler solution, I had promised some words on creating dust maps, or, more generally, HEALPix maps of any sort.

    Making HEALPix maps with Gaia source_ids

    The basic technique is explained in Mark Taylor's classical ADASS poster from 2016. On GAVO's TAP service (access URL http://dc.g-vo.org/tap), you will also find an example for that (in TOPCAT's TAP window, check the Service-provided section unter the Examples button for it). However, once you have Gaia source_ids, there is something a lot faster and arguably not much less convenient. Let me quote the footnote on source_id from my DR3 lite table:

    For the contents of Gaia DR3, the source ID consists of a 64-bit integer, least significant bit = 1 and most significant bit = 64, comprising:

    • a HEALPix index number (sky pixel) in bits 36 - 63; by definition the smallest HEALPix index number is zero.
    • […]

    This means that the HEALpix index level 12 of a given source is contained in the most significant bits. HEALpix index of 12 and lower levels can thus be retrieved as follows:

    • [...]
    • HEALpix [at] level n = (source_id)/(235⋅412 − n).

    That is: Once you have a Gaia source_id, you an compute HEALpix indexes on levels 12 or less by a simple integer division! I give you that the more-than-35-bit numbers you have to divide by do look a bit scary – but you can always come back here for cutting and pasting:

    HEALPix level Integer-divide source id by
    12 34359738368
    11 137438953472
    10 549755813888
    9 2199023255552
    8 8796093022208
    7 35184372088832
    6 140737488355328
    5 562949953421312
    4 2251799813685248
    3 9007199254740992
    2 36028797018963968

    If you know – and that is very valuable knowledge far beyond this particular application – that you can simply jump between HEALPix indexes of different levels by multiplying with or integer-dividing by four, the general formula in the footnote actually becomes rather memorisable. Let me illustrate that with an example in Python. HEALPix number 3145 on level 6 is:

    >>> 3145//4  # ...within this HEALPix on level 5...
    786
    >>> 3145*4, (3145+1)*4  # ..and covers these on level 7...
    (12580, 12584)
    

    Simple but ingenious.

    You can immediately exploit this to make HEALPix maps like the one in the puzzler. This piece of ADQL does the job within a few seconds on the GAVO DC TAP service[1]:

    SELECT source_id/8796093022208 AS pix,
      AVG(phot_bp_mean_mag-phot_rp_mean_mag) AS avgcol
    FROM gaia.edr3lite
    WHERE distance(ra, dec, 246.7, -24.5)<2
    GROUP by pix
    

    Using the table above, you see that the horrendous 8796093022208 is the code for HEALPix level 8. When you remember (and you should) that HEALPix level 6 corresponds to a linear dimension of about 1 degree and each level is a factor of two in linear dimension, you see that the map ought to have a resolution of about 1/8th of a degree.

    HEALPix to Screen Pixel

    How do you plot this? Well, in TOPCAT, do GraphicsSky Plot, and then in the plot window LayersAdd HEALPix control (there are icons for both of these, too). You then have to manually configure the plot for the table you just retrieved: Set the Level to 8, the index to pix and the Value to avgcol – we're working on making the annotation a bit richer so that TOPCAT has a chance to figure this out by itself.

    With a bit of extra configuration, you get the following map of average colours (really: dust concentration):

    Plot: Black and reddish pixels showing a bit of structure

    This is not totally ideal, as at the border of the cone, certain Healpixes are only partially covered, which makes statistics unnecessarily harder.

    Positional Constraints using source_ids

    Due to Gaia's brilliant numbering scheme, we can do analysis by HEALpix, too, circumventing (among other things) this problem. Say you are interested in the vicinity of the M42 and would like to investigate a patch of about 8 degrees. By our rule of thumb, 8 degrees is three levels up from the one-degree level 6. To find the corresponding HEALpix index, on DaCHS servers with their gavo_simbadpoint UDF you could say:

    SELECT TOP 1 ivo_healpix_index(3, gavo_simbadpoint('M42'))
    FROM tap_schema.tables
    

    Hu, you ask, what's tap_schema.tables to do with this? Well, nothing, really. It's just that ADQL's syntax requires selecting from a table, even if what we select is completely independent of any table, as for instance the index of M42's 3-HEALpix. The hack above picks in a table guaranteed to exist on all TAP services, and the TOP 1 makes sure we only compute the value once. In case you ever feel the need to abuse a TAP service as a calculator: Keep this trick in mind.

    The result, 334, you could also have found more graphically, as follows:

    1. Start Aladin
    2. Check OverlayHEALPix grid
    3. Enter M42 in Command
    4. Zoom out until you see HEALPix indexes of level 3 in the grid.

    An advantage you have with this method: You see that M42 happens to lie on a border of HEALPixes; perhaps you should include all of 334, 335, 356, and 357 if you were really interested in the Orion Nebula's vicinity.

    We, on the other hand, are just interested in instructive examples, and hence let's just repeat our colour mapping with all Gaia objects from HEALPix 3/334. How do you select these? Well, by source_id's construction, you know their source_ids will be between 334⋅9007199254740992 and (334 + 1)⋅9007199254740992 − 1:

    SELECT source_id/8796093022208 AS pix,
      AVG(phot_bp_mean_mag-phot_rp_mean_mag) AS avgcol
    FROM gaia.edr3lite
    WHERE source_id BETWEEN 334*9007199254740992 AND 335*9007199254740992-1
    GROUP by pix
    

    This is computationally cheap (though Postgres, not being a column store still has to do quite a bit of I/O; note how much faster this query is when you run it again and all the tuples are already in memory). Even going to HEALPix level 2 would in general still be within our sync time limit. The opening figure was produced with the constraint:

    source_id BETWEEN 83*36028797018963968 AND 84*36028797018963968-1
    OR source_id BETWEEN 86*36028797018963968 AND 87*36028797018963968-1
    

    – and with a sync query.

    Aggregating over a Non-HEALPix

    One last point: The constraints we have just been using are, in effect, positional constraints. You can also use them as quick and in some sense rather unbiased sampling tools.

    For instance, if you would like so see how the reddening in one of the “dense“ spots in the opening picture behaves with distance, you could first pick a point – α = 98, δ = 4, say –, then convert that to a level 7 healpix as above (that's/88974) and then write:

    SELECT ROUND(r_med_photogeo/200)*200 AS distbin, COUNT(*) as n,
        AVG(phot_bp_mean_mag-phot_rp_mean_mag) AS avgcol
    FROM gaia.dr3lite
    JOIN gedr3dist.main USING (source_id)
    WHERE source_id BETWEEN 88974*35184372088832 and 88975*35184372088832-1
    GROUP BY distbin
    

    This is creating 200 pc bins in distance based on the estimates in the gedr3dist.main table (note that this adds subtle correlations, because these estimates already contain Gaia colour information). Since quite a few of these bins will be very sparsely populated, I'm also fetching the number of objects contributing. And then I plot the whole thing, using the conventional (n) ⁄ n as a rough estimate for the relative error:

    Plot: A line that first slowly declines, then rises quite a bit, then flattens out and becomes crazy as errors start to dominate.

    This plot immediatly shows that colour systematics are not exclusively due to dust, as in that case things would only get redder all the time. The blueward trend up to 700 pc is reasonably well explained by the brighter, bluer upper main sequence becoming more dominant in the population sampled as red dwarfs become too faint for Gaia.

    The strong reddening setting in after that is rather certainly due to the Orion complex, though I would perhaps not have expected it to reach out to 2 kpc (the conventional distance to M42 is about 0.5 kpc); without having properly thought about it, I'll chalk it off as “the Orion arm“. And after that, it's again what I'd call Malmquist-blueing until the whole things dissolves into noise.

    In conclusion: Did you know you can group by both healpix and distbin at the same time? I am sure there are interesting structures to be found in what you will get from such a query…

    [1]You may be tempted to write source_id/(POWER(2, 35)*POWER(4, 3) here for clarity. Resist that temptation. POWER returns floating point numbers. If you have one float in a division, not even a ROUND will get you back into the integer division realm, and the whole trick implodes. No, you will need the integer literals for now.
  • Computing Residuals of an Astrometric Calibration

    Two plots, left a fairly good correlation, right a cloudy wave

    The kind of plot you can make following the recipe given here: Left, a comparison of the photometry, right, a positional residuals, not taking into account the SIP plate solution, when comparing the HDAP plate B3261a against Gaia DR3. Note that the cut-off a 4 arcsec is because of the match radius when obtaining the calibrator stars.

    I recently had to assess the quality of the astrometric calibration of a photographic plate. What I am going to show you in this post will of course work just as well for CCD frames, and if these have a sufficiently large field of view, this may be an issue for them as well. However, the sort of data that needs this assessment most typically are scans of plates, as these tend to have a “wobble”, systematic offsets in the scan direction resulting from imperfections in the mechanics.

    Prerequisites: An astronomical frame with a calibration in ICRS (or some frame not very far from it), called my-image.fits in the following, SExtractor (in Debian and derivatives: apt install source-extractor – long live Debian Astro; since it's called source-extractor in Debian, that's what I'll use here, too), and of course TOPCAT.

    Step 1: Extract Sources. Source extraction is of course a high science, and if you know better than me, by all means do it the way you think is appropriate. Meanwhile, the following might very well work for you sufficiently well.

    Create a working directory and enter it. Then, to create a file telling source-extractor what columns you would like to see, write the following to a file default.param:

    ALPHA_SKY
    DELTA_SKY
    X_IMAGE
    Y_IMAGE
    MAG_ISO
    FLUX_AUTO
    ELONGATION
    

    Next, give a few parameters to source-extractor; depending on the sort of image you have, you may want to play around with DETECT_MINAREA (how many pixels need to show a signal to register as a source) and DETECT_THRESH (how many sigmas a pixel has to be above the background to register as a candidate for belonging to a source). Meanwhile, write the following into a file default.control:

    CATALOG_TYPE     FITS_1.0
    CATALOG_NAME     img.axy
    PARAMETERS_NAME  default.param
    FILTER           N
    DETECT_MINAREA   30
    DETECT_THRESH    4
    SEEING_FWHM      1.2
    

    – but if the following call gives you a few hundred sources, that ought to work for the present purpose.

    Then run:

    source-extractor -c default.control my-image.fits
    

    This will give you a catalogue of extracted objects in the file img.axy.

    Step 2: Fix source-extractor's output. Load that img.axy into TOPCAT. Regrettably, source-extractor does not add any useful metadata to the columns of its output table. To add the absolute bare minimum, in TOPCAT go to ViewsColumn Info. In that window, check UCD in the Display menu, and then put pos.eq.ra and pos.eq.dec into the UCD fields of the ALPHA_SKY and DELTA_SKY columns, respectively; double click to change fields in TOPCAT.

    To see if you have done the annotation right, in TOPCAT's main window, click GraphicsSky Plot. If the objects show up, you have just provided enough annotation to let TOPCAT figure out the position for each row.

    Step 3: Get calibrators. We will now try to add counterparts for Gaia DR3 to the extracted sources. To do that, click VOTable Access Protocol, and in the window popping up double click the entry for the GAVO DC TAP.

    In the Find box, type dr3lite to look for this site's version of the Gaia DR3 source catalogue. Click on gaia.dr3lite to select that table, and then select the Columns pane. This should show some of the Gaia DR3 columns.

    Now ExamplesUpload Join will generate a query that will cross-match your extracted sources with the Gaia sources. You should edit it a bit, only selecting the columns you will actually need, removing the TOP 1000 (at least on large images with more than 1000 sources), and reducing the match radius a bit when the calibration is not actually completely off and your epoch is sufficiently close to J2000.

    Hint: you can control-click in the Columns pane and then use the Cols button to insert all the column names in one go[1]. For me, the resulting query would be:

    SELECT
       source_id, ra, dec, phot_bp_mean_mag,
       tc.*
       FROM gaia.dr3lite AS db
       JOIN TAP_UPLOAD.t1 AS tc
       ON 1=CONTAINS(POINT('ICRS', db.ra, db.dec),
                     CIRCLE('ICRS', tc.ALPHA_SKY, tc.DELTA_SKY, 4./3600.))
    

    This should result in about as many matches as your extraction had – a few more is ok, because you will have some spurious matches, a few less is ok, too, as there are always some outliers and artefacts, but you should clearly not pull a magnitude more or less objects here than you put in; fiddle with the match radius as necessary.

    See if there is a rough correlation between the Gaia calibrators and your extracted sources by plotting phot_bp_mean_mag against MAG_ISO. Absent more information, MAG_ISO, source-extractor's guess for the magnitude of the extracted object, will be just some crazy number, but it should have some discernable correlation with the actual magnitude. Do not expect too much here, in particular with old plates, for which good photometry is a science of their own.

    In my example, this looked like this:

    Plot: a rough correlation in red with a green tail

    The green points certainly are spurious matches; this observation did not reach beyond 14th magnitude or so, and there are many weak stars on the sky, so a few of them will show up in just about any cross match. See the opening picture for an example with a better correlation.

    Step 4: Do the correlation plot. Do GraphicsPlane Plot and then plot ra-alpha_sky or dec-delta_sky against X_IMAGE or Y_IMAGE. You could get something like this:

    Plot: A single wavy thing

    This rather certainly reflects some optical distortion; source-extractor regrettably does not take into account SIP corrections yet, so it is likely that a large part of this would be taken care of by the polynomials of the plate solution (the github issue I am linking to tells you how to be sure).

    But it can also look like this:

    Plot: Multiple wobbles

    This certainly is not the result of a lens or anything optical at all. It's the scanner's gears that you are looking at here. With an amplitude of perhaps three arcseconds this is rather excessive here; but something like this you will rather likely see even on good scanners – though it may essentially be invisible, as of the Heidelberg scanner we used for HDAP:

    Plot: A vertical cloud with no discernible structure.
    [1]I'm using the BP magnitude in the query below as most historical plates tend to be “blue sensitive“ (in some sense). Hence, BP magnitudes should be a bit closer to what source-extractor has extracted.
  • Tangible Astronomy and Movies with TOPCAT

    This March, I've put up two new VO resources (that's jargon for “table or service or whatever”) that, I think, fit quite well what I like to call tangible astronomy: things you can readily relate to what you see when you step out at night. And, since I'm a professing astronomy nerd, that's always nicely gratifying.

    The two resources are the Constellations as Polygons and the Gaia eDR3 catalogue of nearby stars (GCNS).

    Constellations

    On the constellations, you might rightfully say that's really far from science. But then they do help getting an idea where something is, and when and from where you might see something. I've hence wanted for a long time to re-publish the Davenhall Constellation Boundary Data as proper, ADQL-queriable polygons, and figuring out where the loneliest star in the sky (and Voyager 1) were finally made me do it.

    GCNS density around taurus

    Taurus in the GCNS density plot: with constellations!

    So, since early March there's the cstl.geo table on the TAP service at https://dc.g-vo.org/tap with the constallation polygons in its p column. Which, for starters, means it's trivial to overplot constallation boundaries in your favourite VO clients now, as in the plot above. To make it, I've just done a boring SELECT * FROM cstl.geo, did the background (a plain HEALPix density plot of GCNS) and, clicked Layers → Add Area Control and selected the cstl.geo table.

    If you want to identify constellations by clicking, while in the area control, choose “add central” from the Forms menu in the Form tab; that's what I did in the figure above to ensure that what we're looking at here is the Hyades and hence Taurus. Admittedly: these “centres“ are – as in the catalogue – just the means of the vertices rather than the centres of mass of the polygon (which are hard to compute). Oh, and: there is also the AreaLabel in the Forms menu, for when you need the identification more than the table highlighting (be sure to use a center anchor here).

    Note that TOPCAT's polygon plot at this point is not really geared towards large polygons (which the constellations are) right now. At the time of writing, the documentation has: “Areas specified in this way are generally intended for displaying relatively small shapes such as instrument footprints. Larger areas may also be specified, but there may be issues with use.” That you'll see at the edges of the sky plots – but keeping that in mind I'd say this is a fun and potentially very useful feature.

    What's a bit worse: You cannot turn the constellation polygons into MOCs yet, because the MOC library currently running within our database will not touch non-convex polygons. We're working on getting that fixed.

    Nearby Stars

    Similarly tangible in my book is the GCNS: nearby stars I always find romantic.

    Let's look at the 100 nearest stars, and let's add spectral types from Henry Draper (cf. my post on Annie Cannon's catalogue) as well as the constellation name:

    WITH nearest AS (
    SELECT TOP 100
      a.source_id,
      a.ra, a.dec,
      phot_g_mean_mag,
      dist_50,
      spectral
    FROM gcns.main AS a
    LEFT OUTER JOIN hdgaia.main AS b
      ON (b.source_id_dr3=a.source_id)
    ORDER BY dist_50 ASC)
    SELECT nearest.*, name
    FROM nearest
    JOIN cstl.geo AS g
      ON (1=CONTAINS(
        POINT(nearest.ra, nearest.dec),
        p))
    

    Note how I'm using CONTAINS with the polygon in the constellations table here; that's the usage I've had in mind for this table (and it's particularly handy with table uploads).

    That I have a Common Table Expression (“WITH”) here is due to SQL planner confusion (I'll post something about that real soon now): With the WITH, the machine first selects the nearest 100 rows and then does the (relatively costly) spatial match, without it, the machine (somewhat surprisingly) did the geometric match first. This particular confusion looks fixable, but for now I'd ask you for forgiveness for the hack – and the technique is often useful anyway.

    If you inspect the result, you will notice that Proxima Cen is right there, but α Cen is missing; without having properly investigated matters, I'd say it's just too bright for the current Gaia data reduction (and quite possibly even for future Gaia analysis).

    Most of the objects on that list that have made it into the HD (i.e., have a spectral type here) are K dwarfs – which is an interesting conspiracy between the limits of the HD (the late red and old white dwarfs are too weak for it) and the limits of Gaia (the few earlier stars within 6 parsec – which includes such luminaries as Sirius at a bit more than 2.5 pc – are just too bright for where Gaia data reduction is now).

    Animation

    Another fairly tangible thing in the GCNS is the space velcity, given in km/s in the three dimensions U, V, and W. That is, of course, an invitation to look for stellar streams, as, within the relatively small portion of the Milky Way the GCNS looks at, stars on similar orbits will exhibit similar space motions.

    Considering the velocity dispersion within a stellar stream will be a few km/s, let's have the database bin the data. Even though this data is small enough to conveniently handle locally, this kind of remote analysis is half of what TAP is really great at (the other half being the ability to just jump right into a new dataset). You can group by multiple things at the same time:

    SELECT
      COUNT(*) AS n,
      ROUND(uvel_50/5)*5 AS ubin,
      ROUND(vvel_50/5)*5 AS vbin,
      ROUND(wvel_50/5)*5 AS wbin
    FROM gcns.main
    GROUP BY ubin, vbin, wbin
    

    Note that this (truly) 3D histogram only represents a small minority of the GCNS objects – you need radial velocities for space motion, and these are precious even in the Gaia age.

    What really surprised me is how clumpy this distribution is – are we sure we already know all stellar streams in the solar neighbourhood? Watch for yourself (if your browser can't play webm, complain to your vendor):

    [Update (2021-04-01): Mark Taylor points out that the “flashes” you sometimes see when the grid is aligned with the viewing axes (and the general appearance) could be improved by just pulling all non-NULL UVW values out of the table and using a density plot (perhaps shading=density densemap=inferno densefunc=linear). That is quite certainly true, but it would of course defeat the purpose of having on-server aggregation. Which, again, isn't all that critical for this dataset, so doing the prettier plot actually is a valuable exercise for the reader]

    How did I make this video? Well, I started with a Cube Plot in TOPCAT as usual, configuring weighted plotting with n as its weight and played around a bit with scaling out a few outliers. And then I saved the table (to zw.vot), hit “STILTS“ in the plot window and saved the text from there to a text file, zw.sh. I had to change the ``in`` clause in the script to make it look like this:

    #!/bin/sh
    stilts plot2cube \
     xpix=887 ypix=431 \
     xlabel='ubin / km/s' ylabel='vbin / km/s' \
     zlabel='wbin / km/s' \
     xmin=-184.5 xmax=49.5 ymin=-77.6 ymax=57.6 \
     zmin=-119.1 zmax=94.1 phi=-84.27 theta=90.35 \
      psi=-62.21 \
     auxmin=1 auxmax=53.6 \
     auxvisible=true auxlabel=n \
     legend=true \
     layer=Mark \
        in=zw.vot \
        x=ubin y=vbin z=wbin weight=n \
        shading=weighted size=2 color=blue
    

    – and presto, sh zw.sh would produce the plot I just had in TOPCAT. This makes a difference because now I can animate this.

    In his documentation, Mark already has a few hints on how to build animations; here are a few more ideas on how to organise this. For instance, if, as I want here, you want to animate more than one variable, stilts tloop may become a bit unwieldy. Here's how to give the camera angles in python:

    import sys
    from astropy import table
    import numpy
    
    angles = numpy.array(
      [float(a) for a in range(0, 360)])
    table.Table([
        angles,
        40+30*numpy.cos((angles+57)*numpy.pi/180)],
      names=("psi", "theta")).write(
        sys.stdout, format="votable")
    

    – the only thing to watch out for is that the names match the names of the arguments in stilts that you want to animate (and yes, the creation of angles will make numpy afficionados shudder – but I wasn't sure if I might want to have somewhat more complex logic there).

    [Update (2021-04-01): Mark Taylor points out that all that Python could simply be replaced with a straightforward piece of stilts using the new loop table scheme in stilts, where you would simply put:

    animate=:loop:0,360,0.5
    acmd='addcol phi $1'
    acmd='addcol theta 40+30*cosDeg($1+57)'
    

    into the plot2cube command line – and you wouldn't even need the shell pipeline.]

    What's left to do is basically the shell script that TOPCAT wrote for me above. In the script below I'm using a little convenience hack to let me quickly switch between screen output and file output: I'm defining a shell variable OUTPUT, and when I un-comment the second OUTPUT, stilts renders to the screen. The other changes versus what TOPCAT gave me are de-dented (and I've deleted the theta and psi parameters from the command line, as I'm now filling them from the little python script):

    OUTPUT="omode=out out=pre-movie.png"
    #OUTPUT=omode=swing
    
    python3 camera.py |\
    stilts plot2cube \
       xpix=500 ypix=500 \
       xlabel='ubin / km/s' ylabel='vbin / km/s' \
       zlabel='wbin / km/s' \
       xmin=-184.5 xmax=49.5 ymin=-77.6 ymax=57.6 \
       zmin=-119.1 zmax=94.1 \
       auxmin=1 auxmax=53.6 \
    phi=8 \
    animate=- \
    afmt=votable \
    $OUTPUT \
       layer=Mark \
          in=zw.vot \
          x=ubin y=vbin z=wbin weight=n \
          shading=weighted size=4 color=blue
    
    # render to movie with something like
    # ffmpeg -i "pre-movie-%03d.png" -framerate 15 -pix_fmt yuv420p /stream-movie.webm
    # (the yuv420p incantation is so real-world
    # web browsers properly will not go psychedelic
    # with the colours)
    

    The comment at the end says how to make a proper movie out of the PNGs this produces, using ffmpeg (packaged with every self-respecting distribution these days) and yielding a webm. Yes, going for mpeg x264 might be a lot faster for you as it's a lot more likely to have hardware support, but everything around mpeg is so patent-infested that for the sake of your first-born's soul you probably should steer clear of it.

    Movies are fun in webm, too.

Page 1 / 2 »