Posts with the Tag Tutorials:

Persistent TAP Uploads Update: A Management Interface

2025-05-21 Markus Demleitner

Setting Lifetimes
Creating Indexes
Special Index Types

a screenshot of a python notebook; a few lines of python yield the current date, a few more a date a year from now.

There is a new version of the jupyter notebook showing off the persistent TAP uploads in python coming with this post, too: Get it.

Six months ago, I reported on my proposal for persistent uploads into TAP services on this very blog: Basically, you could have and keep your own tables in databases of TAP servers supporting this, either by uploading them or by creating them with an ADQL query. Each such table has a URI; you PUT to it to create it, you GET from it to inspect its metadata, VOSI-style, and you DELETE to it to drop the table once you're done.

Back then, I enumerated a few open issues; two of these I have recently addressed: Lifetime management and index creation. Here is how:

Setting Lifetimes

In my scheme, services assign a lifetime to user-uploaded tables, mainly in order to nicely recover when users don't keep books on what they created. The service will eventually clean up their tables after them, in the case of the reference implementation in DaCHS after a week.

However, there are clearly cases when you would like to extend the lifetime of your table beyond that week. To let users do that, my new interface copies the pattern of UWS. There, jobs have a destruction child. You can post DALI-style timestamps[1] there, and both the POST and the GET return a DALI timestamp of the destruction time actually set; this may be different from what you asked for because services may set hard limits (in my case, a year).

For instance, to find out when the service will drop the table you will create when you follow last October's post you could run:

curl http://dc.g-vo.org/tap/user_tables/my_upload/destruction

To request that the table be preserved until the Epochalypse you would say:

curl -F DESTRUCTION=2038-01-19T03:14:07Z http://dc.g-vo.org/tap/user_tables/my_upload/destruction

Incidentally (and as you can see from the POST response), until January 2037, my service will reduce your request to “a year from now”.

I can't say I'm too wild about posting a parameter called “DESTRUCTION” to an endpoint that's called “destruction” (even if it weren't such a mean word). UWS did that because they wanted it make it easy to operate a compliant service from a web browser. Whether that still is a reasonable design goal (in particular because everyone seems to be wild on dumping 20 metric tons of Javascript on their users even things like UWS would make it easy to not do that) is certainly debatable. But I thought it's better to have a single questionable pattern throughout rather than have something a little odd in one place and something a little less odd in another place.

Creating Indexes

For many applications of database systems, having indexes is crucial. You really, really don't want to have to go through a table with 2 billion rows just to find a single object (that's a quarter of a day when you manage to pull through 100'000 rows a second; ok: today's computers are faster than that). While persistently uploaded tables won't (regularly) have two billion rows any time soon, indexes are already very valuable even for tables in the million-row range.

On the other hand, there are many sorts of indexes, and there are many ways to qualify indexes. To get an idea of what you might want to tell a database about an index, see Postgres' CREATE INDEX docs. And that's just for Postgres; other database systems still do it differently, and of course when you index on expressions, there is no limit to the complexity you can build into your indexes.

Building a cross-database API that would reflect all that is entirely out of the question. Hence, I went for the other extreme: You just specify which column(s) you would like to have indexed, and the service is supposed to choose a plausible index type for you.

Following the model of destruction (typography matters!), this is done by POST-ing one or more column names in INDEX parameters to the index child of the table url. For instance, if you have put a table my_upload that has a column Kmag (e.g., from last October's post), you would say:

curl -L -F INDEX=Kmag http://dc.g-vo.org/tap/user_tables/my_upload/index

The -L makes curl follow the redirect that this issues. Why would it redirect, you ask? The index request creates a UWS job behin the scenes, that is, something like a TAP async job. What you get redirected to is that job.

The background is that for large tables and complex indexes, you may easily get your (appartently idle) connection cut while the index is being created, and you would never learn if a problem had materialised or when the indexing is done. Against that, UWS lets us keep running, and you have a URI at which to inspect the progress of the indexing operation (well, frankly: nothing yet beyond “is it done?”).

Speaking UWS with curl is no fun, but then you don't need to: The job starts in QUEUED and will automatically execute when the machine next has time. In case you are curious, see the notebook linked above, where there is an example for manually following the job's progress. You could use generic UWS clients to watch it, too.

A weak point of the scheme (and one that's surprisingly hard to fix) is that the index is immediately shown in the table metadata the notebook linked to above shows this; I'll spare you the VODataService XML that curl-ing the table URL will spit at you, but in there you will see the Kmag index whether or not the indexer job has run.

It shares this deficit with another way to look at indexes. You see, since there is so much backend-specific stuff you may want to know about an index, I am also proposing that when you GET the index child, you get back the actual database statements, or at least something rather similar. This is expressly not supposed to be machine readable, if only because what you see is highly dependent on the underlying database.

Here is how this looks like on DaCHS over postgres after the index call on Kmag:

$ curl http://dc.g-vo.org/tap/user_tables/my_upload/index
Indexes on table tap_user.my_upload

CREATE INDEX my_upload_Kmag ON tap_user.my_upload (Kmag)

I would not want to claim that this particular human-readable either. But humans that try to understand why a computer does not behave as they expect will certainly appreciate something like this.

Special Index Types

If you look at the tmp.vot from last october's post, you will see that there is an a pair of equatorial coordinates in _RAJ2000 and _DEJ2000. It is nicely marked up with pos.eq UCDs, and the units are deg: This is an example of a column set that DaCHS has special index magic for. Try it:

curl -L -F INDEX=_RAJ2000 -F INDEX=_DEJ2000 \
  http://dc.g-vo.org/tap/user_tables/my_upload/index > /dev/null

Another GET against index will show you that this index is a bit different, stuttering something about q3c (or perhaps spoint at another time or on another service):

Indexes on table tap_user.my_upload

CREATE INDEX my_upload__RAJ2000__DEJ2000 ON tap_user.my_upload (q3c_ang2ipix("_RAJ2000","_DEJ2000"))
CLUSTER my_upload__RAJ2000__DEJ2000 ON tap_user.my_upload
CREATE INDEX my_upload_Kmag ON tap_user.my_upload (Kmag)

DaCHS will also recognise spatial points. Let's quickly create a table with a few points by running:

CREATE TABLE tap_user.somepoints AS
SELECT TOP 30 preview, ssa_location
FROM gdr3spec.ssameta

on the TAP server at http://dc.g-vo.org/tap, for instance in TOPCAT (as explained in post one, the “Table contained no rows” message you will see then is to be expected). Since TOPCAT does not know about persistent uploads yet, you have to create the index using curl:

curl -LF INDEX=ssa_location http://dc.g-vo.org/tap/user_tables/somepoints/index

GET-ting the index URL after that will yield:

Indexes on table tap_user.somepoints

CREATE INDEX ndpmaliptmpa_ssa_location ON tap_user.ndpmaliptmpa USING GIST (ssa_location)
CLUSTER ndpmaliptmpa_ssa_location ON tap_user.ndpmaliptmpa

The slightly shocking name of the table is an implementation detail that I might want to hide at some point; the important thing here is the USING GIST that indicates DaCHS has realised that for spatial queries to be able to use the index, a special method is necessary.

Incidentally, I was (and still am) not entirely sure what to do when someone asks for this:

curl -L -F INDEX=_Glon -F INDEX=_DEJ2000 \
  http://dc.g-vo.org/tap/user_tables/my_upload/index > /dev/null

That's a latitude and a longitude all right, but of course they don't belong together. Do I want to treat these as two random columns being indexed together, or do I decide that the user very probably wants to use a very odd coordinate system here?

Well, try it and see how I decided; after this post, you know what to do.

[1]	Many people call that “ISO format”, but I cannot resist pointing out that ISO, in addition to charging people who want to read their standards an arm and leg, admits a panic-inducing variety of date formats, and so “ISO format” not a particularly useful term.

A Proposal for Persistent TAP Uploads

2024-10-11 Markus Demleitner
From its beginning, the IVOA's Table Access Protocol TAP has let users upload their own tables into the services' databases, which is an important element of TAP's power (cf. our upload crossmatch use case for a minimal example). But these uploads only exist for the duration of the request. Having more persistent user-uploaded tables, however, has quite a few interesting applications.

Inspired by Pat Dowler's 2018 Interop talk on youcat I have therefore written a simple implementation for persistent tables in GAVO's server package DaCHS. This post discusses what is implemented, what is clearly still missing, and how you can play with it.

If all you care about is using this from Python, you can jump directly to a Jupyter notebook showing off the features; it by and large explains the same things as this blogpost, but using Python instead of curl and TOPCAT. Since pyVO does not know about the proposed extensions, the code necessarily is still a bit clunky in places, but if something like this will become more standard, working with persistent uploads will look a lot less like black art.

Before I dive in: This is certainly not what will eventually become a standard in every detail. Do not do large implementations against what is discussed here unless you are prepared to throw away significant parts of what you write.
Creating and Deleting Uploads

Where Pat's 2018 proposal re-used the VOSI tables endpoint that every TAP service has, I have provisionally created a sibling resource user_tables – and I found that usual VOSI tables and the persistent uploads share virtually no server-side code, so for now this seems a smart thing to do. Let's see what client implementors think about it.

What this means is that for a service with a base URL of http://dc.g-vo.org/tap [1], you would talk to (children of) http://dc.g-vo.org/tap/user_tables to operate the persistent tables.

As with Pat's proposal, to create a persistent table, you do an http PUT to a suitably named child of user_tables:
```
$ curl -o tmp.vot https://docs.g-vo.org/upload_for_regressiontest.vot
$ curl -H "content-type: application/x-votable+xml" -T tmp.vot \
  http://dc.g-vo.org/tap/user_tables/my_upload
Query this table as tap_user.my_upload
```
The actual upload at this point returns a reasonably informative plain-text string, which feels a bit ad-hoc. Better ideas are welcome, in particular after careful research of the rules for 30x responses to PUT requests.

Trying to create tables with names that will not work as ADQL regular table identifiers will fail with a DALI-style error. Try something like:
```
$ curl -H "content-type: application/x-votable+xml" -T tmp.vot
  http://dc.g-vo.org/tap/user_tables/join
... <INFO name="QUERY_STATUS" value="ERROR">'join' cannot be used as an
  upload table name (which must be regular ADQL identifiers, in
  particular not ADQL reserved words).</INFO> ...
```
After a successful upload, you can query the VOTable's content as tap_user.my_upload:

TOPCAT (which is what painted these pixels) does not find the table metadata for tap_user tables (yet), as I do not include them in the “public“ VOSI tables. This is why you see the reddish syntax complaints here.

I happen to believe there are many good reasons for why the volatile and quickly-changing user table metadata should not be mixed up with the public VOSI tables, which can be several 10s of megabytes (in the case of VizieR). You do not want to have to re-read that (or discard caches) just because of a table upload.

If you have the table URL of a persistent upload, however, you inspect its metadata by GET-ting the table URL:
```
$ curl http://dc.g-vo.org/tap/user_tables/my_upload | xmlstarlet fo
<vtm:table [...]>
  <name>tap_user.my_upload</name>
  <column>
    <name>"_r"</name>
    <description>Distance from center (RAJ2000=274.465528, DEJ2000=-15.903352)</description>
    <unit>arcmin</unit>
    <ucd>pos.angDistance</ucd>
    <dataType xsi:type="vs:VOTableType">float</dataType>
    <flag>nullable</flag>
  </column>
  ...
```
– this is a response as from VOSI tables for a single table. Once you are authenticated (see below), you can also retrieve a full list of tables from user_tables itself as a VOSI tableset. Enabling that for anonymous uploads did not seem wise to me.

When done, you can remove the persistent table, which again follows Pat's proposal:
```
$ curl -X DELETE http://dc.g-vo.org/tap/user_tables/my_upload
Dropped user table my_upload
```
And again, the text/plain response seems somewhat ad hoc, but in this case it is somewhat harder to imagine something less awkward than in the upload case.

If you do not delete yourself, the server will garbage-collect the upload at some point. On my server, that's after seven days. DaCHS operators can configure that grace period on their services with the [ivoa]userTableDays setting.
Authenticated Use

Of course, as long as you do not authenticate, anyone can drop or overwrite your uploads. That may be acceptable in some situations, in particular given that anonymous users cannot browse their uploaded tables. But obviously, all this is intended to be used by authenticated users. DaCHS at this point can only do HTTP basic authentication with locally created accounts. If you want one in Heidelberg, let me know (and otherwise push for some sort of federated VO-wide authentication, but please do not push me).

To just play around, you can use uptest as both username and password on my service. For instance:
```
  $ curl -H "content-type: application/x-votable+xml" -T tmp.vot \
  --user uptest:uptest \
  http://dc.g-vo.org/tap/user_tables/privtab
Query this table as tap_user.privtab
```
In recent TOPCATs, you would enter the credentials once you hit the Log In/Out button in the TAP client window. Then you can query your own private copy of the uploaded table:

There is a second way to create persistent tables (that would also work for anonymous): run a query and prepend it with CREATE TABLE. For instance:

The “error message” about the empty table here is to be expected; since this is a TAP query, it stands to reason that some sort of table should come back for a successful request. Sending the entire newly created table back without solicitation seems a waste of resources, and so for now I am returning a “stub” VOTable without rows.

As an authenticated user, you can also retrieve a full tableset for what user-uploaded tables you have:
```
$ curl --user uptest:uptest http://dc.g-vo.org/tap/user_tables | xmlstarlet fo
<vtm:tableset ...>
  <schema>
    <name>tap_user</name>
    <description>A schema containing users' uploads. ...  </description>
    <table>
      <name>tap_user.privtab</name>
      <column>
        <name>"_r"</name>
        <description>Distance from center (RAJ2000=274.465528, DEJ2000=-15.903352)</description>
        <unit>arcmin</unit>
        <ucd>pos.angDistance</ucd>
        <dataType xsi:type="vs:VOTableType">float</dataType>
        <flag>nullable</flag>
      </column>
      ...
    </table>
    <table>
      <name>tap_user.my_upload</name>
      <column>
        <name>"_r"</name>
        <description>Distance from center (RAJ2000=274.465528, DEJ2000=-15.903352)</description>
        <unit>arcmin</unit>
        <ucd>pos.angDistance</ucd>
        <dataType xsi:type="vs:VOTableType">float</dataType>
        <flag>nullable</flag>
      </column>
      ...
    </table>
  </schema>
</vtm:tableset>
```
Open Questions

Apart from the obvious question whether any of this will gain community traction, there are a few obvious open points:
1. Indexing. For tables of non-trivial sizes, one would like to give users an interface to say something like “create an index over ra and dec interpreted as spherical coordinates and cluster the table according to it”. Because this kind of thing can change runtimes by many orders of magnitude, enabling it is not just some optional embellishment.
  
  On the other hand, what I just wrote already suggests that even expressing the users' requests in a sufficiently flexible cross-platform way is going to be hard. Also, indexing can be a fairly slow operation, which means it will probably need some sort of UWS interface.
2. Other people's tables. It is conceivable that people might want to share their persistent tables with other users. If we want to enable that, one would need some interface on which to define who should be able to read (write?) what table, some other interface on which users can find what tables have been shared with them, and finally some way to let query writers reference these tables (tap_user.<username>.<tablename> seems tricky since with federated auth, user names may be just about anything).
  
  Given all this, for now I doubt that this is a use case sufficiently important to make all the tough nuts delay a first version of user uploads.
3. Deferring destruction. Right now, you can delete your table early, but you cannot tell my server that you would like to keep it for longer. I suppose POST-ing to a destruction child of the table resource in UWS style would be straightforward enough. But I'd rather wait whether the other lacunae require a completely different pattern before I will touch this; for now, I don't believe many persistent tables will remain in use beyond a few hours after their creation.
4. Scaling. Right now, I am not streaming the upload, and several other implementation details limit the size of realistic user tables. Making things more robust (and perhaps scalable) hence will certainly be an issue. Until then I hope that the sort of table that worked for in-request uploads will be fine for persistent uploads, too.
Implemented in DaCHS

If you run a DaCHS-based data centre, you can let your users play with the stuff I have shown here already. Just upgrade to the 2.10.2 beta (you will need to enable the beta repo for that to happen) and then type the magic words:
```
dachs imp //tap_user
```
It is my intention that users cannot create tables in your DaCHS database server unless you say these words. And once you say dachs drop --system //tap_user, you are safe from their huge tables again. I would consider any other behaviour a bug – of which there are probably still quite a few. Which is why I am particularly grateful to all DaCHS operators that try persistent uploads now.

[1] As already said in the notebook, if http bothers you, you can write https, too; but then it's much harder to watch what's going on using ngrep or friends.
Category: Standards
Learn To Use The VO

2024-08-14 Markus Demleitner

The first 60 pages of the lecture notes as they currently are. I give you a modern textbook would probably look a bit more colorful from this distance, but perhaps this will still do.

About ten years ago, I had planned to write something I tentatively called VadeVOcum: A guide for people wanting to use the Virtual Observatory somewhat more creatively than just following and slightly adapting tutorials and use cases. If you will, I had planned to write a textbook on the VO.

For all the usual reasons, that project never went far. Meanwhile, however, GAVO's courses on ADQL and on pyVO grew and matured. When, some time in 2021, I was asked whether I could give a semester-long course “on the VO”, I figured that would be a good opportunity to finally make the pyVO course publishable and complement the two short courses with enough framing that some coherent story would emerge, close enough to the VO textbook I had in mind in about 2012.
Teaching Virtual Observatory Matters

The result was a course I taught at Universität Heidelberg in the past summer semester together with Hendrik Heinl and Joachim Wambsganss. I have now published the lecture notes, which I hope are textbooky enough that they work for self-study, too. But of course I would be honoured if the material were used as a basis of similar courses in other places. To make this simpler, the sources are available on Codeberg without relevant legal restrictions (i.e., under CC0).

The course currently comprises thirteen “lectures”. These are designed so I can present them within something like 90 minutes, leaving a bit of space for questions, contingencies, and the side tracks. You can build the slides for each of these lectures separately (see the .pres files in the source repository), which makes the PDF to work while teaching less cumbersome. In addition to that main trail, there are seven “side tracks”, which cover more fundamental or more general topics.

In practice, I sprinkled in the side tracks when I had some time left. For instance, I showed the VOTable side track at the ends of the ADQL 2 and ADQL 3 lectures; but that really had no didactic reason, it was just about filling time. It seemed the students did not mind the topic switches to much. Still, I wonder if I should not bring at least some of the side tracks, like those on UCDs, identifiers, and vocabularies, into the main trail, as it would be unfortunate if their content fell through the cracks.

Here is a commented table of contents:
- Introduction: What is the VO and why should you care? (including a first demo)
- Simple Protocols and their clients (which is about SIAP, SSAP, and SCS, as well as about TOPCAT and Aladin)
- TAP and ADQL (that's typically three lectures going from the first SELECT to complex joins involving subqueries)
- Interlude: HEALPix, MOC, HiPS (this would probably be where a few of the other side tracks might land, too)
- pyVO Basics (using XService objects and a bit of SAMP, mainly along an image discovery task)
- pyVO and TAP (which is developed around a multi-catalogue SED building case)
- pyVO and the Registry (which, in contrast to the rest of the course, is employing Jupyter notebooks because much of the Registry API makes sense mainly in interactive use)
- Datalink (giving a few pyVO examples for doing interesting things with the protocol)
- Higher SAMP Magic (also introducing a bit of object oriented programming, this is mainly about tool building)
- At the Limit: VO-Wide TAP Queries (cross-server TAP queries with query building, feature sensing and all that jazz; I admit this is fairly scary and, well, at the limit of what you'd want to show publicly)
- Odds and Ends (other pyVO topics that don't warrant a full section)
- Side Track: Terminology (client, server, dataset, data collection, oh my; I had expected this to grow more than it actually did)
- Side Track: Architecture (a deeper look at why we bother with standards)
- Side Track: Standards (a very brief overview of what standards the IVOA has produced, with a view of guiding users away from the ones they should not bother with – and perhaps towards those they may want to read after all)
- Side Track: UCDs (including hints on how to figure out which would denote a concept one is interested in)
- Side Track: Vocabularies (I had some doubts whether that is too much detail, but while updating the course I realised that vocabularies are now really user-visible in several places)
- Side Track: VOTable (with the intention of giving people enough confidence to perform emergency surgery on VOTables)
- Side Track: IVOA Identifiers (trying to explain the various ivo:// URIs users might see).
Pitfalls: Technical, Intellectual, and Spiritual

The course was accompanied by lab work, again 90 minutes a week. There are a few dozen exercises embedded in the course, and in the lab sessions we worked on some suitable subset of those. With the particular students I had and the lack of grading pressure, the fact that solutions for most of the exercises come with the lecture notes did not turn out to be a problem.

The plan was that the students would explain their solutions and, more importantly, the places they got stuck in to their peers. This worked reasonably well in the ADQL part, somewhat less for the side tracks, and regrettably a lot less well in the pyVO part of the course. I cannot say I have clear lessons to be learned from that yet.

A piece of trouble for the student-generated parts I had not expected was that the projector only interoperated with rather few of the machines the students brought. Coupling computers and projectors was occasionally difficult even in the age of universal VGA. These days, even in the unlikely event one has an adapter for the connectors on the students' computers, there is no telling what part of a computer screen will end up on the wall, which distortions and artefacts will be present and how much the whole thing will flicker.

Oh, and better forget about trying to fix things by lowering the resolution or the refresh rate or whatever: I have not had one instance during the course in which any plausible action on the side of the computer improved the projected image. Welcome to the world of digital video signals. Next time around, I think I will bring a demonstration computer and figure out a way in which the students can quickly transfer their work there.

Talking about unexpected technical hurdles: I am employing PDF-attached source code quite extensively in the course, and it turned out that quite a few PDF clients in use no longer do something reasonable with that. With pdf.js, I see why that would be, and it's one extra reason to want to avoid it. But even desktop readers behaved erratically, including some Windows PDF reader that had the .py extension on some sort of blacklist and refused to store the attached files on grounds that they may “damage the computer”. Ah well. I was tempted to have a side track on version control with git when writing the course. This experience is probably an encouragement to follow through with that and at least for the pyVO part to tell students to pull the files out of a checkout of the course's source code.

Against the outline in the lecture as given, I have now promoted the former HEALPix side track to an interlude session, going between ADQL and pyVO. It logically fits there, and it was rather popular with the students. I have also moved the SAMP magic lecture to a later spot in the course; while I am still convinced it is a cool use case, and giving students a chance to get to like classes is worthwhile, too, it seems to be too much tool building to have much appeal to the average participant.

Expectably, when doing live VO work I regularly had interesting embarrassments. For instance, in the pyvo-tap lecture, where we do something like primitive SEDs from three catalogues (SDSS, 2MASS and WISE), the optical part of the SEDs was suddenly gone in the lecture and I really wondered what I had broken. After poking at things for longer than I should have, I eventually promised to debug after class and report next time, only to notice right after the lecture that I had, to make some now-forgotten point, changed the search position – and had simply left the SDSS footprint.

But I believe that was actually a good thing, because showing actual errors (it does not hurt if they are inadvertent) and at least brief attempts to understand them (and, possibly later, explain how one actually understood them) is a valuable part of any sort of (IT-related) education. Far too few people routinely attempt to understand what a computer is trying to tell them when it shows a message – at their peril.

Reruns, House Calls, TV Shows

Of course, there is a lot more one could say about the VO, even when mainly addressing users (as opposed to adopters). An obvious addition will be a lecture on the global dataset discovery API I have recently discussed here, and I plan to write it when the corresponding code will be in a pyVO release. I am also tempted to have something on stilts, perhaps in a side track. For instance, with a view to students going on to do tool development, in particular stilts' validators would deserve a few words.

That said, and although I still did quite a bit of editing based on my experiences while teaching, I believe the material is by and large sound and up-to-date now. As I said: everyone is welcome to the material for tinkering and adoption. Hendrik and I are also open to give standalone courses on ADQL (about a day) or pyVO (two to three days) at astronomical institutes in Germany or elsewhere in not-too remote Europe as long as you house (one of) us. The complete course could be a 10-days block, but I don't think I can be booked with that[1].

Another option would be a remote-teaching version of the course. Hendrik and I have discussed whether we have the inclination and the resources to make that happen, and if you believe something like that might fit into your curriculum, please also drop us a note.

And of course we welcome all sorts of bug reports and pull requests on codeberg, first and foremost from people using the material to spread the VO gospel.

[1] Well… let me hedge that I don't think I'd find a no in myself if the course took place on the Canary Islands…

Category: Demo
DASCH is now in the VO

2024-04-29 Markus Demleitner

This frame would show comet 2P/Encke during its proximity to Earth in 1941 – if it went deep enough. But never mind practicalities: If you want to learn about matching ephemeris against the DASCH plate collection (or, really, any sort of obscore-like table), read on.
For about a century – that is, into the 1980s –, being an observational astronomer meant taking photographic plates and doing tricks with them (unless you were a radio astronomer or one of the very few astronomers peeking beyond radio and optical in those days, of course). This actually is somewhat fortunate for archivists, because unlike many of the early CCD observations that by now are lost with our ability to read the tapes they were stored on, the plates are still there.

Why Bother?

However, to make them usable, the plates need to be digitised. In the GAVO data centre, we keep the results of several scan campaigns large and small, such as HDAP, the various data collections joined in the historical photographic plate image archive HPPA, or the delightfully quirky Münster Flare Plates.

I personally care a lot about these data collections. This is partly because they are indispensible for understanding the history of astronomy. But more importantly, they are the next best thing we have to a time machine; if we have a way of knowing how the sky looked like seventy years ago, it is these plate collections. Having such a time machine is important for all kinds of scientific efforts, including figuring out whether there are aliens (i.e., 2016ApJ...822L..34S) on Tabby's Star.

Somewhat to my chagrin, the cited paper 2016ApJ...822L..34S did not use the VO to obtain the plate images but went straight to DASCH's web interface. DASCH, in case you have not heard of it before, is probably the most ambitious project concerned with plate digitisation at the moment – or perhaps: “was”, because they just finished scanning the core part of Harvard's plate collections, which was their primary goal.

I can understand why Bradley Schaefer, the paper's author, did not bother with a VO search In 2016. For starters, working with halfway homogeneous data from instruments you are somewhat familiar saves a substantial amount of work and thought, in particular if you are, in addition, up against the usual lack of machine-readable metadata. Also, at that time DASCH probably had about as many digitised plates as all the VO's contemporary plate collections taken together.
DASCH: The Harvard Plates

Given such stats, I have always wanted to have at least the metadata from DASCH's plates in the VO. Thanks to a recent update to DASCH's publication system, this is now a reality. Since 2024-04-29, I am publishing the metadata of the DASCH plates via Obscore and and SIAP2.

Followup (2024-05-03)

This is now DASCH news, and one of my two main contacts on the DASCH side, Peter Williams, has written an insightful post on this, too. Let me use this opportunity to thank him for the delightful cooperation, and extend these thanks to Ben Sabath, who is primarily responsible for the update to the DASCH publication system I mentioned above.

Matching plates are returned as datalink documents, pointing to a preview, photos of the plate and its jacket, and links to the science data, once downsampled by a factor of 16, once in the original size (example). For now, #this points to the downsampled version, as Amazon charges DASCH about three cents per full-scale plate at the moment, and that can quickly add up by accident (there's nothing wrong with consciously downloading full-scale FITS-es if you need them, of course).

This is a bit fishy in that the size of the image in the obscore/SIAP2 fields s_xel1 and s_xel2 refers to the unscaled image, and thus I should be returning the full-scale image as datalink #this. I hope I will not cause much confusion with this design.

In case you look at the links in the datalink documents, let me include a disclaimer: Although they point into the GAVO data centre, the data is served courtesy of the DASCH project. The links only go to us because we need to sign links for you. I mention this because you can save the datalink documents and the links within them; the URLs you are redirected to from there, however, will expire fast. Just do not look at them.
Followup (2025-02-05)

As of today, we support cutouts of DASCH plates, too. This is a fairly basic service at this point, returning fixed-size cutouts only. However, for many use cases, these cutouts may be good enough.

For instance, here is how to retrieve cutouts for the vicinity of M51:
```
import pyvo

pos = (202.46, 47.19)

svc = pyvo.dal.TAPService("http://dc.g-vo.org/tap")
res = svc.run_sync(f"""
    SELECT *
    FROM
        dasch.narrow_plates
    WHERE
        distance(s_ra, s_dec, {pos[0]}, {pos[1]})<0.5""")
for rec in res:
    prod = rec.processed(circle=pos+(0.1,))
    dest_name = rec["dasch_id"]+".cutout.fits"
    print(dest_name)
    with open(dest_name, "wb") as f:
        f.write(prod.data)
```
And this is what M51 looked like in 1968 through the 12-inch Metcalf Doublet (DASCH id ma11561), displayed in ds9:
Plates in Global Discovery

So – what can you do with DASCH in the VO that you could not do before?

Most importantly, you will discover DASCH in registry interfaces and its datasets in global queries (in particular the global dataset queries I have discussed a few weeks ago). For instance, DASCH is now in Aladin's discovery tree:

You can now find DASCH in Aladin and do the usual “in view“ searches. However, currently this yields many matches that are, in practical terms, spurious, as they come from extremely wide-angle instruments. The red rectangle is the footprint of one of these images; note that the view here is a full two pi sky. We will probably do something about this “noise“.

The addition of DASCH to the VO has a strong effect in some use cases. For instance, at the end of the GAVO plates tutorial, we do an all-VO obscore query that, at the time of the last update of the tutorial in 2019, yielded 4067 datasets (of course, including modern and/or non-optical observations) potentially showing some strongly lensed quasar. With DASCH – and, admittedly, a few more collections that came into the VO since 2019 –, that number is now 10'489; the range of observation dates grew from MJD 12550…52000 to MJD 9800…58600, with the mean decreasing from 51'909 to 30'603. That the mean observation date moves that much back in time is a certain sign that a major part of the expansion is due to DASCH (well, and certainly to APPLAUSE, too).

Followup (2024-05-03)

As discussed in my DASCH update, I have taken out the large-coverage plates from my obscore table, which changes the stats (but not the conclusions) quite a bit. They is now 10'098 plates and mean observation date 36'396
TAP, Uploads, and pyVO on DASCH

But this is not just about bringing astronomical heritage to the VO. It is also about exposing DASCH through the powerful ADQL/TAP interface. As an example of how this may be useful, consider the comet P2/Encke, which, according to JPL's Small-Body Database was relatively close to Earth (about half an AU) in May 1941. It would have had about 14.5 mag at that point and hence was safely within reach of several of the instruments archived in DASCH. Perhaps we can find serendipitous or even targeted observations of the comet in the collection?

The plan to find that out is: compute an ephemeris (we are lazy and use an external service, Miriade ephemcc) and then for each day see whether there are DASCH observations in the vicinity of the sky location obtained in this way.

As usual, it's never that easy because the call to the ephemeris webservice (paste the link into TOPCAT to have a look) returns cursed sexagesimal coordinates. We need to fix them before doing anything serious with the table, and while we are at it, we also repair the date, which is simpler to consume if it is MJD to begin with. Getting the ephemeris thus takes quite a few lines:
```
from astropy import table
from astropy import units as u
from astropy.coordinates import SkyCoord
from astropy.time import Time

ephem = table.Table.read(
  "https://vo.imcce.fr/webservices/miriade/ephemcc.php?-from=vespa"
  "&-name=c:p/encke&-ep=1941-04-01&-nbd=90&-step=1d&-observer=500"
  &-mime=votable")

parsed = SkyCoord(ephem["ra"], ephem["dec"], unit=(u.hourangle, u.deg))
ephem["ra"] = parsed.ra.degree
ephem["dec"] = parsed.dec.degree

parsed = Time(ephem["epoch"])
ephem["epoch"] = parsed.mjd
```
Compared to that, the actual matching against DASCH is almost trivial if you are somewhat familiar with crossmatching in ADQL and the Obscore schema:
```
svc = pyvo.dal.TAPService("http://dc.g-vo.org/tap")
res = svc.run_sync("""
    SELECT *
    FROM
        dasch.plates
        JOIN tap_upload.orbit
        ON (1=CONTAINS(POINT(ra, dec), s_region))
    WHERE
        t_min<epoch
        AND t_max>epoch""",
    uploads={"orbit": ephem})
```
Followup (2024-05-03)

You would probably query the dasch.narrow_plates table in actual operations; querying dasch.plates is probably more for people interested in the history of astronomy or DASCH itself.

Inspect the query for a moment: This is a normal upload join, except we are constructing an ADQL POINT on the fly to be able to see whether we are in the spatial region covered by a DASCH dataset (given in obscore's s_region column). We could have put the temporal condition into the join's ON; but I think the intention is somewhat clearer with the WHERE constraint, and the database engine will probably go through identical motions for both queries – the beauty of having a query planner in the loop is that you do not need to think about such details most of the time.

Actually, in this case there is one last complication: As said above, we have put a datalink service between you and the downloads to discourage accidental large downloads. We hence use pyVO's (suboptimally documented) datalink interface (iter_datalinks):
```
with pyvo.samp.connection() as conn:
    for dl in res.iter_datalinks():
        link = next(dl.bysemantics("#preview-image"))
        pyvo.samp.send_image_to(
            conn,
            link.access_url,
            client_name="Aladin")
```
Among the artefacts available we pick the scaled jpegs in this fragment (#preview-image), since these are almost free even on the Amazon cloud. Change that #preview-image to #this in the to get scaled calibrated FITS-es, which are still fairly small. This would, for instance, let you overplot the ephemeris in Aladin, which you cannot do with the jpegs as they lack astrometric calibration (for now). But even with #preview-image, we can use Aladin as a glorified image viewer by SAMP-sending the images there, which is why we do the minor magic with functions from pyvo.samp.

If you want to try this yourself or mangle the program to do something else that requires querying against a reasonable number positions in time and space, just get encke.py and hack away. Make sure to start Aladin before running the program so it has something to send the images to.
Disclaimers

This is a contrived example, and it is likely that this particular use case is astronomically wrong in several ways. Let me enumerate a few things that would need looking into before this approaches proper science:
- We compute the ephemeris for the center of the Earth. At half an AU distance, the resulting parallax will not shift the position enough to hide a plate we should know about, but at least for anything closer, you should try to do a bit better; admittedly, for a resource like DASCH – that contains plates from observatories all over the place – you will have to compromise.
- The ephemeris is probably wrong; comet's orbits change over time, and I have no idea if the ephemeris service actually uses 2P/Encke's 1941 orbit to compute the positions.
- The coordinate metadata may be wrong. Ephemcc's documentation says something that sounds a lot as if they were sometimes returning RA and Dec for the equator of the time rather than for J2000 (i.e., ICRS for all intents and purposes), but of course our obscore coverages are for the ICRS. Regrettably, the VOTable returned by the service does not contain a COOSYS element yet, and so there is no easy way to tell.
- If you look at the table with DASCH matches, you will see they all were observed with an extremely wide-angle instrument sporting an aperture of a mere three inches. Even at the whopping exposure times (two hours), there is probably no way you would see a diffuse object of 14th mag on a plate with a 1940s-era photographic emulsion with that kind of optics (well: feel free to prove me wrong).
- It would of course be a huge waste of bandwidth to pull the entire plates if we already had a good idea of where we would expect the comet (i.e., had a reliable ephemeris). Hence, a cutout service that would let you retrieve more or less exactly the pixels you would like to use for your research and not the cruft around it would be a nifty supplement. It's in the works, and I'd say you can almost hold your breath. The cutout will simply appear as a SODA service in the datalink documents. See 2020ASPC..522..295D for how you would operate such a service.
Category: Data
Updates to GAVO's Tutorials

2023-03-23 Markus Demleitner

Over the years, GAVO has produced a number of VO tutorials, i.e., texts that introduce some technique related to using the Virtual Observatory, preferably within some halfway plausible scenario. In effect, they are software documentation, and as software itself, software documentation suffers from bit rot. To work against that, the tutorials have to be revised occasionally.

My two student assistants Sonja Gabriel and Chuanming Mao have recently done some of that revising. Let me use this opportunity to show off some of these freshly polished tutorials.

A classic one (that has, if I may say so myself, aged rather well), is Adding catalog data to object lists using the VO. This is a thinly disguised introduction to TAP uploads, arguably the most powerful of all the VO tech to date. If you have come to this place without ever having done a TAP upload, you owe it to yourself to at least skim the tutorial and quickly follow along the few steps to do positional crossmatches with just about any astronomical catalog and with just about any level of sophistication.

Another classic – it has its roots in the original Italian VO Days[1] – is TOPCAT and Aladin working together. It is using SDSS data of some galaxy cluster to try and get you to to send around data and positions between different programs using SAMP. If you are reading VO blogs, it is not unlikely this kind of thing will make you yawn. But at VO Days, it's little things like this that usually most immediately appeal to students and researchers alike.

From a tech point of view, Explore the Pleiades with TOPCAT and Aladin also mainly looks at SAMP (perhaps even somewhat less convincingly), but it's such a striking demo of what an amazing instrument Gaia is, and it's a nice introduction to TOPCAT's VO interface and subsetting facility that it's definitely worth a look, in particular as a showcase of having instant results with the VO.

An entirely different topic (well: it also employs SAMP for a moment) is covered by Data Discovery Using the Virtual Observatory Registry. This is trying to motivate looking for data collections in the VO Registry (in the form of our Browser interface to it). This tutorial has grown quite a bit during the review and now includes two sections joining data from different resources for various purposes. One section illustrates how systematics of quasar redshifts might be looked into using different sources, the other investigates the Tully-Fisher relationship in different spectral bands.

The tutorial on Asteroids in the Solar System was entriely overhauled. It was (and still is) mainly intended to be used in schools, and thus it originally just built on things that ran in a web browser. As is typical of things in web browsers, they have long since vanished. Hence, a rather fundamental update was necessary anyway. While we were looking for interesting things to do – the plot above, by the way, is the distribution of major halfaxes in the Kuiper belt –, we ended up even includeding a brief bit on ADQL.

Due to its school focus, we are also offering this particular text in German as well as in English. If you are an Astronomy teacher with particularly motivated pupils , we would like to hear from you…

The last revised tutorial I would like to mention also has a somewhat special (main) target audience: Astrometric Calibration using Aladin. Admittedly, automatic, or “blind” calibration has become really great, and I think getting their images located on the sky is not much of a problem even for amateurs any more, thanks in part to services like astrometry.net. But then – sometimes there is nothing like a good, old manual, ummm, “plate” solution. Aladin and the VO make that lot less tedious than it used to be.

Of course, I cannot have a post on tutorials without mentioning the VO Text Treasures, a web page that shows the educational material currently registered in the VO Registry. This little page also accounts for bit rot: You can sort by the time last inspected there, and thanks to Sonja's and Chuanming's efforts, our tutorials look very good in that representation at the moment.

In case you have some material suitable for WIRR yourself: Please register it, too. Send me a mail and I will lend you a hand (or, if you are a VO pro, directly read the pertinent standard).

[1] That's block courses on VO matters lasting a day or two. If you are in Germany, you can book us for your very own one!

Category: Demo