Posts with the Tag TAP:

DASCH is now in the VO

2024-04-29 Markus Demleitner

This frame would show comet 2P/Encke during its proximity to Earth in 1941 – if it went deep enough. But never mind practicalities: If you want to learn about matching ephemeris against the DASCH plate collection (or, really, any sort of obscore-like table), read on.
For about a century – that is, into the 1980s –, being an observational astronomer meant taking photographic plates and doing tricks with them (unless you were a radio astronomer or one of the very few astronomers peeking beyond radio and optical in those days, of course). This actually is somewhat fortunate for archivists, because unlike many of the early CCD observations that by now are lost with our ability to read the tapes they were stored on, the plates are still there.

Why Bother?

However, to make them usable, the plates need to be digitised. In the GAVO data centre, we keep the results of several scan campaigns large and small, such as HDAP, the various data collections joined in the historical photographic plate image archive HPPA, or the delightfully quirky Münster Flare Plates.

I personally care a lot about these data collections. This is partly because they are indispensible for understanding the history of astronomy. But more importantly, they are the next best thing we have to a time machine; if we have a way of knowing how the sky looked like seventy years ago, it is these plate collections. Having such a time machine is important for all kinds of scientific efforts, including figuring out whether there are aliens (i.e., 2016ApJ...822L..34S) on Tabby's Star.

Somewhat to my chagrin, the cited paper 2016ApJ...822L..34S did not use the VO to obtain the plate images but went straight to DASCH's web interface. DASCH, in case you have not heard of it before, is probably the most ambitious project concerned with plate digitisation at the moment – or perhaps: “was”, because they just finished scanning the core part of Harvard's plate collections, which was their primary goal.

I can understand why Bradley Schaefer, the paper's author, did not bother with a VO search In 2016. For starters, working with halfway homogeneous data from instruments you are somewhat familiar saves a substantial amount of work and thought, in particular if you are, in addition, up against the usual lack of machine-readable metadata. Also, at that time DASCH probably had about as many digitised plates as all the VO's contemporary plate collections taken together.
DASCH: The Harvard Plates

Given such stats, I have always wanted to have at least the metadata from DASCH's plates in the VO. Thanks to a recent update to DASCH's publication system, this is now a reality. Since 2024-04-29, I am publishing the metadata of the DASCH plates via Obscore and and SIAP2.

Followup (2024-05-03)

This is now DASCH news, and one of my two main contacts on the DASCH side, Peter Williams, has written an insightful post on this, too. Let me use this opportunity to thank him for the delightful cooperation, and extend these thanks to Ben Sabath, who is primarily responsible for the update to the DASCH publication system I mentioned above.

Matching plates are returned as datalink documents, pointing to a preview, photos of the plate and its jacket, and links to the science data, once downsampled by a factor of 16, once in the original size (example). For now, #this points to the downsampled version, as Amazon charges DASCH about three cents per full-scale plate at the moment, and that can quickly add up by accident (there's nothing wrong with consciously downloading full-scale FITS-es if you need them, of course).

This is a bit fishy in that the size of the image in the obscore/SIAP2 fields s_xel1 and s_xel2 refers to the unscaled image, and thus I should be returning the full-scale image as datalink #this. I hope I will not cause much confusion with this design.

In case you look at the links in the datalink documents, let me include a disclaimer: Although they point into the GAVO data centre, the data is served courtesy of the DASCH project. The links only go to us because we need to sign links for you. I mention this because you can save the datalink documents and the links within them; the URLs you are redirected to from there, however, will expire fast. Just do not look at them.
Followup (2025-02-05)

As of today, we support cutouts of DASCH plates, too. This is a fairly basic service at this point, returning fixed-size cutouts only. However, for many use cases, these cutouts may be good enough.

For instance, here is how to retrieve cutouts for the vicinity of M51:
```
import pyvo

pos = (202.46, 47.19)

svc = pyvo.dal.TAPService("http://dc.g-vo.org/tap")
res = svc.run_sync(f"""
    SELECT *
    FROM
        dasch.narrow_plates
    WHERE
        distance(s_ra, s_dec, {pos[0]}, {pos[1]})<0.5""")
for rec in res:
    prod = rec.processed(circle=pos+(0.1,))
    dest_name = rec["dasch_id"]+".cutout.fits"
    print(dest_name)
    with open(dest_name, "wb") as f:
        f.write(prod.data)
```
And this is what M51 looked like in 1968 through the 12-inch Metcalf Doublet (DASCH id ma11561), displayed in ds9:
Plates in Global Discovery

So – what can you do with DASCH in the VO that you could not do before?

Most importantly, you will discover DASCH in registry interfaces and its datasets in global queries (in particular the global dataset queries I have discussed a few weeks ago). For instance, DASCH is now in Aladin's discovery tree:

You can now find DASCH in Aladin and do the usual “in view“ searches. However, currently this yields many matches that are, in practical terms, spurious, as they come from extremely wide-angle instruments. The red rectangle is the footprint of one of these images; note that the view here is a full two pi sky. We will probably do something about this “noise“.

The addition of DASCH to the VO has a strong effect in some use cases. For instance, at the end of the GAVO plates tutorial, we do an all-VO obscore query that, at the time of the last update of the tutorial in 2019, yielded 4067 datasets (of course, including modern and/or non-optical observations) potentially showing some strongly lensed quasar. With DASCH – and, admittedly, a few more collections that came into the VO since 2019 –, that number is now 10'489; the range of observation dates grew from MJD 12550…52000 to MJD 9800…58600, with the mean decreasing from 51'909 to 30'603. That the mean observation date moves that much back in time is a certain sign that a major part of the expansion is due to DASCH (well, and certainly to APPLAUSE, too).

Followup (2024-05-03)

As discussed in my DASCH update, I have taken out the large-coverage plates from my obscore table, which changes the stats (but not the conclusions) quite a bit. They is now 10'098 plates and mean observation date 36'396
TAP, Uploads, and pyVO on DASCH

But this is not just about bringing astronomical heritage to the VO. It is also about exposing DASCH through the powerful ADQL/TAP interface. As an example of how this may be useful, consider the comet P2/Encke, which, according to JPL's Small-Body Database was relatively close to Earth (about half an AU) in May 1941. It would have had about 14.5 mag at that point and hence was safely within reach of several of the instruments archived in DASCH. Perhaps we can find serendipitous or even targeted observations of the comet in the collection?

The plan to find that out is: compute an ephemeris (we are lazy and use an external service, Miriade ephemcc) and then for each day see whether there are DASCH observations in the vicinity of the sky location obtained in this way.

As usual, it's never that easy because the call to the ephemeris webservice (paste the link into TOPCAT to have a look) returns cursed sexagesimal coordinates. We need to fix them before doing anything serious with the table, and while we are at it, we also repair the date, which is simpler to consume if it is MJD to begin with. Getting the ephemeris thus takes quite a few lines:
```
from astropy import table
from astropy import units as u
from astropy.coordinates import SkyCoord
from astropy.time import Time

ephem = table.Table.read(
  "https://vo.imcce.fr/webservices/miriade/ephemcc.php?-from=vespa"
  "&-name=c:p/encke&-ep=1941-04-01&-nbd=90&-step=1d&-observer=500"
  &-mime=votable")

parsed = SkyCoord(ephem["ra"], ephem["dec"], unit=(u.hourangle, u.deg))
ephem["ra"] = parsed.ra.degree
ephem["dec"] = parsed.dec.degree

parsed = Time(ephem["epoch"])
ephem["epoch"] = parsed.mjd
```
Compared to that, the actual matching against DASCH is almost trivial if you are somewhat familiar with crossmatching in ADQL and the Obscore schema:
```
svc = pyvo.dal.TAPService("http://dc.g-vo.org/tap")
res = svc.run_sync("""
    SELECT *
    FROM
        dasch.plates
        JOIN tap_upload.orbit
        ON (1=CONTAINS(POINT(ra, dec), s_region))
    WHERE
        t_min<epoch
        AND t_max>epoch""",
    uploads={"orbit": ephem})
```
Followup (2024-05-03)

You would probably query the dasch.narrow_plates table in actual operations; querying dasch.plates is probably more for people interested in the history of astronomy or DASCH itself.

Inspect the query for a moment: This is a normal upload join, except we are constructing an ADQL POINT on the fly to be able to see whether we are in the spatial region covered by a DASCH dataset (given in obscore's s_region column). We could have put the temporal condition into the join's ON; but I think the intention is somewhat clearer with the WHERE constraint, and the database engine will probably go through identical motions for both queries – the beauty of having a query planner in the loop is that you do not need to think about such details most of the time.

Actually, in this case there is one last complication: As said above, we have put a datalink service between you and the downloads to discourage accidental large downloads. We hence use pyVO's (suboptimally documented) datalink interface (iter_datalinks):
```
with pyvo.samp.connection() as conn:
    for dl in res.iter_datalinks():
        link = next(dl.bysemantics("#preview-image"))
        pyvo.samp.send_image_to(
            conn,
            link.access_url,
            client_name="Aladin")
```
Among the artefacts available we pick the scaled jpegs in this fragment (#preview-image), since these are almost free even on the Amazon cloud. Change that #preview-image to #this in the to get scaled calibrated FITS-es, which are still fairly small. This would, for instance, let you overplot the ephemeris in Aladin, which you cannot do with the jpegs as they lack astrometric calibration (for now). But even with #preview-image, we can use Aladin as a glorified image viewer by SAMP-sending the images there, which is why we do the minor magic with functions from pyvo.samp.

If you want to try this yourself or mangle the program to do something else that requires querying against a reasonable number positions in time and space, just get encke.py and hack away. Make sure to start Aladin before running the program so it has something to send the images to.
Disclaimers

This is a contrived example, and it is likely that this particular use case is astronomically wrong in several ways. Let me enumerate a few things that would need looking into before this approaches proper science:
- We compute the ephemeris for the center of the Earth. At half an AU distance, the resulting parallax will not shift the position enough to hide a plate we should know about, but at least for anything closer, you should try to do a bit better; admittedly, for a resource like DASCH – that contains plates from observatories all over the place – you will have to compromise.
- The ephemeris is probably wrong; comet's orbits change over time, and I have no idea if the ephemeris service actually uses 2P/Encke's 1941 orbit to compute the positions.
- The coordinate metadata may be wrong. Ephemcc's documentation says something that sounds a lot as if they were sometimes returning RA and Dec for the equator of the time rather than for J2000 (i.e., ICRS for all intents and purposes), but of course our obscore coverages are for the ICRS. Regrettably, the VOTable returned by the service does not contain a COOSYS element yet, and so there is no easy way to tell.
- If you look at the table with DASCH matches, you will see they all were observed with an extremely wide-angle instrument sporting an aperture of a mere three inches. Even at the whopping exposure times (two hours), there is probably no way you would see a diffuse object of 14th mag on a plate with a 1940s-era photographic emulsion with that kind of optics (well: feel free to prove me wrong).
- It would of course be a huge waste of bandwidth to pull the entire plates if we already had a good idea of where we would expect the comet (i.e., had a reliable ephemeris). Hence, a cutout service that would let you retrieve more or less exactly the pixels you would like to use for your research and not the cruft around it would be a nifty supplement. It's in the works, and I'd say you can almost hold your breath. The cutout will simply appear as a SODA service in the datalink documents. See 2020ASPC..522..295D for how you would operate such a service.
Category: Data

Computing Residuals of an Astrometric Calibration

2022-11-29 Markus Demleitner

Two plots, left a fairly good correlation, right a cloudy wave

The kind of plot you can make following the recipe given here: Left, a comparison of the photometry, right, a positional residuals, not taking into account the SIP plate solution, when comparing the HDAP plate B3261a against Gaia DR3. Note that the cut-off a 4 arcsec is because of the match radius when obtaining the calibrator stars.

I recently had to assess the quality of the astrometric calibration of a photographic plate. What I am going to show you in this post will of course work just as well for CCD frames, and if these have a sufficiently large field of view, this may be an issue for them as well. However, the sort of data that needs this assessment most typically are scans of plates, as these tend to have a “wobble”, systematic offsets in the scan direction resulting from imperfections in the mechanics.

Prerequisites: An astronomical frame with a calibration in ICRS (or some frame not very far from it), called my-image.fits in the following, SExtractor (in Debian and derivatives: apt install source-extractor – long live Debian Astro; since it's called source-extractor in Debian, that's what I'll use here, too), and of course TOPCAT.

Step 1: Extract Sources. Source extraction is of course a high science, and if you know better than me, by all means do it the way you think is appropriate. Meanwhile, the following might very well work for you sufficiently well.

Create a working directory and enter it. Then, to create a file telling source-extractor what columns you would like to see, write the following to a file default.param:

ALPHA_SKY
DELTA_SKY
X_IMAGE
Y_IMAGE
MAG_ISO
FLUX_AUTO
ELONGATION

Next, give a few parameters to source-extractor; depending on the sort of image you have, you may want to play around with DETECT_MINAREA (how many pixels need to show a signal to register as a source) and DETECT_THRESH (how many sigmas a pixel has to be above the background to register as a candidate for belonging to a source). Meanwhile, write the following into a file default.control:

CATALOG_TYPE     FITS_1.0
CATALOG_NAME     img.axy
PARAMETERS_NAME  default.param
FILTER           N
DETECT_MINAREA   30
DETECT_THRESH    4
SEEING_FWHM      1.2

– but if the following call gives you a few hundred sources, that ought to work for the present purpose.

Then run:

source-extractor -c default.control my-image.fits

This will give you a catalogue of extracted objects in the file img.axy.

Step 2: Fix source-extractor's output. Load that img.axy into TOPCAT. Regrettably, source-extractor does not add any useful metadata to the columns of its output table. To add the absolute bare minimum, in TOPCAT go to Views → Column Info. In that window, check UCD in the Display menu, and then put pos.eq.ra and pos.eq.dec into the UCD fields of the ALPHA_SKY and DELTA_SKY columns, respectively; double click to change fields in TOPCAT.

To see if you have done the annotation right, in TOPCAT's main window, click Graphics → Sky Plot. If the objects show up, you have just provided enough annotation to let TOPCAT figure out the position for each row.

Step 3: Get calibrators. We will now try to add counterparts for Gaia DR3 to the extracted sources. To do that, click VO → Table Access Protocol, and in the window popping up double click the entry for the GAVO DC TAP.

In the Find box, type dr3lite to look for this site's version of the Gaia DR3 source catalogue. Click on gaia.dr3lite to select that table, and then select the Columns pane. This should show some of the Gaia DR3 columns.

Now Examples → Upload Join will generate a query that will cross-match your extracted sources with the Gaia sources. You should edit it a bit, only selecting the columns you will actually need, removing the TOP 1000 (at least on large images with more than 1000 sources), and reducing the match radius a bit when the calibration is not actually completely off and your epoch is sufficiently close to J2000.

Hint: you can control-click in the Columns pane and then use the Cols button to insert all the column names in one go[1]. For me, the resulting query would be:

SELECT
   source_id, ra, dec, phot_bp_mean_mag,
   tc.*
   FROM gaia.dr3lite AS db
   JOIN TAP_UPLOAD.t1 AS tc
   ON 1=CONTAINS(POINT('ICRS', db.ra, db.dec),
                 CIRCLE('ICRS', tc.ALPHA_SKY, tc.DELTA_SKY, 4./3600.))

This should result in about as many matches as your extraction had – a few more is ok, because you will have some spurious matches, a few less is ok, too, as there are always some outliers and artefacts, but you should clearly not pull a magnitude more or less objects here than you put in; fiddle with the match radius as necessary.

See if there is a rough correlation between the Gaia calibrators and your extracted sources by plotting phot_bp_mean_mag against MAG_ISO. Absent more information, MAG_ISO, source-extractor's guess for the magnitude of the extracted object, will be just some crazy number, but it should have some discernable correlation with the actual magnitude. Do not expect too much here, in particular with old plates, for which good photometry is a science of their own.

In my example, this looked like this:

Plot: a rough correlation in red with a green tail

The green points certainly are spurious matches; this observation did not reach beyond 14th magnitude or so, and there are many weak stars on the sky, so a few of them will show up in just about any cross match. See the opening picture for an example with a better correlation.

Step 4: Do the correlation plot. Do Graphics → Plane Plot and then plot ra-alpha_sky or dec-delta_sky against X_IMAGE or Y_IMAGE. You could get something like this:

This rather certainly reflects some optical distortion; source-extractor regrettably does not take into account SIP corrections yet, so it is likely that a large part of this would be taken care of by the polynomials of the plate solution (the github issue I am linking to tells you how to be sure).

But it can also look like this:

This certainly is not the result of a lens or anything optical at all. It's the scanner's gears that you are looking at here. With an amplitude of perhaps three arcseconds this is rather excessive here; but something like this you will rather likely see even on good scanners – though it may essentially be invisible, as of the Heidelberg scanner we used for HDAP:

Plot: A vertical cloud with no discernible structure.

[1]	I'm using the BP magnitude in the query below as most historical plates tend to be “blue sensitive“ (in some sense). Hence, BP magnitudes should be a bit closer to what source-extractor has extracted.

Gaia DR3 XP Spectra: All Sampled

2022-09-06 Markus Demleitner

Lots of blue crosses and a few red squares plotted over a sky photograph of a star cluster

Around this time of the year on the northern hemisphere, you can spot the h and χ Persei double star cluster with the naked eye. One part of it, NGC 884 is shown here with LAMOST DR6 low resolution spectra (red squares) and Gaia DR3 XP spectra (blue crosses) overplotted. Given that LAMOST has already been one of the largest collections of spectra on the planet, you can see that there is really a lot of those XP spectra.

When Gaia DR3 was released in June, I was somewhat disappointed when I realised what it is that they delivered as the BP/RP (or XP for short) spectra. You see, I had expected to see something rather similar to what I have in DFBS: structurally, arrays of a few dozen spectral points, mapping wavelengths to some sort of measure of the flux.

What really came were, mainly, “continuous spectra“, that is coefficients of Gauss-Hermite polynomials. You can fetch them from the gaiadr3.xp_continuous_mean_spectrum table at the ARI-Gaia TAP service; the blue part of the spectrum of the star DR3 4295806720 looks like this in there:

102.93398893929992, -12.336921213781045, -2.668856168170544, -0.12631176306793765, -0.9347021092539146, 0.05636787290132809, [...]

No common spectral client can plot this. The Gaia DPAC has helpfully provided a Python library called GaiaXPy to turn these into “proper” spectra. Shortly after the data release, my plan has thus been to turn all these spectra into their “sampled” form using GaiaXPy and then re-publish them, both through SSAP for ad-hoc discovery and through TAP for (potentially) global analysis.

Alas, for objects too faint to make it into DR3's xp_sampled_mean_spectrum table (that's 35 million spectra already turned to wavelength-flux pairs by DPAC), the spectra generated in this way looked fairly awful, with lots of very artificial-looking wiggles (“ringing”, if you will). After a bit of deliberation, I realised that when the errors are given on the Hermite coefficients, once you compute the samples, these errors will be liberally distributed among the output samples. In other words, the error on the samples will be grossly correlated over arbitrary distances; at least I am fairly helpless when trying to separate signal from artefact in these beasts.

Bummer. Well, fortunately, Rene Andrae from “up the mountain” (i.e., the MPI for Astronomy) has worked out a reasonably elegant way to get more conventional spectra understandable to mere humans. Basically, you compute n distinct “realisations” of the error model given by the table of the continuous spectra and average over them. The more samples you take, the less correlated your spectral points and their errors will be and the less confusing the signal will be. The service docs for gaia/s3 give the math.

Doing this on more than 200 million spectra is quite an effort, though, and so after some experimentation I decided to settle on 10 realisations per spectrum and have relatively wide bins (10 nm) over just the optical part of the spectrum (400 through 800 nm). The BP and RP bandpaths are a bit wider, and there is probably signal blotted out by the wide bins; I will probably be addressing this for DR4, except if these spectra become the smash hit they deserve to be.

The result of this procedure is now available through an SSAP service that should show up in the VO Registry by the time the first of you read this; the Aladin image above gives you an impression of the density of results here – and don't forget: the spectra with the blue crosses are all reasonably well flux-calibrated.

The data is also available on the TAP service http://dc.g-vo.org/tap, which opens up many interesting possibilities. Let me mention two here.

Comparison with LAMOST

I was rather nervous whether what I had done resulted in anything that bore even a fleeting resemblance to reality, and so about the first thing I tried was to compare my new data with what LAMOST has.

That is a nice exercise for TAP and ADQL. Let's first match spectra from the two surveys, which luckily are on the same server, saving us some cross-server uploads. I am selecting a minimum of data, just the position and the two access URLs, and I let DaCHS' MAXREC kick in so I'm just retrieving 20000 of the millions of result records:

SELECT a.ssa_location, a.accref, b.accref
FROM
  gdr3spec.ssameta AS a
  JOIN lamost6.ssa_lrs AS b
  ON DISTANCE(a.ssa_location, b.ssa_location)<0.001

(this is using the DISTANCE(.,.)<radius idiom that we will be migrating towards in ADQL 2.1 instead of the dreaded 1=CONTAINS(POINT, CIRCLE) thing everyone has loathed in ADQL 2.0).

Using the nifty activation actions, you can now tell TOPCAT to open the two spectra next to each other when you click on a row or a point in a sky plot. To reproduce,

Make a sky plot. TOPCAT doesn't yet pick up the POINT in ssa_location, so you have to configure the Lon and Lat fields yourself to ssa_location[0] and ssa_location[1].
Open the activation actions, either from the button bar or from the Views menu.
In there, select Plot Table, make sure it says accref in Table Location and then check Plot Table in the Actions pane. When you now click on a point in the sky plot, you should see a spectrum pop up, except it is plotted with dots, which most people consider inappropriate for spectra. Use the Form tab in the plot window to style it a bit more spectrum-like (I recommend looking into Line and XYError).
But how do you now add the LAMOST plot? I don't think TOPCAT's activation actions let you plot right into the plane plot you just configured. But you can add a second Plot Table action from the Actions menu in the window with the activation actions. As before, configure this new item, except this one needs to plot accref_ (which is what DaCHS has called the access reference for LAMOST to keep the names unique).
As for Gaia, configure to plot to look good as a spectrum. In order to make the two spectra optically comparable, under Axes set the range to 4000 to 8000 Angstrom manually here.

You can now click on points in your sky plot and, after a second or so, see the corresponding spectra next to each other (if you place the two plot windows that way).

If you try this, you will (hopefully) see that major features of spectra are nicely reproduced, such as with these, I guess, molecular bands:

Two line plots next to each other, the right one showing more features. the left one roughly follows the major wiggles, though.

As you probably have guessed, the extremely low-resolution Gaia XP spectrum is left, LAMOST's (somewhat higher-resolution) low-resolution spectrum is right:

This also works with absorption in the blue, as in this example:

Two line plots next to each other, the right one showing a lot of relatively sharp absoprtion lines, which the left one does not have. A few major bumps are present in both, and the general shape conincides nicely, expect perhaps at the blue edge.

In case of doubt, I have to say I'd probably trust Gaia's calibration around 400 nm better than LAMOST's. But that's mere guesswork.

For fainter objects, you will see remnants of the systematic wiggles from the Hermite polynomials:

Two line plots next to each other. Both are relatively noisy, in particular on the blue edge. The left one also seems to have a rather regular oscillation at the blue edge.

Anyway, if you keep an eye on the errors, you can probably even work with spectra from the fainter objects:

Two line plots next to each other. The left one has fairly strong ringing which is not present in the right one, but it mainly stays within the error bars. The total flux of this star is at least a factor of 10 less than for the prettier examples above.

Mass Retrieval of Spectra

One nice thing about the short spectra is that you can fetch many of them in one go and in very little time. For instance, to retrieve particularly red objects from the Gaia catalogue of Nearby Stars (also on the GAVO server) with spectra, say:

SELECT
  source_id, ra, dec, parallax, phot_g_mean_mag,
  phot_bp_mean_mag, phot_rp_mean_mag, ruwe, adoptedrv,
  flux, flux_error
FROM gcns.main
JOIN gdr3spec.spectra
USING (source_id)
WHERE phot_rp_mean_mag<phot_bp_mean_mag-4

[in case you wonder how I quickly got the column names enumerated here: do control-clicks into the Columns pane in TOCPAT's TAP window and then use the Cols button]. For when you do not have Gaia DR3 source_id-s in your source table, there is also gdr3spec.withpos against which you can do more conventional positional crossmatches.

Within a few seconds, you can retrieve more than 4000 spectra in this way. You can now do whatever analysis you want on these spectra. Or, well, just plot them. The following procedure for that later task uses TOPCAT features only available in the next release, due before mid-October[1].

First, make a colour-magnitude diagram (CMD) from this table as usual (e.g., BP-RP vs G). Then, open another plane plot and

Layers → Add XYArray Control
Configure the XYArray to plot from the table you just fetched, have nothing in X Values[2] and flux in Y Values.
Under Axes, configure Y Log in order to better show the 4253 spectra at one time.
Throw away or at least uncheck all other layers in the plot.
In order to let TOPCAT highlight the spectrum of the activated source, in the Subsets pane check the Activated subset (that's the bleeding-edge functionality you will not have in older TOPCATs) and give it a sufficiently bright colour.

With that, you can now click around in your CMD and immediately see that source's spectrum in the context of all the others, like this:

An animation of someone selecting various points in a CMD and have simulataneous spectra plotted.

These spectra have also inspired me to design and implement a vector extension for ADQL, which lets you do even more interesting things with these spectra. More on this… soon.

[1]	The Activated subset is only available in TOPCAT versions later than 4.8-7 (released in October 2022).

[2] These should be the spectral points; DaCHS does not deliver them with this query because I am a coward. I think I will find my courage relatively soon and then fix this. Once that has happened, you can select param$spectral as X values. [Update: Mark Taylor remarks that by writing sequence(41, 400, 10) in bleeding-edge TOPCATs and add(multiply(10,sequence(41)),400) before that, you can add a proper spectral axis until then]

Find a Dust-Free Window Using ADQL

2022-08-19 Markus Demleitner

Five of the seven patches of the sky that Bayestar 17 considers least obscured by dust in Aladin's WISE color HiPSes. There clearly is a pattern here. This post is about how you'll find these (and the credible ones, too).

The upcoming AG-Tagung in Bremen will have another puzzler, and while concocting the problem I needed to find a spot on the sky where there is very little interstellar extinction. What looks like a quick query turned out to require a few ADQL tricks that I thought I might show in this little post; they will come in handy in many situations.

First, I needed to find data on where on the sky there is dust. Had I not known about the extinction maps I've blogged about in 2018, I would probably have looked for extinction maps in the Registry, which might have led me to the Bayestar 17 map on my service eventually, too. The way it was, I immediately fired up TOPCAT and pointed it to the TAP service at http://dc.g-vo.org/tap (the “GAVO DC TAP“ of the TAP service list) and went to the column metadata of the prdust.map_union table.

Browsing the descriptions, the relevant columns here are healpix (which will give me the position) and best_fit. That latter thing is an array of reddening E(B − V) (i.e., higher values mean more dust) per distance bin, where the bins are 0.5 mag of distance modulus wide. I decided I'd settle for bin 20, corresponding to a kiloparsec. Dust further away than that will not trouble me much in the puzzler.

Finding the healpixes in the rows with the smallest best_fit[20] should be easy; it is a minor variant of a classic from the ADQL course:
```
SELECT TOP 20 healpix
FROM prdust.map_union
ORDER BY best_fit[20] ASC
```
Except that my box replies with an error message reading “Expected end of text, found '[' (at char 61), (line:3, col:18)”.

Hu? Well… if you look, then the problem is where I ask to sort by an array element. And indeed, it turns out that DaCHS, the software driving this site, will not let you sort by array elements yet. This is arguably a bug, and in all likelihood I will have fixed it by the time your read this. But there is a technique to defeat this and similar cases that every astronomer should know about: subqueries, which turn any query into something you can work with as if it were a table. In this case:
```
SELECT TOP 30 healpix, extinction
FROM (
  SELECT healpix, best_fit[20] as extinction
  FROM prdust.map_union) AS q
ORDER BY extinction ASC
```
– the “AS q“ gives the name of the “virtual” table resulting from the query a name. It is mandatory here. Do not be tempted to leave out the “AS” – that that is even legal is one of the major blunders of the SQL standard.

The result is looking good:
```
# healpix extinction
1021402 0.00479
1021403 0.0068
418619  0.00707
...
```
– so, we have the healpixes for which the extinction works out to be minimal. It is also reassuring that the two healpixes with the clearest sky (by this metric) are next to each other – where there are clear skies, it's likely that there are more clear skies nearby.

But then… where exactly are these patches? The column description says “The healpix (in galactic l, b) for which this data applies. This is of the order given in the hpx_order column”. Hm.

To go from HEALPix to positions, there is the ivo_healpix_center user defined function (UDF) on many ADQL services; it is part of the IVOA's UDF catalogue, so whenever you see it, it will do the same thing. And where would you see it? Well, in TOPCAT, UDFs show up in the Service tab with a signature and a short description. In this case:
```
ivo_healpix_center(hpxOrder INTEGER, hpxIndex BIGINT) -> POINT

  returns a POINT corresponding to the center of the healpix with the
  given index at the given order.
```
With this, we can change our query to spit out positions rather than indices:
```
SELECT TOP 30 ivo_healpix_center(hpx_order, healpix) AS pos, extinction
FROM (
  SELECT healpix, best_fit[20] as extinction, hpx_order
  FROM prdust.map_union) AS q
ORDER BY extinction ASC
```
The result is:
```
# pos                                    extinction
"(42.27822580645164, 78.65148926014334)" 0.00479
"(42.44939271255061, 78.6973986631694)"  0.0068
"(58.97460937500027, 40.86635677386179)" 0.00707
...
```
That's my positions all right, but they are still in galactic coordinates. That may be fine for many applications, but I'd like to have them in ICRS. Transforming them takes another UDF; this one is not yet standardised and hence has a gavo_ prefix (which means you will only find it on reasonably new services driven by DaCHS).

On services that have that UDF (and the GAVO DC TAP certainly is one of them), you can write:
```
SELECT TOP 30
  gavo_transform('GALACTIC', 'ICRS',
    ivo_healpix_center(hpx_order, healpix)) AS pos,
  extinction
FROM (
  SELECT healpix, best_fit[20] as extinction, hpx_order
  FROM prdust.map_union) AS q
ORDER BY extinction ASC
```
That results in:
```
# pos                                    extinction
"(205.6104289782676, 28.392541949473785)" 0.00479
"(205.55600830161907, 28.42330388161418)" 0.0068
"(250.47595812552925, 36.43011215633786)" 0.00707
"(166.10872483007287, 21.232866316024364)" 0.00714
"(259.3314211312357, 43.09275090468469)" 0.00742
"(114.66957763676628, 21.603135736808532)" 0.00787
"(229.69174233173712, 2.0244022486718793)" 0.00793
"(214.85349325052758, 33.6802370378023)" 0.00804
"(204.8352084989552, 36.95716352922782)" 0.00806
"(215.95667870050661, 36.559656879148044)" 0.00839
"(229.66068062277128, 2.142516479012763)" 0.0084
"(219.72263539838667, 58.371829835018424)" 0.00844
...
```
If you have followed along, you now have a table of the 30 least reddened patches in the sky according Bayestar17. And you are probably as curious to see them as I was. That curiosity made me start Aladin and select WISE colour imagery, reckoning dust (at the right temperature) would be more conspicuous in WISE's wavelengths then in, say, DSS.

I then did Views -> Activation Actions and wanted to check “Send Sky Coordinates“ to make Aladin show the sky at the position of my patches. This is usually preconfigured by TOPCAT to just work when tables have positions. Alas: at least in versions up to 4.8, TOPCAT does not know about points (in the ADQL sense) when making clever guesses there.

But there is a workaround: Select “Send Sky Coordinates” in the Activation Actions window and then type pos[0] in “RA Column“, and pos[1] in “Dec Column” – this works because under the hood, VOTable points are just 2-arrays. That done, you can check the activation action.

After these preparations, when you click through the first few results, you will find objects like those in the opending image (and also a few fairly empty fields). Stellar clusters are relatively rare on the sky, so their prevalence in these patches quite clearly shows that Bayestar's model has a bit of a fixation about them that's certainly not related to dust.

Which goes to serve as another example of Demleitner's law 567: “In any table, the instances with the most extreme values are broken with a likelihood of 0.567”.

Category: Demo
LAMOST5 meets Datalink

2019-12-11 Markus Demleitner

One of the busiest spectral survey instruments operated right now is the Large Sky Area Multi-Object Fiber Spectrograph Telescope (LAMOST). And its data in the VO, more or less: DR2 and DR3 have been brought into the VO by our Czech colleagues, but since they currently lack resources to update their services to the latest releases, they have kindly given me their DaCHS resource descriptor, and so I had a head start for publishing DR5 in Heidelberg.

With some minor updates, here it is now: Over nine million medium-resolution spectra covering large parts of the northen sky – the spatial coverage is like this:

There's lots of fun to be had with this; of course, there's an SSA service, so when you point Aladin or Splat at some part of the covered sky and look for spectra, chances are you'll see LAMOST spectra, and when working on some of our tutorials (this one, for example), it happened that LAMOST actually had what I was looking for when writing them.

But I'd like to use the opportunity to mention two other modes of accessing the data.
Tablesample and TOPCAT's Plot Table activation action

Say you'd like to look at spectra of M stars and would like to have some sample from across the sky, fire up TOPCAT, point its TAP client the GAVO DC TAP service (http://dc.g-vo.org/tap) and run something like:
```
select
  ssa_pubDID, accref, raj2000, dej2000, ssa_targsubclass
from lamost5.data tablesample(1)
where
  ssa_targsubclass like 'M%'
```
This is using the TABLESAMPLE modifier in the from clause, which isn't standard ADQL yet. As mentioned in the DaCHS 1.4 announcement, DaCHS has a prototype implementation of what's been discussed on the IVOA's DAL mailing list: pick a part of a table rather than the full one. It takes a percentage as an argument, and tells the server to choose about this percentage of the table's records using a reasonable and fast heuristic. Note that this won't give you perfect statistical sampling, but if it's not “good enough” for some purpose, I'd like to learn about that purpose.

Drawing a proper statistical sample, on the other hand, would take minutes on the GAVO database server – with tablesample, I had the roughly 6000 spectra the above query returns essentially instantaneously, and from eyeballing a sky plot of them, I'd say their distribution is close enough to that of the full DR5. So: tablesample is your friend.

For a quick look at the spectra themselves, in TOPCAT click Views/Activation Actions, check “Plot Table” and make sure TOPCAT proposes the accref column as “Table Location” (if you don't see these items, update your TOPCAT – it's worth it). Now click on a row or perhaps a dot on a plot and behold an M spectrum.
Cutouts via Datalink

LAMOST releases spectra in FITS format pretty much like the ones you may know from SDSS. The trick above works because we instead hand out proper, IVOA Spectral Data Model-compliant spectra through SSA and TAP. However, if you need to go back to the original files, you can, using Datalink. If you're unsure what this Datalink thing is: call me vain, but I still like my 2015 ADASS poster explaining that. In TOPCAT, you'd be using the “Invoke Service” activation action to get to the datalinks.

If you have actual work to do, offloading repetetive work to the computer is what you want, and fortunately, pyVO knows about datalink, too. I give you this is hard to discover so far, and the interface is... a tiny bit clunky. Until some kind soul cleans up the pyVO datalink act, a poster Stefan and I showed at the 2017 ADASS might give you an idea which buttons to press. Or read on and see how things work for LAMOST5.

The shortest way to datalinks is a TAP query that at least retrieves the ssa_pubdid column (that's a must; Datalink can't work without it) and, on the result, run the iter_datalinks method. This returns an object in which you can find the associated data items (in this case, a preview and the original FITS with the #progenitor semantics), plus the cutout service.

Hence, a minimal example for pulling the legacy FITS links out of the first three items in lamost5.data would look like this:
```
import pyvo

svc = pyvo.dal.TAPService("http://dc.g-vo.org/tap")
for dl in svc.run_sync("select top 3 ssa_pubdid"
        " from lamost5.data").iter_datalinks():
    print(next(dl.bysemantics("#progenitor")
        )["access_url"].decode("ascii"))
```
This is a bit different from listing 2 in the poster linked above because it's python3, so getting the first element from iterator an iterator looks a bit different, and (curse astropy.votable for returning VOTable chars as bytes rather than strings!) you'll want to turn the URL into a proper string manually.

Another, actually more interesting, thing you can do with Datalink is cut out regions of interest. The LAMOST spectra are fairly long (though of course still small by image standards), so if you're only interested in a single line, you can save a bit of storage and bandwidth over blindly pulling the whole thing.

For instance, if you wanted to pull the vicinity of the H and K Fraunhofer lines from the matches in the loop in the snippet above, you could say:
```
from astropy import units as u
proc = next(dl.iter_procs())
cutout = proc.processed(band=(392*u.nm,398*u.nm))
```
And this is what I've done for the decorative left border above: it's the H and K line profiles for 0.1% of the stars LAMOST has classified as G8. Building the image didn't take more than a few seconds (where I'd like the cutouts to be faster by a factor of 10; I guess that's about an afternoon of work for me, so if it'd save you more than that afternoon, poke me to do it).

What's coming back is tables. By the time python has digested these, they're numpy record arrays. Thus, you can immediately bring in your beloved scipy (or whatever). For instance, if for some reason you're convinced that the H and K lines should be fit by identical Gaussians in the boring case and would like find objects for which that's patently untrue and that hence could be un-boring, here's how you could do that:
```
def spectral_model(wl, c1, c2, depth, width):
    return (1
        -depth*numpy.exp(-numpy.square(wl-c1)
            /numpy.square(width))
        -depth*numpy.exp(-numpy.square(wl-c2)
            /numpy.square(width)))

for pubdid, prof in get_profiles(
        "G8", (392*u.nm,398*u.nm), 0.01, 4):
    prof["flux"] /= max(prof["flux"])
    popt, pcov = curve_fit(
        spectral_model, prof["spectral"], prof["flux"],
        sigma=prof["flux_error"],
        p0=[3968, 3934, 1, 1])
    if pcov[3][3]>1:
        break
```
– where get_profiles is essentially doing the TAP plus datalink routine above, except I'm swallowing spectra with too much noise and I have the function transform the spectral coordinate into the objects' rest frames. If you're curious how I'm doing this just based on the IVOA Spectral Data Model, check the source linked at the bottom of this post.

I've just run this, and the first spectrum that the machinery flagged as suspicious was this:

– which doesn't look like I've made a discovery just yet. But that doesn't mean there's not a lot to find within LAMOST5's lines...

To get you up to speed quickly: here's the actual python3 code I ran for the “analysis” and the plot.
Category: Data

« Page 2 / 4 »

Posts with the Tag TAP:

Why Bother?

DASCH: The Harvard Plates

Plates in Global Discovery

TAP, Uploads, and pyVO on DASCH

Disclaimers

Comparison with LAMOST

Mass Retrieval of Spectra

Tablesample and TOPCAT's Plot Table activation action

Cutouts via Datalink