Posts with the Tag DaCHS:

  • DaCHS 2.12 Is Out

    The DaCHS logo, a badger's head and the text "VO Data Publishing"

    A bit more than one month after the last Interop, I have released the next version of GAVO's data publication package, DaCHS. This is the customary post on what is new in this release.

    There is no major headline for DaCHS 2.12, but there is a fair number of nice conveniences in it. For instance, if you have a collection of time series to publish, the new time series service template might help you. You get it by calling dachs start timeseries; I will give you that it suffers from about the same malady as the existing ssap+datalink one: There is a datalink service built in from the start, which puts up a scary amount of up-front complexity you have to master before you get any sort of gratification.

    There is little we can do about that; the creators of time series data sets just have not come up with a good convention for how to write them. I might be moved to admit that putting them into nice FITS binary tables might count as acceptable. In practice, none of the time series I got from my data providers came in a format remotely fit for distribution. Perhaps Ada's photometric time series convention (which is what you will deliver with the template) is not the final word on how to represent time series, but it is much better than anything else I have seen. Turning what you get from your upstreams into something you can confidently hand out to your users just requires Datalink at this point, I'm afraid[1].

    I will add tutorial chapters on how to deal with the datalink-infested templates one of these days; in them, bulk commenting will play a fairly important role. For quite a while, I have recommended defining a lazy macro with a CDATA section in order to comment out a large portion of an RD. I have now changed that recommendation: open such comments with <macDef raw="True" name="todo"><![CDATA[ and close them with ]]></macDef>. The new (2.12) part is the raw="True". It simply means that DaCHS will not try to expand macros within the macro definition. So far, it has done that, and that was a pain for the datalink-infested templates, because there are macro calls in the templates, some of which will not work in the RD context the macDef is in, which then led to hard-to-understand RD parse errors.
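
    For illustration, such a bulk comment might look like the following; the service element inside is only a stand-in for whatever template material you are not ready to deal with yet:

    <macDef raw="True" name="todo"><![CDATA[
      <service id="dl" allowed="dlget,dlmeta">
        ... datalink material to be dealt with later ...
      </service>
    ]]></macDef>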

    By the way, in case you would like to write your template to a file other than q.rd (perhaps because there already is one in your resdir), there is now an -o option to dachs start.

    Speaking of convenience, defining spectral coverage has become a lot less of a pain in 2.12. So far, whenever you had to manually define a resource's STC coverage (and that is not uncommon for the spectral axis, where dachs limits often will find no suitable columns or will not notice large gaps between observations in multiple narrow bands), you had to turn the Ångströms or GHz into Joules by throwing in the right amounts of c, h, and math operators. Now, you just add the appropriate units in square brackets and let DaCHS work out the rest; DaCHS will also ensure that the lower limit actually is smaller than the upper limit. A resource covering a number of bands in various parts of the spectrum might thus say:

    <coverage>
      <spectral>100[kHz] 21.5[cm]</spectral>
      <spectral>2[THz] 1[um]</spectral>
      <spectral>653[nm] 660[nm]</spectral>
      <spectral>912[Angstrom] 10[eV]</spectral>
      <spectral>20[GeV] 100[GeV]</spectral>
    </coverage>
    

    DaCHS will produce a perfectly viable coverage declaration for the Registry from that.

    Still in the convenience department, I have found myself defining a STREAM (in case you don't know what I'm talking about: read up on them in the tutorial) that creates pairs of columns for a value and its error once too often. Thus, there is now the //procs#witherror stream. Essentially, you can replace the <column in a column definition with <FEED source="//procs#witherror", and you get two columns: one with the name itself, the other named err_ followed by the name, both with suitable metadata. For instance:

    <FEED source="//procs#witherror
      name="rv" type="double precision"
      unit="km/s" ucd="spect.dopplerVeloc"
      tablehead="RV_S"
      description="Radial velocity derived by the Serval pipeline"
      verbLevel="1"/>
    

    You cannot yet have values children with witherror, but it is fairly uncommon for such columns to want them: you won't enumerate values or set null values (things with errors will be floating point values, which have “natural” null values at least in VOTable), and column statistics these days are obtained automatically by dachs limits.

    You can take this one step further and put witherror into a LOOP. For instance, to define ugriz photometry with errors, you would write:

    <LOOP>
      <csvItems>
      item, ucd
      u, U
      g, V
      r, R
      i, I
      z, I
      </csvItems>
      <events passivate="True">
        <FEED source="//procs#witherror" name="mag_\item"
          unit="mag" ucd="phot.mag;em.opt.\ucd"
          tablehead="m_\item"
          description="Magnitude in \item band"/>
      </events>
    </LOOP>
    

    There is one difficult part in this: the passivate="True" in the events element. If you like puzzlers, you may want to figure out why it is needed based on what I document about active tags in the reference documentation. Metaprogramming and macros get subtle, and not only in DaCHS.

    Far too few DaCHS operators define examples for their TAP services. Trust me, your users will love them. To ensure that they are still good, you can now pass an -x flag to dachs val (NB: not dachs test); that will execute all of the TAP examples defined in the RD against the local server and complain when one does not return at least one valid row. The normal usage would be to say dachs val -x //tap if you define your examples in the userconfig RD; but with hierarchical examples, any RD might contain examples modern TAP clients will pick up.

    There is another option to have an example tested: you could put the query into a macro (remember macDef above?) and then use that macro both in the example and in a regTest element. That is because url attributes now expand macros. That may be useful for other and more mundane things, too; for instance, you could have DaCHS fill in the schema in queries.
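
    As a sketch of the idea (the table name, the query, and the expected string are made up here, and the exact TAP sync parameters and the assertion may need adjusting to your service), this could look like:

    <macDef name="exquery">SELECT TOP 5 * FROM myschema.mytable</macDef>

    <regSuite title="example queries">
      <regTest title="TAP example still yields data">
        <!-- attributes on the url element become request parameters;
        since they now expand macros, \exquery can be shared with the
        TAP example -->
        <url REQUEST="doQuery" LANG="ADQL" QUERY="\exquery">/tap/sync</url>
        <code>
          self.assertHasStrings("a value you expect in the result")
        </code>
      </regTest>
    </regSuite>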

    Actual new features in 2.12 are probably not very relevant to average DaCHS operators, at least for now:

    • users can add indexes to their persistent uploads (featured here before)
    • registration of VOEvent streams according to the current VOEvent 2.1 PR (ask if interested; there is minimal documentation on this at this point).
    • an \if macro that sometimes may be useful to skip things that make no sense with empty strings: \if{\relpath}{http://example.edu/p/\relpath} will not produce URLs if relpath is empty.
    • if you have tables with timestamps, it may be worth running dachs limits on them again, as DaCHS will now obtain statistics for them (in MJD, if you have to know) and consequently provide, e.g., placeholders.
    • our spatial WCS implementation no longer assumes the units are degrees (but still that it is dealing with spherical coordinates).
    • when params are array-valued, any limits defined in values are now validated component-wise.

    Finally, if you inspected a diff against the last release, you would see a large number of changes due to type annotation of gavo.base. I have promised my funders to type-annotate the entire DaCHS code (except perhaps for exotic stuff I shouldn't have written in the first place, viz., gavo.stc) in order to make it easier for the community to maintain DaCHS.

    From my current experience, I don't think I will keep this particular promise. After annotating several thousand lines of code, my impression is that the annotation is a lot of effort even with automatic annotation helpers (the cases they can handle are the ones that would be reasonably quick for a human, too). The code does in general improve as a consequence (but not always), though not fundamentally, and it does not become dramatically more readable in most places (there are exceptions to that reservation, though).

    All in all, the cost/benefit ratio just does not seem to be small enough. And: the community members that I want to encourage to contribute code would feel obliged to write type annotations, too, which feels like an extra hurdle I would like to spare them.

    [1]Ok: you could also do an offline conversion of the data collection before ingestion, but I tend to avoid this, partly because I am reluctant to touch upstream data, but in this case in particular because with the current approach it will be much easier to adopt improved serialisations as they become defined.
  • DaCHS 2.11: Persistent TAP Uploads

    The DaCHS logo, a badger's head and the text "VO Data Publishing"

    The traditional autumn release of GAVO's server package DaCHS is somewhat late this year, but not so late that I could not still claim it comes after the Interop. So, here it is: DaCHS 2.11 and the traditional what's new post.

    But first, while I may have DaCHS operators' attention: If you have always wondered why things in DaCHS are as they are, you will probably enjoy the article Declarative Data Publication with DaCHS, which one day will be in the proceedings of ADASS XXXIV (and before that probably on arXiv). You can already read it in a pre-preprint version at https://docs.g-vo.org/I301.pdf, and feedback is most welcome.

    Persistent TAP Uploads

    The potentially most important new feature of DaCHS 2.11 (in my opinion) will not be news to regular readers of this blog: Persistent TAP Uploads.

    At this point, no client supports this, and presumably when clients do support it, it will look somewhat different, but if you like the bleeding edge and have users that don't mind an occasional curl or requests call, you would be more than welcome to help try the persistent uploads. As an operator, it should be sufficient to type:

    dachs imp //tap_user
    

    To make this more useful, you probably want to hand out proper credentials (make them with dachs adm adduser) to people who want to play with this, and point the interested users to the demo jupyter notebook.

    I am of course grateful for any feedback, in particular on how people find ways to use these features to give operators a headache. For instance, I really would like to avoid writing a quota system. But I strongly suspect I will have to…

    On-loaded Execute-s

    DaCHS has a built-in cron-type mechanism, the execute Element. So far, you could tell it to run jobs every x seconds or at certain times of the day. That is fine for what this was made for: updates of “living” data. For instance, the RegTAP RD (which is what's behind the Registry service you are probably using if you are reading this) has something like this:

    <execute title="harvest RofR" every="40000">
      <job><code>
          execDef.spawnPython("bin/harvestRofR.py")
      </code></job>
    </execute>
    

    This will pull in new publishing registries from the Registry of Registries, though that is tangential; the main thing is that some code will run every 40 kiloseconds (or about 11 hours).

    Compared to using plain cron, the advantage is that DaCHS knows the context (for instance, the RD's resdir does not need to be given in the example call), that you can sync with DaCHS' own facilities, and most of all that everything is in one place and can be moved together. By the way, it is surprisingly simple to run a RegTAP service of your own if you already run DaCHS. Feel free to inquire if you are interested.

    In DaCHS 2.11, I extended this facility to include “events” in the life of an RD. The use case seems rather remote from living data: Sometimes you have code you want to share between, say, a datalink service and some ingestion code. This is too resource-bound for keeping it in the local namespace, and that would again violate RD locality on top.

    So, the functions somehow need to sit on the RD, and something needs to stick them there. To do that, I recommended a rather hacky technique with a LOOP with codeItems in the respective howDoI section. But that was clearly rather odious – and fragile on top because the RD you manipulated was just being parsed (but scroll down in the howDoI and you will still see it).

    Now, you can instead tell DaCHS to run your code when the RD has finished loading and everything should be in place. In a recent example I used this to have common functions to fetch photometric points. In an abridged version:

    <execute on="loaded" title="define functions"><job>
      <setup imports="h5py, numpy"/>
      <code>
      def get_photpoints(field, quadrant, quadrant_id):
        """returns the photometry points for the specified time series
        from the HDF5 as a numpy array.
    
        [...]
        """
        dest_path = "data/ROME-FIELD-{:02d}_quad{:d}_photometry.hdf5".format(
          field, quadrant)
        srchdf = h5py.File(rd.getAbsPath(dest_path))
        _, arr = next(iter(srchdf.items()))
    
        photpoints = arr[quadrant_id-1]
        photpoints = numpy.array(photpoints)
        photpoints[photpoints==0] = numpy.nan
        photpoints[photpoints==-9999.99] = numpy.nan
    
        return photpoints
    
    
      def get_photpoints_for_rome_id(rome_id):
        """as get_photpoints, but taking an integer rome_id.
        """
        field = rome_id//10000000
        quadrant = (rome_id//1000000)%10
        quadrant_id = (rome_id%1000000)
        base.ui.notifyInfo(f"{field} {quadrant} {quadrant_id}")
        return get_photpoints(field, quadrant, quadrant_id)
    
      rd.get_photpoints = get_photpoints
      rd.get_photpoints_for_rome_id = get_photpoints_for_rome_id
    </code></job></execute>
    

    (full version; if this is asking you to log in, tell your browser not to wantonly switch to https). Again, what is done here in detail is not terribly relevant: it's the usual messing around with identifiers and paths and more or less broken null values that is a data publisher's everyday lot. The important thing is that with the last two statements, you will see these functions wherever you see the RD, which in RD-near Python code is just about everywhere.

    dachs start taptable

    Since 2018, DaCHS has supported kickstarting the authoring of RDs, which is, I claim, the fun part of a data publisher's tasks, through a set of templates mildly customised by the dachs start command. Nobody should start a data publication with an empty editor window any more. Just pass the sort of data you would like to publish and start answering sensible questions. Well, “sort of data” within reason:

    $ dachs start list
    epntap -- Solar system data via EPN-TAP 2.0
    siap -- Image collections via SIAP2 and TAP
    scs -- Catalogs via SCS and TAP
    ssap+datalink -- Spectra via SSAP and TAP, going through datalink
    taptable -- Any sort of data via a plain TAP table
    

    There is a new entry in this list in 2.11: taptable. In both my own work and in watching other DaCHS operators, I have noticed that my advice “if you want to TAP-publish any old material, just take the SCS template and remove everything that has scs in it” was not a good one. It is not as simple as that. I hope taptable fits better.

    A plan for 2.12 would be to make the ssap+datalink template less of a nightmare. So far, you basically have to fill out the whole thing before you can start experimenting, and that is not right. Being able to work incrementally is a big morale booster.

    VOTable 1.5

    VOTable 1.5 (at this point still a proposed recommendation) is a rather minor, cleanup-type update to the VO's main table format. Still, DaCHS has to declare its output as VOTable 1.5 if we want to be able to declare refposition in COOSYS (which we do). Operators should not notice much of this, but it is good to be aware of the change in case there are overeager VOTable parsers out there or in case you have played with DaCHS' MIVOT generator; in 2.10, you could ask it to do its spiel by requesting the format application/x-votable+xml;version=1.5. In 2.11, it's application/x-votable+xml;version=1.6. If you have no idea what I was just saying, relax. If this becomes important, you will meet it somewhere else.

    Minor Changes

    That's almost it for the more noteworthy news; as usual, there are a plethora of minor improvements, bug fixes and the like. Let me briefly mention a few of these:

    • The ADQL form interface's registry record now includes the site name. In case you are in this list, please say dachs pub //adql after upgrading.
    • More visible legal info, temporal, and spatial coverage in table and service infos; one more reason to regularly run dachs limits!
    • VOUnit's % is now known to DaCHS (it should have been since about 2.9)
    • More vocabulary validation for VOResource generation; so, dachs pub might now complain to you when it previously did not. It is now right and was wrong before.
    • If you annotate a column as meta.bib.bibcode, values in it will be rendered as ADS links
    • The RD info links to resrecs (non-DaCHS resources, essentially), too.

    Upgrade As Convenient

    As usual, if you have the GAVO repository enabled, the upgrade will happen as part of your normal Debian apt upgrade. Still, if you have not done so recently, have a quick look at upgrading in the tutorial. If, on the other hand, you use the Debian-distributed DaCHS package and you do not need any of the new features, you can let things sit and enjoy the new features after your next dist-upgrade.

  • What's new in DaCHS 2.10

    A part of the IVOA product-type vocabulary, and the DaCHS logo with a 2.10 behind it.

    About twice a year, I release a new version of our VO server package DaCHS; in keeping with tradition, this post summarises some of the more notable changes of the most recent release, DaCHS 2.10.

    productTypeServed

    The next version of VODataService will probably have a new element for service descriptions: productTypeServed. This allows operators to declare what sort of files will come out of a service: images, time series, spectra, or some of the more exotic stuff found in the IVOA product-type vocabulary (you can of course give multiple of these). More on where this is supposed to go can be found in my Interop talk on this. DaCHS 2.10 now lets you declare what to put there using a productTypeServed meta item.

    For SIA and SSAP services, there is usually no need to give it, as RegTAP services will infer the right value from the service type. But if you serve, say, time series from SSAP, you can override the inference by saying something like:

    <meta name="productTypeServed">timeseries</meta>
    

    Where this really is important is in obscore, because you can serve any sort of product through a single obscore table. While you could manually declare what you serve by overriding obscore-extraevents in your userconfig RD, this may be brittle and will almost certainly get out of date. Instead, you can run dachs limits //obscore (and you should do that occasionally anyway if you have an obscore table). DaCHS will then fill the meta item from what is in your table.

    A related change is that where a piece of metadata is supposed to be drawn from a vocabulary, dachs val will now complain if you use some other identifier. As of DaCHS 2.10 the only metadata item controlled in this way is productTypeServed, though.

    Registering Obscore Tables

    Speaking about Obscore: I have long been unhappy about the way we register Obscore tables. Until now, they rode piggyback in the registry record of the TAP services they were queryable through. That was marginally acceptable as long as we did not have much VOResource metadata specific to the Obscore table. In the meantime, we have coverage in space, time, and spectrum, and there are several meaningful relationships that may be different for the obscore table than for the TAP service. And since 2019, we have the Discovering Data Collections Note that gives a sensible way to write dedicated registry records for obscore tables.

    With the global dataset discovery (discussed here in February) that should come with pyVO 1.6 (and of course the productTypeServed thing just discussed), there even is a fairly pressing operational reason for having these dedicated obscore records. There is a draft of a longer treatment on the background on github (pre-built here) that I will probably upload into the IVOA document repository once the global discovery code has been merged. Incidentally, reviews of that draft before publication are most welcome.

    But what this really means: If you have an obscore table, please run dachs pub //obscore after upgrading (and don't forget to run dachs limits //obscore after you do notable changes to your obscore table).

    Ranking

    Arguably the biggest single usability problem of the VO is <drumroll> sorting! Indeed, it is safe to assume that when someone types “Gaia DR3” into any sort of search mask, they would like to find some way to query Gaia's gaia_source table (and then perhaps all kinds of other things, but those should reasonably be sorted below even mirrors of gaia_source). Regrettably, something like that is really hard to work out across the Registry outside of these very special cases.

    Within a data centre, however, you can sensibly give an order to things. For DaCHS, that in particular concerns the order of tables in TAP clients and the order of the various entries on the root page. For instance, a recent TOPCAT will show the table browser on the GAVO data centre like this:

    Screenshot of a hierarchical display, top-level entries are, in that order, ivoa, tap_schema, bgds, califadr3; ivoa is opened and shows obscore and obs_radio, califadr3 is opened and shows cubes first, then fluxpos tables and finally flux tables.

    The idea is that obscore and TAP metadata are way up, followed by some data collections with (presumably) high scientific value for which we are the primary site; within the califadr3 schema, the tables are again sorted by relevance, as most people will be interested in the cubes first, the somewhat funky fluxpos tables second, and in the entirely nerdy flux tables last.

    You can arrange this by assigning schema-rank metadata at the top level of an RD, and table-rank metadata to individual tables. In both cases, missing ranks default to 10'000, and the lower a rank, the higher up a schema or table will be shown. For instance, dfbsspec/q (if you wonder what that might be: see Byurakan to L2) has:

    <resource schema="dfbsspec">
      <meta name="schema-rank">100</meta>
        ...
        <table id="spectra" onDisk="True" adql="True">
          <meta name="table-rank">1</meta>
    

    This will put dfbsspec fairly high up on the root page, and the spectra table above all others in the RD (which have the implicit table rank of 10'000).

    Note that to make DaCHS notice your rank, you need to dachs pub the modified RDs so the ranks end up in DaCHS' dc.resources table; since the Registry does not much care for these ranks, this is a classic use case for the -k option that preserves the registry timestamp of the resource and will thus prevent a re-publication of the registry record (which wouldn't be a disaster either, but let's be good citizens). Ideally, you assign schema ranks to all the resources you care about in one go and then just say:

    dachs pub -k ALL
    

    The Obscore Radio Extension

    While the details are still being discussed, there will be a radio extension to Obscore, and DaCHS 2.10 contains a prototype implementation for the current state of the specification (or my reading of it). Technically, it comprises a few columns useful for, in particular, interferometry data. If you have such data, take a look at https://github.com/ivoa-std/ObsCoreExtensionForRadioData.git and then consider trying what DaCHS has to offer so far; now is the time to intervene if something in the standard is not quite the way it should be (from your perspective).

    The documentation for what to do in DaCHS is a bit scarce yet – in particular, there is no tutorial chapter on obs-radio, nor will there be until the extension has converged a bit more –, but if you know DaCHS' obscore support, you will be immediately at home with the //obs-radio#publish mixin, and you can see it in (very limited) action in the emi/q RD.

    The FITS Media Type

    I have for a long time recommended using a media type of image/fits for FITS “images” and application/fits for FITS (binary) tables. This was in gross violation of standards: I had freely invented image/fits, and you are not supposed to invent media types without then registering them with the IANA.

    To be honest, the invention was not mine (only). There are applications out there flinging around image/fits types, too, but never mind: It's still bad practice, and DaCHS 2.10 tries to rectify it by first using application/fits even where defaults have been image/fits before, and actually retroactively changing image/fits to application/fits in the database where it can figure out that a column contains a media type.

    DaCHS accepts image/fits as an alias for application/fits in SIAP's FORMAT parameter, and so I hope nothing will break. You may have to adapt a few regression tests, though.

    On the Way To pathlib.Path

    For quite a while, Python has had the pathlib module, which is actually quite nice; for instance, it lets you write dir / name rather than os.path.join(dir, name). I would like to slowly migrate towards Path-s in DaCHS, and thus when you ask DaCHS' configuration system for paths (something like base.getConfig("inputsDir")), you will now get such Path-s.

    Most operator code, however, is still isolated from that change; in particular, the sourceToken you see in grammars mostly remains a string, and I do not expect that to change for the foreseeable future. This is mainly because the usual string operations many people do to remove extensions and the like (self.sourceToken[:-5]) will fail rather messily with Path-s:

    >>> n = pathlib.Path("/a/b/c.fits")
    >>> n[:-5]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'PosixPath' object is not subscriptable
    

    So, if you don't call getConfig in any of your DaCHS-facing code, you are probably safe. If you do and get exceptions like this, you know where they come from. The solution, stringification, is rather straightforward:

    >>> str(n)[:-5]
    '/a/b/c'
    

    Partly as a consequence of this, there were slight changes in the way processors work. I hope I have not damaged anyone's code, but if you do custom previews and you overrode classify, you will have to fix your code, as that now takes an accref together with the path to be created.

    Odds And Ends

    As usual, there are many minor improvements and additions in DaCHS. Let me mention security.txt support. This complies with RFC 9116 and is supposed to give folks discovering a vulnerability a halfway reliable way to figure out who to complain to. If you try http://<your-hostname>/.well-known/security.txt, you will see exactly what is in https://dc.g-vo.org/.well-known/security.txt. If this is in conflict with some bone-headed security rules your institution may have, you can replace security.txt in DaCHS' central template directory (most likely /usr/lib/python3/dist-packages/gavo/resources/templates/); but in that case please complain, and we will make this less of a hassle to change or turn off.

    You can no longer use dachs serve start and dachs serve stop on systemd boxes (i.e., almost all modern Linux boxes as configured by default). That is because systemd really likes to manage daemons itself, and it gets cross when DaCHS tries to do it itself.

    Also, it used to be possible to fetch datasets using /getproduct?key=some/accref. This was a remnant of some ancient design mistake, and DaCHS has not produced such links for twelve years. I have now removed DaCHS' ability to fetch accrefs from key parameters (the accrefs have been in the path forever, as in /getproduct/some/accref). I consider it unlikely that someone is bitten by this change, but I personally had to fix two ancient regression tests.

    If you use embedded grammars and so far did not like the error messages because they always said “unknown location”, there is help: just set self.location to some string you want to see when something is wrong with your source. For illustration, when your source token is the name of a text file you process line by line, you would write:

    <iterator><code>
      with open(self.sourceToken) as f:
        for line_no, line in enumerate(f):
          self.location = f"{self.sourceToken}, {line_no}"
          # now do whatever you need to do with line
    </code></iterator>
    

    When regression-testing datalink endpoints, self.datalinkBySemantics may come in handy. This returns a mapping from concept identifiers to lists of matching rows (which often contain just one). I have caught myself re-implementing what it does in the tests themselves once too often.

    Finally, and also datalink-related, when using the //soda#fromStandardPubDID descriptor generator, you sometimes want to add just an extra attribute or two, and defining a new descriptor generator class for that seems too much work. Well, you can now define a function addExtras(descriptor) in the setup element and mangle the descriptor in whatever way you like.

    For instance, I recently wanted to enrich the descriptor with a few items from the underlying database table, and hence I wrote:

    <descriptorGenerator procDef="//soda#fromStandardPubDID">
      <bind name="accrefPrefix">"dasch/q/"</bind>
      <bind name="contentQualifier">"image"</bind>
      <setup>
        <code>
          def addExtras(descriptor):
            descriptor.suppressAutoLinks = True
            with base.getTableConn() as conn:
              descriptor.extMeta = next(conn.queryToDicts(
                "SELECT * FROM dasch.plates"
                " WHERE obs_publisher_did = %(did)s",
                {"did": descriptor.pubDID}))
        </code>
      </setup>
    </descriptorGenerator>
    

    Upgrade As Convenient

    That's it for the notable changes in DaCHS 2.10. As usual, if you have the GAVO repository enabled, the upgrade will happen as part of your normal Debian apt upgrade. Still, if you have not done so recently, have a quick look at upgrading in the tutorial. If, on the other hand, you use the Debian-distributed DaCHS package and you do not need any of the new features, you can let things sit and enjoy the new features after your next dist-upgrade.

    Oh, by the way: If you are still on buster (or some other distribution that still has astropy 4): A few (from my perspective minor) things will be broken; astropy is evolving too fast, but in general, I am trying to hack around the changes to make DaCHS work at least with the astropys in oldstable, stable, and unstable. However, in cases where a failure seems to be more of an annoyance than a real problem, I am resigning. If any of the broken things do bother you, do let me know, but also consider installing a backport of astropy 5 or higher – or, better, dist-upgrading to bookworm. Sorry about that.

  • DaCHS 2.9 is out

    Our VO server package DaCHS almost always sees two releases per year, each time roughly after the Interops[1]. So, with the Tucson Interop over, it's time for DaCHS 2.9, and this is the traditional what's new post.

    Data Origin – the big headline for this release could be “curation”, in that three upcoming standardoid entities in that field are prototyped in 2.9. One is Data Origin, which is a note on how to embed some very basic provenance information into VOTables.

    This is going to help your users figure out how they came up with a VOTable when the referee has clever questions about the paper they submitted half a year earlier. The good news is: if you defined your metadata in your RD with sufficient care, with DaCHS 2.9 you will automatically do Data Origin.

    Feed your D links – another curation-related new thing in DaCHS is an implementation of what will hopefully be known as BibVO in the future. At this point, it is an unpublished note on Github. In essence, the purpose is to feed bibliographic services – and in particular the ADS – “D links”, i.e., links from publications to data. A part of this works automatically (the source metadatum), but the more advanced biblinks need a bit of manual intervention.

    If you even have, say, an observatory bibliography consisting of pairs of papers and the data used by these papers, you will probably have to write a handful of code. See biblinks in the reference documentation for details if any of this sounds as if it could apply to you. In this context, I have also enabled passing multiple accrefs to the /get endpoint. Users will then receive a tar file of the referenced data products.

    altIdentifiers in relationships – still in the bibliographic realm, VOResource 1.2 will (almost certainly) let you set altIdentifiers, in particular DOIs, when you declare relationships to other resources. That is probably of interest in particular when you want to declare relationships to things outside of the VO to services like b2find that themselves do not understand ivoids. In that situation, you would write something like:

    Cites: Some external thing
    Cites.altIdentifier: doi:10.fake/123412349876
    

    in a <meta> tag in your RD.
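
    Concretely, using DaCHS' meta stream syntax, this might sit in the RD like so (the DOI is, of course, the made-up one from above):

    <meta>
      Cites: Some external thing
      Cites.altIdentifier: doi:10.fake/123412349876
    </meta>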

    json columns – postgresql has the very tempting and apparently all-powerful json type; it lets you stick complex structures into database columns and thus apparently relieves you of all the tedious tasks of designing database tables and documenting metadata.

    Written like this, you probably notice it's a slippery slope at best. Still, there are some non-hazardous uses for such columns, and thus you can now say type="json" or (probably preferably) type="jsonb" in column definitions. You can feed these columns with dicts, lists or JSON literals in strings. Clients will receive both of them as JSON string literals in char[*] FIELDs with an xtype of json. Neither astropy nor TOPCAT do anything with that xtype yet, but I expect that will change soon.
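
    In an RD, such a column definition might look like this (name and description are made up):

    <column name="extra_info" type="jsonb"
      description="Free-form extra information on this row as a JSON object"/>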

    Copy coverage – sometimes two resources have the same spatial (and potentially temporal and spectral) coverage. Since obtaining the coverage is an expensive operation, it would be nice to be able to say “aw, look at that other resource and take its coverage.” The classic example in DaCHS is the system-wide SIAP2 service that really is just a parametric wrapper around obscore. In such cases, you can now say something like:

    <coverage fallbackTo="__system__/obscore"/>
    

    – and //siap2 already does. That's one more reason to occasionally run dachs limits //obscore if you offer an obscore table.

    First VOTable row in tests – if you have calls to getFirstVOTableRow in regression tests (you have regression tests, right?) that return multiple rows, these will fail now until you also pass rejectExtras=False to that call. I've had regressions that were hidden by the function's liberal acceptance of extra rows, and it's too simple to produce unstable tests (that magically succeed and fail depending on the current state of the database) with the old behaviour. I hence hope for your sympathy and understanding in case I broke one of your tests.

    ADQL extensions – there is now arr_count to complement the array extension added in 2.7. Also, our custom UDFs transform, normal_random, to_jd, to_mjd, and simbadpoint now have a prefix of ivo_ rather than the previous gavo_. In order not to break existing queries, DaCHS will still accept the gavo_-prefixed names for the forseeable future, but it will no longer advertise them.

    Minor fixes – as usual, there are many minor bug fixes and improvements, the most visible of which is probably that DaCHS now correctly handles literal + chars in multipart-encoded (“uploads”) requests again; that was broken in 2.8 after the removal of the dependency on python's CGI module. Also, MOC-valued columns can now be serialised into non-VOTable formats like JSON or CSV.

    If you have been using DaCHS' built-in HTTPS support, certain clients may have rejected its certificates. That was because we were pulling an expired intermediate certificate from letsencrypt. If you don't understand what I was just saying, don't worry. If you do understand that and know a good way to avoid this kind of calamity in the future, I'm grateful for advice.

    VCS move – when DaCHS was born, using the venerable subversion for version control was considered reputable. These days, fewer and fewer people can still deal with that, and thus I have moved the DaCHS source code into a git repository: https://gitlab-p4n.aip.de/gavo/dachs/.

    I hear you moan “why not github?” Well: don't get me started unless you are prepared to listen to a large helping of proselytising. Suffice it to say that we in academia invented the internet (for all intents and purposes) and it's a shame that we now rely so much on commercial entities to provide our basic services (and then without paying them, as a rule, which is always a dangerous proposition towards commercial entities).

    Anyway: Feel free to use that service's bug tracker; we try to find ways to let you log in there without undue hardship, too.

    At this point, I customarily urge: don't wait, upgrade. If you have our Debian repository enabled, apt update && apt upgrade should do the trick, except if you missed our announcement on dachs-users that our repository key has changed. If you have not updated it, please have a look at our repo page to see what needs to be done. Sorry about this, but our old 1024D key was being frowned upon, so we had to do something.

    Unless you are an old hand and have upgraded many times before, let me recommend a quick glance at our upgrading guide before doing the actual upgrade.

    [1]The reason we wait for the Interops is that we are generally promising to put something into DaCHS at or around these conferences. This time, the preliminary support for json-typed database columns is an example for that.
  • GAVO at the Fall 2023 Interop in Tucson

    The Virtual Observatory, in practical terms, is the set of standards created and maintained by the IVOA. The IVOA, in turn, is a community almost defined by the two conferences it holds every year, the Interops (previously on this blog). The most recent Interop has just ended: The 2023 Tucson Fall Interop. Here are a few notes on what went on there from my (and to some extent GAVO's) perspective.

    An almost-orange orange hanging in a tree.

    This fall's IVOA Interop was hosted by Steward Observatory, where they had ripening oranges in the backyard. They were edible!

    For at least a decade and a half, the autumn Interops have been back-to-back with the ADASS conferences. ADASS, short for Astronomical Data Analysis Software and Systems, is a venerable conference series, created far back in the last century (this year: ADASS XXXIII) to have a forum for people who work in the magic triangle of astronomy, instrumentation, and data processing. Clearly, such a forum is very well suited to spread the word about the miracles we are working in the VO.

    To that end, I was involved in the creation of three posters: One on the use of MOCs in TAP – a somewhat extended version of something you saw on this blog first –, then one on data discovery in pyVO by Renaud Savalle (Paris) et al – a topic again familiar to readers of this blog – and finally one on improving the description of ADQL to enable more reliable machine validation of its grammar by Grégory Mantelet (Strasbourg) et al.

    As the conference at large goes, I was really delighted to see how basically everyone talking about data publication at all was stressing they are “doing VO”, which was a very welcome change from, perhaps, 10 years ago when this kind of talk was typically extolling the virtues of one particular web or javascript framework. One of the great things about standards in general and the VO in particular is that they tend to be a lot more durable than all those frameworks.


    The following Interop was a “short” one, lasting from Friday morning until Sunday noon, which meant that I was far too busy to do anything like a live blog while it went on. Let me hence just briefly point out the main talks related to GAVO's current activities and DaCHS.

    In Data Curation and Preservation on Saturday morning, Baptiste Cecconi (Paris) gave a nice overview of – among other things – what our bridge between the Registry and b2find (in particular, using the VOResource to DataCite mapper) enables in the context of the EOSC, and he briefly touched on the question of how to properly make landing pages for VO resources (for which I am currently using another piece of XSLT).

    In the Radio session later that morning, Ixaka Labadie (Granada) gave a talk on how he is using DaCHS to deliver 3D visualisations for fairly impressive (prototype) SKA data. I particularly liked his illustrations of how DaCHS does Datalink and SODA. See his slide 12:

    Boxes and arrows illustrating how SIAP and Datalink are described in DaCHS resource descriptors

    In the afternoon, there was the Registry session, which featured me talking about the harvest trigger service I have been running for a while to help people across the anticlimactic moment when you have published your new resource but it won't show up in TOPCAT or pyVO for a day or so.

    The bulk of this session, however, was used for a discussion about various shortcomings of the Registry or its interfaces that I found pleasantly productive – incidentally, just like the discussion on word lists in EPN-TAP on Friday afternoon's Solar System Session that I had the pleasure to chair.

    In the DAL session on that afternoon, I had two talks: One was on the proposed new interoperable user-defined functions already implemented in DaCHS' ADQL and now coming up in several other services, too. Note to self: Some of these would probably be rather suitable blog post material.

    The second talk was a sort of brief show-and-tell pitch, in which I pointed out that hierarchical TAP examples using the elegant examples:continued property now actually work in both pyVO and TOPCAT:

    A three-level popup menu Service Provided -> Local UDFs -> using ivo_histogram

    Finally, in Sunday morning's Apps session, I talked about global image discovery in pyVO. This was about an early promise of the VO: just say where in space, time, and spectrum you need an image (or spectrum, or time series, or whatever), and some apparatus will find and query all the services that could have pertinent data. It would then present the metadata of the datasets it found in some useful form that would let you make informed decisions about which to fetch.

    This was not too difficult in the olden days, but by now the VO is so big and complicated that a pyVO module with fairly involved logic is required. If you don't want to read the notes here, don't worry: I can safely predict that you'll read more about that topic on this blog.

    This is nowhere near done yet; so, it is one more piece of homework that I am taking home with me.
