Queries Against My Obscore Are Slow!

Content Warning: This is fairly deep nerd stuff. If you are just a normal VO user, you probably don't want to know about this. You probably don't even want to know about it if you are running a smallish DaCHS site. But perhaps you'll enjoy it anyway.

Last May, I finally tried to get to the bottom of why certain queries against my obscore table – and in particular some joins I care about – were unnecessarily slow. The immediate use case was that I wanted to join the proposed radio extension for obscore to the main obscore table like this:

SELECT COUNT(*)
    FROM ivoa.obscore
    JOIN ivoa.obs_radio
    USING (obs_publisher_did)

This looks harmless, in particular since there are almost always indexes on obs_publisher_did columns for operational purposes: DaCHS uses them to locate rows in, for instance, Datalink operations.

It is not. Harmless, I mean. On the contrary.

Match Your Types Before You UNION ALL

The main reason why there is a trap is that ivoa.obscore in DaCHS is a view (i.e., some sort of virtual table defined by a SQL query). This is because typically, multiple data collections contribute, and they can change independently of each other. We do not want to have to rebuild a full obscore table (which has almost 150 million rows in the Heidelberg data centre right now) just because we fix the metadata of a handful of images somewhere.

Hence, ivoa.obscore is built somewhat like this in DaCHS[1]:

CREATE OR REPLACE VIEW ivoa.obscore AS
 SELECT 'image'::text AS dataproduct_type,
    NULL::text AS dataproduct_subtype,
    2::smallint AS calib_level,
    'BGDS'::text AS obs_collection,
    ...
 FROM bgds.data
UNION ALL
  SELECT 'image'::text AS dataproduct_type,
    NULL::text AS dataproduct_subtype,
    3::smallint AS calib_level,
  ...
[and 42 further subqueries that are union-ed together]

It turns out that this architecture is dangerous in Postgres.

Laurenz Albe has a writeup on the underlying problem, which he summarises in a cartoon as “Before I UNION ALL you, be sure that your types match”. In short, UNION ALL becomes a planner barrier when the types of the columns of the relations being merged do not exactly match. For this purpose, a bigint is completely different from an integer.
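The condition is easy to check mechanically: compare the declared type names of the UNION ALL branches column by column; any column where the names are not literally identical makes the union a planner barrier. A minimal sketch (the branch metadata here is made up, loosely modelled on the obscore case):

```python
# Sketch: flag columns whose types differ between UNION ALL branches.
# Postgres only plans through the UNION ALL when the types match exactly;
# even bigint vs. integer is enough to create a planner barrier.
def barrier_columns(branches):
    """branches maps a relation name to a dict {column: type_name}."""
    mismatched = {}
    for col in next(iter(branches.values())):
        types = {rel: cols[col] for rel, cols in branches.items()}
        if len(set(types.values())) > 1:
            mismatched[col] = types
    return mismatched

# Hypothetical branch descriptions for two contributing tables:
branches = {
    "bgds.data": {"access_estsize": "integer", "calib_level": "smallint"},
    "emi.main": {"access_estsize": "bigint", "calib_level": "smallint"},
}
print(barrier_columns(branches))
# access_estsize is flagged (integer vs. bigint); calib_level is fine
```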

Full disclosure: it's not like I figured out the applicability of Laurenz' analysis to the DaCHS troubles by myself. It actually took multiple applications of the cluestick by Tom Lane, Laurenz, and others on pgsql-general.

Known Problem Is Not Solved Problem

Hence, since May I had sort of understood the problem. Fixing it, on the other hand, seemed rather overwhelming given the size of the view and the sometimes multiple levels of view building. In consequence, I procrastinated actually doing something about it until some time last November, when I realised that the computer could help me work out which types from which tables do not match.

I therefore wrote analyze-obscore.py and added it to the DaCHS repo. It will (presumably) never be part of the DaCHS package, but you can simply run it from a clone of the repo – and should do so if you have an obscore view fed from multiple tables.

The output then is something like:

==== access_estsize ====

  bgds.data                      accsize/1024
  danish.data                    accsize/1024
  dfbsspec.ssa                   accsize/1024
  plts.data                      accsize/1024
  emi.main                       access_estsize (bigint)
  rosat.images                   accsize/1024
  califadr3.cubes                10
  robott.data                    accsize/1024
  k2c9vst.timeseries             accsize/1024
  dasch.narrow_plates            access_estsize (bigint)
  onebigb.ssa                    accsize/1024
  [...]

==== access_format ====

  bgds.data                      mime (text)
  danish.data                    mime (text)
  dfbsspec.ssa                   mime (text)
  plts.data                      mime (text)
  emi.main                       access_format (text)
  rosat.images                   mime (text)
  califadr3.cubes                'application/x-votable+xml;content=datalink'
  [...]

==== calib_level ====

  bgds.data                      2
  danish.data                    2
  dfbsspec.ssa                   2
  plts.data                      1
  emi.main                       calib_level (smallint)
  rosat.images                   2
  califadr3.cubes                3
  [...]

and so on. That is: for each table contributing to a column, it either shows the source column together with its type, a literal, or the full expression. Literals are not problematic: as it turns out, DaCHS has always cast them to the appropriate type, so as long as the other source columns match what obscore thinks the columns ought to be, you should be fine.

Expressions are more difficult. The only way to be sure about their types is to ask Postgres itself, somewhat like this:

select pg_typeof(accsize/1024) from bgds.data limit 1
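If many expressions need checking, the probe queries can at least be generated mechanically. A small sketch (the (table, expression) pairs are examples; take yours from the analyze-obscore.py output):

```python
# Build pg_typeof probes for the expressions feeding an obscore column.
def typeof_queries(sources):
    """sources is a sequence of (table, expression) pairs."""
    return [
        f"SELECT pg_typeof({expr}) FROM {table} LIMIT 1"
        for table, expr in sources]

for query in typeof_queries([
        ("bgds.data", "accsize/1024"),
        ("emi.main", "access_estsize")]):
    print(query)
```

Feed the resulting statements to psql (or a DaCHS connection) to see what type each expression actually produces.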

Changing Types En Masse

In my case, I had lots of inconsistencies between columns coming from SSA and columns coming more directly from obscore-like tables. If you have spectra and other things in one obscore table created by DaCHS <2.12.2, so do you.

This is because in my obscore implementation I followed the somewhat ill-advised types written down in (but, in my reading, not actually required by) the obscore specification (p. 21). There is no conceivable scenario that would require more than 2³¹ polarisation states (the pol_xel column, which is supposed to be “adql:BIGINT”), and I do not feel overly future-skeptic when I say that it will also be some time until we have images with a linear dimension of more than two billion. There is also no good reason to have an order-of-magnitude value like em_res_power to 16 significant digits (as implied by “adql:DOUBLE”)[2].
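For the record, the two bounds in question are easy to check:

```python
import sys

# A four-byte Postgres integer goes up to 2^31 - 1 ...
int4_max = 2**31 - 1
print(int4_max)  # 2147483647, i.e. a bit over two billion

# ... and an eight-byte double carries about 15-16 significant decimal
# digits, where a four-byte real still carries 6 -- plenty for an
# order-of-magnitude quantity like em_res_power.
print(sys.float_info.dig)  # 15
```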

I have cleaned this up in DaCHS 2.12.2. With this, the types of Obscore and the corresponding columns in SSA and SIAP are consistent within DaCHS' metadata declarations.

However, the on-disk tables will keep their original types regardless of what DaCHS claims they are. You could fix this by re-importing the tables, but that would take quite a while, at least in my case. I have hence opted for targeted updates.

The first step in that procedure is to figure out where Postgres' ideas of columns are now different from DaCHS' ideas given the recent metadata updates. For that, dachs val has had the -c (or --compare-db) flag for a long time. Running:

dachs val -vc ALL

gives you a list of all RDs that need work: their on-disk types (which are what actually determine the query plan) still differ from DaCHS' expectations (which, after the metadata cleanup, are the ones that avoid the UNION ALL trouble). Once the two match, you can feel entitled to a good query plan.

Based on this, I have incrementally built a fixing script on my development system. As I'm pointing out towards the end of Publishing a Service in the DaCHS tutorial, the recommended way to run a DaCHS-based data centre is to have test snippets of almost all the resources on the production system on a <cough> development system (presumably: your laptop). That's what I do, and in this way I built this script:

import subprocess

from gavo import api

with api.getWritableAdminConn() as conn:
        # The views have to go first: Postgres refuses to alter the type
        # of a column that a view is still built on.
        conn.execute("DROP VIEW IF EXISTS ivoa.obscore")
        conn.execute("DROP VIEW IF EXISTS dasch.plates")

        # Shrink the *_xel counts from bigint to integer and the physical
        # quantities from double precision to real; which columns on which
        # tables need this came from dachs val -vc.
        for table_name in ["emi.main", "dasch.narrow_plates"]:
                conn.execute(f"ALTER TABLE {table_name} ALTER t_xel TYPE integer")

        for table_name in [
                        "emi.main", "dasch.narrow_plates", "ppakm31.cubes", "applause.main",]:
                conn.execute(f"ALTER TABLE {table_name} ALTER s_xel1 TYPE integer")
                conn.execute(f"ALTER TABLE {table_name} ALTER s_xel2 TYPE integer")

        for table_name in [
                        "emi.main",
                        "dasch.narrow_plates"]:
                conn.execute(f"ALTER TABLE {table_name} ALTER pol_xel TYPE integer")

        for table_name in [
                        "emi.main",
                        "dasch.narrow_plates",
                        "califadr3.cubes"]:
                conn.execute(f"ALTER TABLE {table_name} ALTER em_xel TYPE integer")
                conn.execute(f"ALTER TABLE {table_name} ALTER em_res_power TYPE real")

        for table_name in [
                        "emi.main",
                        "dasch.narrow_plates",
                        "ppakm31.cubes",
                        "applause.main",
                        "califadr3.cubes"]:
                conn.execute(f"ALTER TABLE {table_name} ALTER em_min TYPE real")
                conn.execute(f"ALTER TABLE {table_name} ALTER em_max TYPE real")

        for table_name in [
                        "emi.main",
                        "dasch.narrow_plates",
                        "applause.main"]:
                conn.execute(f"ALTER TABLE {table_name} ALTER s_resolution TYPE real")
                conn.execute(f"ALTER TABLE {table_name} ALTER s_pixel_scale TYPE real")
                conn.execute(f"ALTER TABLE {table_name} ALTER s_fov TYPE real")


# Update DaCHS' metadata for the tables just changed ...
for rd_id in ["emi/q", "califa/q3", "rome/q", "dasch/q", "ppakm31/q"]:
        subprocess.call(["dachs", "imp", "-m", rd_id])

# ... and re-create the views dropped above.
subprocess.call(["dachs", "imp", "dasch/q", "make-view"])
subprocess.call(["dachs", "imp", "//obscore"])

As I said: which columns to fix I learned from dachs val -vc; the extra DaCHS operations were necessary because Postgres refused the type changes as long as the views were still defined.
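If you face a longer list of columns than I did, the repeated loops can also be generated from a single declarative mapping; a sketch (the column/type/table assignments shown are from my instance and will differ on yours):

```python
# Generate the ALTER TABLE statements from one mapping instead of
# hand-writing a loop per column.
FIXES = {
    # column: (target type, tables that need the change)
    "t_xel": ("integer", ["emi.main", "dasch.narrow_plates"]),
    "em_res_power": ("real",
        ["emi.main", "dasch.narrow_plates", "califadr3.cubes"]),
}

def alter_statements(fixes):
    for column, (new_type, tables) in fixes.items():
        for table in tables:
            yield f"ALTER TABLE {table} ALTER {column} TYPE {new_type}"

for stmt in alter_statements(FIXES):
    print(stmt)
```

Each generated statement would then go through conn.execute as in the script above.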

Success?

This entire operation has made quite a few obscore queries a lot faster.

Regrettably, the motivating query, viz.:

select count(*)
from ivoa.obscore
natural join ivoa.obs_radio

is still slow. I have dug a bit into why Postgres does not find the seemingly obvious plan of just materialising the join with the tiny obs_radio table, and have contented myself with the note that has been in section 9.21 of the Postgres documentation forever:

Users accustomed to working with other SQL database management systems might be disappointed by the performance of the count aggregate when it is applied to the entire table. A query like:

SELECT count(*) FROM sometable;

will require effort proportional to the size of the table: PostgreSQL will need to scan either the entire table or the entirety of an index that includes all rows in the table.

But at least a query like:

select dataproduct_type, access_url, t_min, t_max
from ivoa.obscore
natural join ivoa.obs_radio
where t_min between 56000 and 56005

is fast, and until further trouble that's good enough for me.

[1]In case you wonder: the individual parts of this union are kept in a table ivoa._obscoresources that you can inspect and even manipulate for special effects. The management of that table is among the more complex things one can do in DaCHS RDs. If you are curious, dachs adm dump //obscore will show you all the magic.
[2]I put these type names into quotation marks because they were never formally defined. What Obscore does there has been identified as an antipattern in the meantime; newer specifications of similar schemas only distinguish floating point, integral, and string types and leave the choice of lengths to the implementations. If I may say so myself, I like the considerations on types within section 8 of RegTAP.