Posts with the Tag Nerdstuff:

  • At the Gaia Passivation Event

    [All times in CET]

    9:00

    The instrument that has featured most frequently (try this) in this blog is ESA's Gaia spacecraft, which, during the past eleven years, has obtained the positions and much more (my personal favourite: the XP spectra) of about two billion objects, mostly stars, but also quasars, asteroids, and whatever else is reasonably point-like.

    Today, this mission comes to an end. To celebrate it – the mission, not the end, I would say –, ESA has organised a little ceremony at its operations centre in Darmstadt, just next to Heidelberg. To my serious delight, I was invited to that farewell party, and I am now listening to an overview of the passivation given by David Milligan, who used to manage spacecraft operations. This is a surprisingly involved operation, mostly because spacecraft are built to recover from all kinds of mishaps automatically and thus will normally come back on when you switch them off:

    Photo of a screen showing a linear flow chart with about 20 boxes, the contents of which are almost unreadable.  Heads are visible in front of the screen.

    But for now Gaia is still alive and kicking; the control screen shows four thrusters accelerating Gaia out of L2, the Lagrange point behind Earth, where it has been taking data for all these years (if you have doubts, you could check amateur images of Gaia taken while Gaia was particularly bright in the past few months; the service to collect them runs on my data centre).

    They are working the thrusters quite a bit harder than they were designed for to get to a Δv of 120 m/s (your average race car doesn't make that, but of course it takes a lot less time to accelerate, too). It is not clear yet if they will burn to the end; but even if one fails early, David explains, it is already quite unlikely that Gaia will return.

    Photo of a screen showing various graphs not particularly comprehensible to non-spacecraft engineers.  There is a visualisation of the thrusters in the lower part of the screen, though, and it shows four thrusters firing.  It also gives a tank pressure of 9.63 bar.

    9:35

    Just now the thrusters on the spacecraft have been shut down (”nominally”, as they say here, so they've reached the 120 m/s). Gaia is now on its way into a heliocentric orbit that, as the operations manager said, will bring it back to the Earth-Moon system with a chance of less than 0.25% between now and 2125. That's what much of this is about: You don't want Gaia to crash into anything else that's populating L2 (or something else near Earth, for that matter), or start randomly sending signals that might confuse other missions.

    9:42

    Gaia is now going blind. They switched off the science computers a few minutes ago, which we could follow on the telemetry screen, and now they are switching off the CCDs, one after the other. The RP/BP CCDs, the ones that obtained my beloved XP spectra, are already off. Now the astrometry CCDs go grey (on the screen) one after the other. This feels oddly sombre.

    In a nerdy joke, they switched off the CCDs so the still active ones formed the word ”bye” for a short moment:

    Photo of a screen showing a matrix of numbers with column headings like SM1, AF8, BP, or RVS1.  There are green numbers (for CCDs still live) and grey ones (for CCDs already shut down).  The letters ”bye” are forming top to bottom.

    9:45

    The geek will inherit the earth. Some nerd has programmed Gaia to send, while it is slowly winding down, an extra text message: “The cosmos is vast. So is our curiosity. Explore!”. Oh wow. Kitsch, sure, but still goosebumpy.

    9:50

    Another nerdy message: ”Signing off. 2.5B stars. countless mysteries unlocked.” Sigh.

    9:53

    Gaia is now mute. The operations manager gave a little countdown and then said „mark”. We got to see the spectrum of the signal on the ground station, and then watch it disappear. There was dead silence in the room.

    9:55

    Gaia was still listening until just now. Then they sent the shutdown command to the onboard computer, so it's deaf, too, or actually braindead. Now there is no way to revive the spacecraft short of flying there. ”This is a very emotional moment,” says someone, and while it sounds like an empty phrase, it is not right now. ”Gaia has changed astronomy forever”. No mistake. And: ”Don't be sad that it's over, be glad that it happened”.

    12:00

    Before they shut down Gaia, they stored messages from people involved with the mission in the onboard memory – and names of people somehow working on Gaia, too. And oh wow, I found my name in there, too:

    Photo of a screen with many names.  There's a freehand highlight of the name of the author of this post.

    It felt a bit odd to have my name stored aboard a spacecraft in an almost eternal heliocentric orbit.

    But on reflection: This is solid state storage, in other words, some sort of EPROM. And that means that given the radiation up there, the charges that make up the bit pattern will not live long; colleagues estimated that, with a lot of luck, this might still be readable 20 years from now, but certainly not much longer. So, no, it's not like I now share Kurt Waldheim's privilege of having something of me stored for the better part of eternity.

    12:30

    Andreas Rudolph, head of operations, now gives a talk on, well, ”Gaia Operations”.

    I have already heard a few stories of the olden days while chatting to people around here. For instance, ESTEC staff are here and gave some inside views of the stray light trouble that caused quite a few sleepless nights when it was discovered during commissioning. Eventually it turned out to be caused by fibres sticking out from the sunshield. Today I learned that this had a long history, because unfolding the sunshield actually was a hard problem during spacecraft design and, as Andreas just reminded us, a nail-biting moment during commissioning: the things need to be rollable but stiff, and they have to unroll reliably once in space.

    People thought of almost everything. But when, while debugging the problem, they showed the sunshield to an optical engineer, it took him only a few minutes to shine his telephone's flashlight behind the screens and conclusively demonstrate the source of the stray light.

    Space missions are incredibly hard. Even the smallest oversights can have tremendous consequences (although the mission extension after the original five years of mission time certainly helped offset the stray light problem).

    Andreas discussed more challenges like that, in particular the still somewhat mysterious Basic Angle Variation, and finished by predicting that Gaia will next approach Earth in 2029, passing at a distance of about 10 million kilometres. I don't think it will be accessible to amateur telescopes, perhaps not even to professional ones. But let's see.

    13:00

    Gaia data processing is (and will be for another 10 years or so) performed by a large international collaboration called DPAC. DPAC is headed by Anthony Brown, and his is the last talk for today. He mentioned some of the exciting science results of the Gaia mission. Of course, that is a minute sample taken from the thousands and thousands of papers that would not exist without Gaia.

    • The tidal bridge of stars between the LMC and the SMC.
    • The discovery of 160'000 asteroids (with 5.5 years of data analysed), and their rough spectra, allowing us to group them into several classes.
    • The high-precision reference catalogue which is now in use everywhere to astrometrically calibrate astronomical images; a small pre-release of this was already in use for the navigation of the extended mission of the Pluto probe New Horizons.
    • Finding the young stars in the (wider) solar neighbourhood by their over-luminosity in the colour-magnitude diagram, which lets you accurately map star forming regions out to a few hundred parsecs.
    • Unraveling the history of the Milky Way by reconstructing orbits of hundreds of millions of stars and identifying stellar streams (or rather, overdensities in the space of orbital elements) left over by mergers of other galaxies with the Milky Way and preserved over 10 billion years.
    • Confirming that the oldest stars in the Milky Way are indeed in the bulge using the XP spectra, and reconstructing how the disk formed afterwards.
    • In the vertical motions of the disk stars, there is a clear signal of a recent perturbation (probably from when the Sagittarius dwarf galaxy crashed through the Milky Way disk), with some sort of wave now travelling through the disk and slowly petering out.
    • Certain white dwarfs (I think those consisting of carbon and nitrogen) show underluminosities because they form bizarre crystals in their outer regions (or so; I didn't quite get that part).
    • Thousands of star clusters newly discovered (and a few suspected star clusters debunked). One new discovery was actually hiding behind Sirius; it took space observations and very careful data reduction around bright sources to see it in the vicinity of this source overshining everything around it.
    • Quite a few binary stars having neutron stars or black holes as companions – where we are still not sure how some of these systems can even form.
    • Acceleration of the solar system: The sun orbits the centre of the Milky Way about once every 220 million years. So, it does not move in a straight line, though the deviation is tiny (“2 Angstrom/s²” of acceleration, as Anthony put it; see the quick check after this list). Gaia's breathtaking precision let us measure that number for the first time.
    • Oh, and in DR4, we will see probably 1000s of new exoplanets in a mass-period range not well sampled so far: Giant planets in wide orbits.
    • And in DR5, there will even be limits on low-frequency gravitational waves.
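
    As a rough plausibility check of that acceleration figure (my own back-of-the-envelope sketch, not from the talk): with textbook values of roughly 230 km/s for the Sun's orbital speed and roughly 8 kpc for its distance from the Galactic centre, the centripetal acceleration v²/r indeed comes out near 2 Å/s²:

    v = 230e3              # Sun's orbital speed in m/s (textbook value)
    r = 8000 * 3.086e16    # 8 kpc in m (1 pc is about 3.086e16 m)
    a = v**2 / r           # centripetal acceleration in m/s²
    print(a)               # about 2.1e-10 m/s², i.e. about 2 Angstrom/s²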

    Incidentally, in the question session after Anthony's talk, the grandmaster of both Hipparcos and Gaia, Erik Høg, reminded everyone of the contributions by Russian astronomers to Gaia, among other things having proposed the architecture of the scanning CCDs. I personally have to say that I am delighted to be reminded of how science overcomes the insanities of politics and nationalism.

  • The Case of the Disappearing Bits

    [number line with location markers]

    Every green line in this image stands for a value exactly representable in a floating point value of finite size. As you see, it's a white area out there [source]

    While I was preparing the publication of Coryn Bailer-Jones' distance estimations based on Gaia eDR3 (to be released about tomorrow), Coryn noticed I was swallowing digits from his numbers. My usual reaction of “aw, these are meaningless anyway because your errors are at least an order of magnitude higher” didn't work this time, because Gaia is such an incredible machine that some of the values really have six significant decimal digits. For an astronomical distance! If I had a time machine, I'd go back to F.W. Bessel right away to make him pale with envy.

    I'm storing these distances as PostgreSQL REALs, so these six digits are perilously close to the seven decimal digits that the 23 bits of mantissa of single precision IEEE 754 floats are usually translated to. Suddenly, being cavalier with the last few bits of the mantissa isn't just a venial sin. It will lose science.
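
    To make the edge concrete, here is a minimal Python sketch (using the distance value from the query further down) of what a round trip through single precision does to such a number:

    import struct

    # Round-trip a value through IEEE 754 single precision -- which is what a
    # PostgreSQL REAL stores -- to see how many decimal digits survive.
    value = 1430.90332
    as_real, = struct.unpack("f", struct.pack("f", value))
    print(as_real)           # 1430.9033203125 -- the bits actually stored
    print(f"{as_real:.6g}")  # 1430.9          -- six output digits drop the tail
    print(f"{as_real:.9g}")  # 1430.90332      -- enough digits to recover the bits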

    So, I went hunting for the bits, going from parsing (in this case C's sscanf) through my serialisation into Postgres binary copy material (DaCHS operators: this is using a booster) to pulling the material out of the database again. And there I hit it: the bits disappeared between copying them in and retrieving them from the database.

    Wow. Turns out: It's a feature. And one I should have been aware of, because Postgres' documentation has a prominent warning box where it explains its floating point types: without setting extra_float_digits, Postgres will cut off bits on output. And it has done this ever since the dawn of DaCHS (in Postgres terms, version 8.2 or so).

    Sure enough (edited for brevity):

    gavo=$ select r_med_geo from gedr3dist.main
    gavo-$ where source_id=563018673253120;
        1430.9
    
    gavo=$ set extra_float_digits=3;
    gavo=$ select r_med_geo from gedr3dist.main
    gavo-$ where source_id=563018673253120;
     1430.90332
    

    Starting with its database schema 26 (which is the second part of the output of dachs --version), DaCHS will configure its database roles to always have extra_float_digits 3; operators beware: this may break your regression tests after the next upgrade.

    If you want to configure your non-DaCHS role, too, all it takes is:

    alter role (you) set extra_float_digits=3;
    

    You could also make the entire database or even the entire cluster behave like that; then again, losing these bits isn't always a bad idea: it really makes the floats prettier while most of the time not losing significant data. It's just when you want to preserve the floats as you get them – and with science data, that's mostly a good idea – that we can't really afford that prettiness.

    Update (2021-04-22): It turns out that this was already wrong (for some meaning of wrong) when I wrote this. Since PostgreSQL 12, Postgres uses shortest-precise by default (and whenever extra_float_digits is positive). The official documentation has a nice summary of the problem and the way post-12 postgres addresses it. So: expect your float-literal-comparing regression tests to break after the upgrade to bullseye.

  • Parallel Queries

    Image: Plot of run times

    An experiment with parallel querying of PPMX, going from single-threaded execution to using seven workers.

    Let me start this post with a TL;DR for

    scientists:
    Large analysis queries (like those that contain a GROUP BY clause) profit a lot from parallel execution, and you needn't do a thing for that.
    DaCHS operators:
    When you have large tables, Postgres 11 together with the next DaCHS release may speed up your responses quite dramatically in some cases.

    So, here's the story –

    I've finally overcome my stretch trauma and upgraded the Heidelberg data center's database server to Debian buster. With that, I got Postgres 11, and I finally bothered to look into what it takes to enable parallel execution of database queries.

    Turns out: My Postgres started to do parallel execution right away, but just in case, I went for the following lines in postgresql.conf:

    max_parallel_workers_per_gather = 4
    max_worker_processes = 10
    max_parallel_workers = 10
    

    Don't quote me on this – I frankly admit I haven't really developed a feeling for the consequences of max_parallel_workers_per_gather and instead just did some experiments while the box was loaded otherwise, determining where raising that number has a diminishing return (see below for more on this).

    The max_worker_processes thing, on the other hand, is an educated guess: on my data center, there's essentially never more than one person at a time who's running “interesting”, long-running queries (i.e., async), and that person should get the majority of the execution units (the box has 8 physical CPUs that look like 16 cores due to hyperthreading) because all other operations are just peanuts in comparison. I'll gladly accept advice to the effect that that guess isn't that educated after all.

    Of course, that wasn't nearly enough. You see, since TAP queries can return rather large result sets – on the GAVO data center, the match limit is 16 million rows, which for a moderate row size of 2 kB already translates to 32 GB of memory use if pulled in at once, half the physical memory of that box –, DaCHS uses cursors (if you're a psycopg2 person: named cursors) to stream results and write them out to disk as they come in.
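
    For illustration, this is roughly what the named-cursor (i.e., server-side cursor) pattern looks like in psycopg2; the connection string and the query are placeholders, and this is a sketch of the pattern rather than actual DaCHS code:

    import psycopg2

    # A named cursor makes psycopg2 declare a server-side cursor, so rows are
    # streamed in chunks instead of being materialised in memory all at once.
    conn = psycopg2.connect("dbname=gavo")      # placeholder connection string
    with conn.cursor(name="stream_results") as cur:
        cur.itersize = 10000                    # rows fetched per round trip
        cur.execute("SELECT * FROM ppmx.data")  # placeholder query
        for row in cur:
            pass                                # write each row out to disk here
    conn.close()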

    Sadly, postgres won't do parallel plans if it thinks people will discard a large part of the result anyway, and it thinks that if you're coming through a cursor. So, in SVN revision 7370 of DaCHS (and I'm not sure if I'll release that in this form), I'm introducing a horrible hack that, right now, just checks if there's a literal “group” in the query and doesn't use a cursor if so. The logic is, roughly: With GROUP, the result set probably isn't all that large, so streaming isn't that important. At the same time, this type of query is probably going to profit from parallel execution much more than your boring sequential scan.
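
    In (hypothetical, much simplified) code, the idea of that hack amounts to something like the following; the actual DaCHS logic may well look different:

    def should_use_cursor(query):
        # Crude heuristic sketched from the description above (not actual DaCHS
        # code): a query containing "group" probably aggregates, so its result
        # set should be small, and it profits most from the parallel plans that
        # Postgres refuses to use when feeding a cursor.
        return "group" not in query.lower()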

    This gives rather impressive speed gains. Consider this example (of course, it's selected to be extreme):

    import contextlib
    import pyvo
    import time
    
    @contextlib.contextmanager
    def timeit(activity):
      start_time = time.time()
      yield
      end_time = time.time()
      print("Time spent on {}: {} s".format(activity, end_time-start_time))
    
    
    svc = pyvo.tap.TAPService("http://dc.g-vo.org/tap")
    with timeit("Cold (?) run"):
      svc.run_sync("select round(Rmag) as bin, count(*) as n"
        " from ppmx.data group by bin")
    with timeit("Warm run"):
      svc.run_sync("select round(Rmag) as bin, count(*) as n"
        " from ppmx.data group by bin")
    

    (if you run it yourself and you get warnings about VOTable versions from astropy, ignore them; I'm right and astropy is wrong).

    Before enabling parallel execution, this was 14.5 seconds on a warm run; after, it was 2.5 seconds. That's almost a 6-fold speedup. Nice!

    Indeed, that holds beyond toy examples. The showcase Gaia density plot:

    SELECT
            count(*) AS obs,
            source_id/140737488355328 AS hpx
    FROM gaia.dr2light
    GROUP BY hpx
    

    (the long odd number is 2^35·4^(12−6), which turns source_ids into level-6 HEALPixes as per the Gaia footnote on source_id; please note that Postgres right now isn't smart enough to parallelise ivo_healpix), which traditionally ran for about an hour, is now done in less than 10 minutes.
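
    As a quick sanity check of that constant (a small Python sketch; the example source_id is borrowed from the gedr3dist query above): the most significant bits of a Gaia source_id encode its level-12 HEALPix cell shifted by 35 bits, so dividing by an extra 4^(12−6) degrades that to level 6:

    divisor = 2**35 * 4**(12 - 6)
    print(divisor)                # 140737488355328 -- the constant in the query

    source_id = 563018673253120   # example source_id from the gedr3dist query
    print(source_id // divisor)   # its level-6 HEALPix index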

    In case you'd like to try things out on your postgres, here's what I've done to establish the max_parallel_workers_per_gather value above.

    1. Find a table with a few 1e7 rows. Think of a query that will return a small result set in order to not confuse the measurements by excessive client I/O. In my case, that's a magnitude histogram, and the query would be:

      select round(Rmag) as bin, count(*) as n from ppmx.data group by bin;

      Run this query once so the data is in the disk cache (the query is “warm”).

    2. Establish a non-parallel baseline. That's easy to do:

      set max_parallel_workers_per_gather=0;
      
    3. Then run:

      explain analyze select round(Rmag) as bin, count(*) as n from ppmx.data group by bin;
      

      You should see a simple query plan with the runtime for the non-parallel execution – in my case, a bit more than 12 seconds.

    4. Then raise max_parallel_workers_per_gather successively. Make sure the query plan has lines of the form “Workers Planned” or so. You should see that the execution time falls with the number of workers you give it, up to the value of max_worker_processes – or until Postgres decides your table is too small to warrant further parallelisation, which for my settings happened at 7.

    Note, though, that in realistic, more complex queries, there will probably be multiple operations that will profit from parallelisation in a single query. So, if in this trivial example you can go to 15 gatherers and still see an improvement, this could actually make things slower for complex queries. But as I said above: I have no instinct yet for how things will actually work out. If you have experiences to share: I'm sure I'm not the only person on dachs-users who'd be interested.

    Update 2022-05-17: In Postgres 13, I found that the planner disfavours parallel plans a lot stronger than I think it has in Postgres 11. To make up for that, I've amended my postgres configuration (in /etc/postgresql/13/main/postgresql.conf) with the slightly bizarre:

    parallel_tuple_cost = 0.001
    parallel_setup_cost = 3
    

    This is certainly not ideal for every workload, but given the queries I see in the VO, I want to give Postgres no excuse not to parallelise when there is at least a shred of a chance it'll help; given I'll never execute more than very few queries per second, the extra overhead for parallelising queries that would be faster sequentially will never really bite me.
