Warum erhöht das Hinzufügen einer smallint-Spalte zu einer Postgres-Abfrage die Ausführungszeit um das 60-fache?

Hier habe ich eine Routine-Tabelle und ein Skript zur Tabellenerstellung, die ich in /tmp/try.sql eingefügt habe. Ich verwende Postgres unter OSX 10.10.2.

Hier ist die Tabelle:

cow_dev=# \d carc
                                Table "public.carc"
    Column     |     Type     |                         Modifiers
---------------+--------------+---------------------------------------------------   ---------
 internal_id   | integer      | not null default nextval('carc_internal_id_seq'::regclass)
 acct_nbr      | character(6) |
 birth_date    | date         |
 anm_key       | integer      |
 slghtr_dt     | date         |
 sire_assoc_id | integer      |
 sire_reg      | text         |
 dam_assoc_id  | integer      |
 dam_reg       | text         |
 sex           | text         |
 carc_kphf_pct | real         |
 carc_wt       | integer      |
 marbling      | integer      |
 ribeye_area   | real         |
 usda_qlty_grd | smallint     |
 act_fat_thick | real         |
 carcass_group | text         |
 maturity      | integer      |
Indexes:
    "idx_a649568319866892fcdd3742289e8294" PRIMARY KEY, btree (internal_id)
    "idx_d4659e9f750ff68c75e11e89a950e386" btree (anm_key)
    "idx_d57025002b9ce54cf590b76e87c10cac" btree (acct_nbr)
Foreign-key constraints:
    "carc_acct_nbr__acct_acct_nbr_fk" FOREIGN KEY (acct_nbr) REFERENCES acct(acct_nbr)
    "carc_anm_key__anm_anm_key_fk" FOREIGN KEY (anm_key) REFERENCES anm(anm_key)

Hier ist das Skript:

create temp table carc_out as
     with assoc_map as 
     (select
         a.assoc_id,
         c.code_3 || a.brd_cd_id mb_name
      from
         assoc a
         join country_code c on c.code_2 = a.country_code_2)
     select
        c.internal_id,
        n.mb_assoc_reg anm_nbr,
        c.act_fat_thick,
        c.carc_kphf_pct,
        c.carcass_group,
        c.carc_wt,
        null::float carc_yld,
        c.marbling,
        c.maturity,
        c.ribeye_area,
        to_char(c.slghtr_dt, 'mmddyyyy') slghtr_dt,
        c.usda_qlty_grd
from
        carc c
        left join z2_raa_numbers n using (anm_key)
        left join assoc_map da on da.assoc_id = coalesce(c.dam_assoc_id, 1) 
        left join assoc_map sa on sa.assoc_id = coalesce(c.sire_assoc_id, 1);
drop table carc_out;

Hier sind die Ergebnisse der zweimaligen Ausführung, wobei das erste mit dem c.usda_qlty_grd nicht in der Abfrage enthalten ist und das zweite mit dem darin enthaltenen.

> time psql -U cow_peer -d cow_dev -c '\i /tmp/try.sql'
SELECT 25332
DROP TABLE

real    0m3.041s
user    0m0.004s
sys     0m0.004s
10:57:46 521 0 tom@angus-2 ~
> time psql -U cow_peer -d cow_dev -c '\i /tmp/try.sql'
SELECT 25332
DROP TABLE

real    3m1.958s
user    0m0.004s
sys     0m0.004s

Wie Sie sehen können, dauert es einmal 3 Sekunden und das andere Mal 3 Minuten. Dies ist wiederholbar und zumindest etwas unabhängig davon, welche zusätzliche Spalte Sie der Abfrage hinzufügen. Interessanterweise scheinen sich die Benutzer- und Systemzeiten nicht zu ändern. Die Ausführungspläne sind identisch. Ich bin ratlos.

Hier ist der Abfrageplan ohne Spalte:

 QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=1220.51..122913.55 rows=25332 width=54) (actual time=48.925..3254.720 rows=25332 loops=1)
   Output: c.internal_id, n.mb_assoc_reg, c.act_fat_thick, c.carc_kphf_pct, c.carcass_group, c.carc_wt, NULL::double precision, c.marbling, c.maturity, c.ribeye_area, to_char((c.slghtr_dt)::timestamp with time zone, 'mmddyyyy'::text)
   Hash Cond: (COALESCE(c.sire_assoc_id, 1) = sa.assoc_id)
   CTE assoc_map
     ->  Hash Join  (cost=7.54..10.16 rows=52 width=11) (actual time=0.271..0.392 rows=52 loops=1)
           Output: a.assoc_id, ((c_1.code_3)::text || (a.brd_cd_id)::text)
           Hash Cond: (a.country_code_2 = c_1.code_2)
           ->  Seq Scan on public.assoc a  (cost=0.00..1.52 rows=52 width=10) (actual time=0.003..0.012 rows=52 loops=1)
                 Output: a.assoc_id, a.assoc_code, a.priority, a.csu_priority, a.mb_priority, a.description, a.country_code_2, a.brd_cd_id
           ->  Hash  (cost=4.46..4.46 rows=246 width=7) (actual time=0.250..0.250 rows=246 loops=1)
                 Output: c_1.code_3, c_1.code_2
                 Buckets: 1024  Batches: 1  Memory Usage: 10kB
                 ->  Seq Scan on public.country_code c_1  (cost=0.00..4.46 rows=246 width=7) (actual time=0.004..0.114 rows=246 loops=1)
                       Output: c_1.code_3, c_1.code_2
   ->  Hash Left Join  (cost=1208.66..122614.19 rows=25332 width=58) (actual time=48.867..3220.637 rows=25332 loops=1)
         Output: c.internal_id, c.act_fat_thick, c.carc_kphf_pct, c.carcass_group, c.carc_wt, c.marbling, c.maturity, c.ribeye_area, c.slghtr_dt, c.sire_assoc_id, n.mb_assoc_reg
         Hash Cond: (COALESCE(c.dam_assoc_id, 1) = da.assoc_id)
         ->  Hash Right Join  (cost=1206.97..122451.64 rows=25332 width=62) (actual time=48.380..3204.285 rows=25332 loops=1)
               Output: c.internal_id, c.act_fat_thick, c.carc_kphf_pct, c.carcass_group, c.carc_wt, c.marbling, c.maturity, c.ribeye_area, c.slghtr_dt, c.dam_assoc_id, c.sire_assoc_id, n.mb_assoc_reg
               Hash Cond: (n.anm_key = c.anm_key)
               ->  Seq Scan on public.z2_raa_numbers n  (cost=0.00..61034.69 rows=3330869 width=19) (actual time=0.057..937.277 rows=3330869 loops=1)
                     Output: n.mb_assoc_reg, n.anm_key
               ->  Hash  (cost=642.32..642.32 rows=25332 width=51) (actual time=46.444..46.444 rows=25332 loops=1)
                     Output: c.internal_id, c.act_fat_thick, c.carc_kphf_pct, c.carcass_group, c.carc_wt, c.marbling, c.maturity, c.ribeye_area, c.slghtr_dt, c.anm_key, c.dam_assoc_id, c.sire_assoc_id
                     Buckets: 2048  Batches: 128 (originally 4)  Memory Usage: 1025kB
                     ->  Seq Scan on public.carc c  (cost=0.00..642.32 rows=25332 width=51) (actual time=0.007..18.372 rows=25332 loops=1)
                           Output: c.internal_id, c.act_fat_thick, c.carc_kphf_pct, c.carcass_group, c.carc_wt, c.marbling, c.maturity, c.ribeye_area, c.slghtr_dt, c.anm_key, c.dam_assoc_id, c.sire_assoc_id
         ->  Hash  (cost=1.04..1.04 rows=52 width=4) (actual time=0.461..0.461 rows=52 loops=1)
               Output: da.assoc_id
               Buckets: 1024  Batches: 1  Memory Usage: 2kB
               ->  CTE Scan on assoc_map da  (cost=0.00..1.04 rows=52 width=4) (actual time=0.275..0.436 rows=52 loops=1)
                     Output: da.assoc_id
   ->  Hash  (cost=1.04..1.04 rows=52 width=4) (actual time=0.042..0.042 rows=52 loops=1)
         Output: sa.assoc_id
         Buckets: 1024  Batches: 1  Memory Usage: 2kB
         ->  CTE Scan on assoc_map sa  (cost=0.00..1.04 rows=52 width=4) (actual time=0.001..0.018 rows=52 loops=1)
               Output: sa.assoc_id
 Total runtime: 3256.486 ms
(38 rows)

Hier ist der Abfrageplan mit der Spalte:

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=1220.51..122913.55 rows=25332 width=56) (actual time=11588.296..179838.116 rows=25332 loops=1)
   Output: c.internal_id, n.mb_assoc_reg, c.act_fat_thick, c.carc_kphf_pct, c.carcass_group, c.carc_wt, NULL::double precision, c.marbling, c.maturity, c.ribeye_area, to_char((c.slghtr_dt)::timestamp with time zone, 'mmddyyyy'::text), c.usda_qlty_grd
   Hash Cond: (COALESCE(c.sire_assoc_id, 1) = sa.assoc_id)
   CTE assoc_map
     ->  Hash Join  (cost=7.54..10.16 rows=52 width=11) (actual time=0.211..0.314 rows=52 loops=1)
           Output: a.assoc_id, ((c_1.code_3)::text || (a.brd_cd_id)::text)
           Hash Cond: (a.country_code_2 = c_1.code_2)
           ->  Seq Scan on public.assoc a  (cost=0.00..1.52 rows=52 width=10) (actual time=0.002..0.015 rows=52 loops=1)
                 Output: a.assoc_id, a.assoc_code, a.priority, a.csu_priority, a.mb_priority, a.description, a.country_code_2, a.brd_cd_id
           ->  Hash  (cost=4.46..4.46 rows=246 width=7) (actual time=0.186..0.186 rows=246 loops=1)
                 Output: c_1.code_3, c_1.code_2
                 Buckets: 1024  Batches: 1  Memory Usage: 10kB
                 ->  Seq Scan on public.country_code c_1  (cost=0.00..4.46 rows=246 width=7) (actual time=0.003..0.065 rows=246 loops=1)
                       Output: c_1.code_3, c_1.code_2
   ->  Hash Left Join  (cost=1208.66..122614.19 rows=25332 width=60) (actual time=11588.203..179739.516 rows=25332 loops=1)
         Output: c.internal_id, c.act_fat_thick, c.carc_kphf_pct, c.carcass_group, c.carc_wt, c.marbling, c.maturity, c.ribeye_area, c.slghtr_dt, c.usda_qlty_grd, c.sire_assoc_id, n.mb_assoc_reg
         Hash Cond: (COALESCE(c.dam_assoc_id, 1) = da.assoc_id)
         ->  Hash Right Join  (cost=1206.97..122451.64 rows=25332 width=64) (actual time=11587.812..179701.171 rows=25332 loops=1)
               Output: c.internal_id, c.act_fat_thick, c.carc_kphf_pct, c.carcass_group, c.carc_wt, c.marbling, c.maturity, c.ribeye_area, c.slghtr_dt, c.usda_qlty_grd, c.dam_assoc_id, c.sire_assoc_id, n.mb_assoc_reg
               Hash Cond: (n.anm_key = c.anm_key)
               ->  Seq Scan on public.z2_raa_numbers n  (cost=0.00..61034.69 rows=3330869 width=19) (actual time=0.170..1052.251 rows=3330869 loops=1)
                     Output: n.mb_assoc_reg, n.anm_key
               ->  Hash  (cost=642.32..642.32 rows=25332 width=53) (actual time=96.305..96.305 rows=25332 loops=1)
                     Output: c.internal_id, c.act_fat_thick, c.carc_kphf_pct, c.carcass_group, c.carc_wt, c.marbling, c.maturity, c.ribeye_area, c.slghtr_dt, c.usda_qlty_grd, c.anm_key, c.dam_assoc_id, c.sire_assoc_id
                     Buckets: 2048  Batches: 65536 (originally 4)  Memory Usage: 1028kB
                     ->  Seq Scan on public.carc c  (cost=0.00..642.32 rows=25332 width=53) (actual time=0.007..15.858 rows=25332 loops=1)
                           Output: c.internal_id, c.act_fat_thick, c.carc_kphf_pct, c.carcass_group, c.carc_wt, c.marbling, c.maturity, c.ribeye_area, c.slghtr_dt, c.usda_qlty_grd, c.anm_key, c.dam_assoc_id, c.sire_assoc_id
         ->  Hash  (cost=1.04..1.04 rows=52 width=4) (actual time=0.371..0.371 rows=52 loops=1)
               Output: da.assoc_id
               Buckets: 1024  Batches: 1  Memory Usage: 2kB
               ->  CTE Scan on assoc_map da  (cost=0.00..1.04 rows=52 width=4) (actual time=0.214..0.354 rows=52 loops=1)
                     Output: da.assoc_id
   ->  Hash  (cost=1.04..1.04 rows=52 width=4) (actual time=0.045..0.045 rows=52 loops=1)
         Output: sa.assoc_id
         Buckets: 1024  Batches: 1  Memory Usage: 2kB
         ->  CTE Scan on assoc_map sa  (cost=0.00..1.04 rows=52 width=4) (actual time=0.001..0.018 rows=52 loops=1)
               Output: sa.assoc_id
 Total runtime: 179844.270 ms
(38 rows)

postgresql user3416742
quelle

Das ist wirklich seltsam. Es ist nur das hash right join, was so viel länger dauert. Ich frage mich, ob die Anzahl der Spalten einen Schwellenwert überschreitet, der dies verursacht. Sie sollten dies wahrscheinlich in die Postgres-Performance-Mailingliste aufnehmen.

a_horse_with_no_name

@a_horse_with_no_name Auch das Hashing, das diesem Join vorausgeht, hat eine enorme Anzahl von Stapeln. Es scheint, dass dies work_memim zweiten Fall sehr einschränkend ist - keine Ahnung warum (wenn die Tupelbreitenschätzung, wenn der Planer korrekt ist).

Dekso

Sieht aus wie ein Fehler in der Hash-Batch-Berechnung. Welche Version von PostgreSQL, einschließlich der Veröffentlichung kleinerer Punkte, wird ausgeführt?

jjanes

cow_dev = # select version (); Ausführung ------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------- PostgreSQL 9.3.5 auf x86_64-apple-darwin, kompiliert von i686-apple-darwin11-llvm-gcc-4.2 (GCC) 4.2.1 (Basierend auf Apple Inc. Build 5658) (LLVM Build 2336.9.00), 64-Bit (1 Zeile)

Benutzer 3416742

Sie sollten mindestens auf die neueste Version Ihrer Hauptversion aktualisieren, die derzeit 9.3.10 ist. Details: postgresql.org/support/versioning Und prüfen Sie, ob das Problem dadurch behoben wird .

Erwin Brandstetter

Warum erhöht das Hinzufügen einer smallint-Spalte zu einer Postgres-Abfrage die Ausführungszeit um das 60-fache?

Antworten: