テーブル作成

create table shops (
  shop_id    char(5) not null,
  shop_name  varchar(64),
  area       varchar(64),
   constraint pk_shops primary key (shop_id));

create table items (
  item_id       char(5)  not null, 
  shop_id       char(5) not null,
  item_name     char(32) not null, 
  price         integer  not null, 
  primary key (item_id, shop_id));

データ投入

insert into shops values('10000', 'オレンジストア', '東京');
insert into shops values('20000', 'コンビニ作之助', '埼玉');

insert into items values('10001', '10000', 'pc', 40000);
insert into items values('10002', '10000', 'モニタ', 12000);
insert into items values('10003', '10000', 'マウス', 5000);

insert into items values('20001', '20000', 'ジュース', 120);
insert into items values('20002', '20000', 'あんまん', 100);
insert into items values('20003', '20000', 'チョコ', 150);

こんな感じ。

hogedb=> select * from shops;
 shop_id |   shop_name    | area 
---------+----------------+------
 10000   | オレンジストア | 東京
 20000   | コンビニ作之助 | 埼玉
(2 rows)

hogedb=> select * from items;
 item_id | shop_id |              item_name               | price 
---------+---------+--------------------------------------+-------
 10001   | 10000   | pc                                   | 40000
 10002   | 10000   | モニタ                               | 12000
 10003   | 10000   | マウス                               |  5000
 20001   | 20000   | ジュース                             |   120
 20002   | 20000   | あんまん                             |   100
 20003   | 20000   | チョコ                               |   150
(6 rows)

実行計画確認

統計情報

スキーマ確認

hogedb=> select current_schema;
 current_schema 
----------------
 public
(1 row)

統計情報更新

hogedb=> analyze;

或いはスキーマとテーブルを指定

analyze public.shops;

統計情報がいつ更新されたか確認

select relname
     , n_live_tup
     , n_dead_tup
     , last_autovacuum
     , last_analyze
     , last_autoanalyze
  from pg_stat_all_tables
 where schemaname = 'public' -- ここにスキーマ
 order by
       relname
;

以下にて確認

 relname | n_live_tup | n_dead_tup | last_autovacuum |         last_analyze          | last_autoanalyze 
---------+------------+------------+-----------------+-------------------------------+------------------
 items   |          6 |          0 |                 | 2019-11-10 15:02:05.138966+09 | 
 shops   |          2 |          0 |                 | 2019-11-10 15:02:05.122272+09 | 
(2 rows)

実行計画確認

フルスキャン

hogedb=> explain select * from shops;
                      QUERY PLAN                      
------------------------------------------------------
 Seq Scan on shops  (cost=0.00..1.02 rows=2 width=35)
(1 row)

以下の結合ではHashとなる。Nested Loops が見たかったが。

hogedb=> select s.shop_name
hogedb->      , i.item_name
hogedb->   from shops s
hogedb->  inner join
hogedb->        items i
hogedb->     on s.shop_id = i.shop_id;
   shop_name    |              item_name               
----------------+--------------------------------------
 オレンジストア | pc                              
 オレンジストア | モニタ                             
 オレンジストア | マウス                             
 コンビニ作之助 | ジュース                            
 コンビニ作之助 | あんまん                            
 コンビニ作之助 | チョコ                             
(6 rows)

実行計画

hogedb=> explain select s.shop_name
hogedb->      , i.item_name
hogedb->   from shops s
hogedb->  inner join
hogedb->        items i
hogedb->     on s.shop_id = i.shop_id;
                             QUERY PLAN                             
--------------------------------------------------------------------
 Hash Join  (cost=1.04..2.15 rows=6 width=60)
   Hash Cond: (i.shop_id = s.shop_id)
   ->  Seq Scan on items i  (cost=0.00..1.06 rows=6 width=44)
   ->  Hash  (cost=1.02..1.02 rows=2 width=28)
         ->  Seq Scan on shops s  (cost=0.00..1.02 rows=2 width=28)
(5 rows)

実行計画は以下の順に処理される事をあらわしています。

shops をフルスキャン
- ネストの深いところから先に実行
- 小さい方（rows=2）からHashを作る。
Hash を作る。
items をフルスキャンし、shop_id でHash Join

この例では無いですが、

ツリー表示で同じインデントの階層の上に位置するテーブルが先に実行されます。

また、Nested Loops の場合は、内部表に結合キーで一意になるようにする。

駆動表が小さく。

内部表が大きいと効率が良い。

内部表に、結合キーで一意にアクセスできるなら、内部表が何億件レコードであろうと一発で返せるからです。

do read tableA -- 駆動表

  do read tableB -- 内部表

    ここで一発で内部表のループが終わる。内部表がどんなに大きくても。

  end do

end do

実行計画の比較

hogedb=> select * from items;
 item_id | shop_id |              item_name               | price 
---------+---------+--------------------------------------+-------
 10001   | 10000   | pc                                   | 40000
 10002   | 10000   | モニタ                               | 12000
 10003   | 10000   | マウス                               |  5000
 20001   | 20000   | ジュース                             |   120
 20002   | 20000   | あんまん                             |   100
 20003   | 20000   | チョコ                               |   150
(6 rows)

このテーブルから、shop_idが20000の店舗に値上げを指示します。

select item_id
     , shop_id
     , item_name
     , price
  from items
 where shop_id = '10000'
 union all
 select item_id
     , shop_id
     , item_name
     , price * 1.5
  from items
 where shop_id = '20000'
;
 item_id | shop_id |              item_name               | price 
---------+---------+--------------------------------------+-------
 10001   | 10000   | pc                                   | 40000
 10002   | 10000   | モニタ                               | 12000
 10003   | 10000   | マウス                               |  5000
 20001   | 20000   | ジュース                             | 180.0
 20002   | 20000   | あんまん                             | 150.0
 20003   | 20000   | チョコ                               | 225.0
(6 rows)

このクエリの実行計画は以下の通りです。

                               QUERY PLAN                               
------------------------------------------------------------------------
 Append  (cost=0.00..2.23 rows=6 width=212)
   ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..1.11 rows=3 width=82)
         ->  Seq Scan on items  (cost=0.00..1.07 rows=3 width=54)
               Filter: (shop_id = '10000'::bpchar)
   ->  Seq Scan on items items_1  (cost=0.00..1.09 rows=3 width=82)
         Filter: (shop_id = '20000'::bpchar)
(6 rows)

Seq というのがフルスキャンの事なのですが、２回発生していることがわかります。

ディスクI/Oはコストになるので以下のクエリに変えてみます。

select item_id
     , shop_id
     , item_name
     , case when shop_id = '20000' then price * 1.5
            else price end
  from items
;

最初のクエリと同じ結果が得られます。

実行計画は以下の通りです。

                      QUERY PLAN                      
------------------------------------------------------
 Seq Scan on items  (cost=0.00..1.12 rows=6 width=82)
(1 row)

フルスキャンが１回に減っています。

これは簡単な例なので実行計画を見なくてもunionを選択することはないと思うのですが、実行計画を読めると嬉しいことの１つはこういう事なのかなと思いました。

スキャン多いよね、、減らす方法を考えようと思うことができます。