2024 年我们一个 PostgreSQL 14 集群,数据量从 200GB 涨到 2TB 后慢查询泛滥。一个简单订单聚合查询从 200ms 涨到 30s,影响报表系统。投了一周做索引 + 分区 + 查询重写,P99 从 30s 降到 800ms,慢查询数量减少 95%。本文复盘 PostgreSQL 慢查询排查 + 优化全流程。
问题背景
数据库:PostgreSQL 14.10
机器:16C 64G,SSD,主从 1:2
表:orders(2 亿行,500GB),order_items(8 亿行,1.2TB)
报表查询(每天 100 次):
SELECT
DATE(created_at) AS day,
product_category,
COUNT(*) AS orders,
SUM(total_amount) AS gmv
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
WHERE o.created_at >= '2024-01-01'
AND o.status = 'paid'
GROUP BY 1, 2;
执行时间:
- 数据 200GB:200ms
- 数据 1TB:8s
- 数据 2TB:30s(报表超时)
需要从根本上优化
第一步:开 slow log + auto_explain
-- postgresql.conf
log_min_duration_statement = 1000 -- 超过 1s 的 SQL 记日志
log_statement = 'mod' -- 记录 INSERT/UPDATE/DELETE
log_line_prefix = '%t [%p]: db=%d user=%u app=%a host=%h '
-- auto_explain:自动记录慢查询执行计划
shared_preload_libraries = 'auto_explain, pg_stat_statements'
auto_explain.log_min_duration = '1s'
auto_explain.log_analyze = true
auto_explain.log_buffers = true
auto_explain.log_timing = true
auto_explain.log_triggers = true
auto_explain.log_verbose = true
auto_explain.log_nested_statements = true
-- 重启生效
$ pg_ctl restart
-- 查看慢日志
$ tail -f /var/log/postgresql/postgresql-14-main.log
2024-01-15 14:23:11 [12345]: db=app user=app duration: 28453.234 ms
plan:
Query Text: SELECT ...
Sort (cost=12345.67..23456.78 rows=1000)
-> Hash Join (cost=...)
-> Seq Scan on order_items (rows=800000000)
第二步:pg_stat_statements 找罪魁
-- 创建扩展
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- TOP 10 慢查询(按总耗时)
SELECT
query,
calls,
total_exec_time / 1000 AS total_seconds,
mean_exec_time AS mean_ms,
rows / calls AS rows_per_call
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- TOP 10 慢查询(按平均耗时)
SELECT
query,
calls,
mean_exec_time AS mean_ms,
max_exec_time AS max_ms,
stddev_exec_time AS stddev_ms
FROM pg_stat_statements
WHERE calls > 100
ORDER BY mean_exec_time DESC
LIMIT 10;
-- 找出来 3 个嫌疑:
-- 1. 订单聚合(30s)
-- 2. 商品全文搜索(15s)
-- 3. 用户行为时间区间扫描(10s)
第三步:EXPLAIN ANALYZE 看执行计划
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT
DATE(created_at) AS day,
product_category,
COUNT(*) AS orders,
SUM(total_amount) AS gmv
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
WHERE o.created_at >= '2024-01-01'
AND o.status = 'paid'
GROUP BY 1, 2;
-- 输出:
Sort (cost=... rows=1000 width=...) (actual time=28453..28454 rows=120)
-> HashAggregate (cost=...)
Group Key: date(o.created_at), oi.product_category
-> Hash Join (actual time=2341..27123 rows=15000000)
Hash Cond: (oi.order_id = o.id)
-> Seq Scan on order_items oi (rows=800000000) ← 全表扫
Filter: ...
-> Hash (rows=10000000)
-> Index Scan using idx_orders_created_at on orders o
(rows=10000000)
Index Cond: (created_at >= '2024-01-01')
Filter: (status = 'paid')
Planning Time: 2.3 ms
Execution Time: 28456.7 ms
-- 问题:
-- 1. order_items 800M 行全扫(没有合适索引)
-- 2. 30 万行结果 sort,内存爆
-- 3. status='paid' 在 Filter,没用索引
优化 1:补索引
-- 复合索引:筛选 + 排序 + 覆盖
CREATE INDEX CONCURRENTLY idx_orders_paid_created
ON orders (status, created_at)
WHERE status IN ('paid', 'shipped', 'completed');
-- WHERE 条件让索引变小(局部索引)
-- order_items 缺 order_id 索引
CREATE INDEX CONCURRENTLY idx_order_items_order_id
ON order_items (order_id) INCLUDE (product_category, quantity);
-- INCLUDE:覆盖索引,JOIN 后不回表
-- 验证
EXPLAIN (ANALYZE, BUFFERS) ...
-- 30s → 8s
-- BUT 还是慢,因为数据量太大,JOIN 后还得 group
优化 2:分区
-- 把 orders 按月分区(2 年数据 = 24 个分区)
-- 1. 重命名旧表
ALTER TABLE orders RENAME TO orders_old;
-- 2. 创建分区主表
CREATE TABLE orders (
id BIGSERIAL,
user_id BIGINT NOT NULL,
created_at TIMESTAMPTZ NOT NULL,
status VARCHAR(20),
total_amount DECIMAL(12,2),
-- ...
PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);
-- 3. 自动建分区(pg_partman)
SELECT create_parent('public.orders', 'created_at', 'native', 'monthly');
-- 或手工:
CREATE TABLE orders_2024_01 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE orders_2024_02 PARTITION OF orders
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
-- ...
-- 4. 迁移数据(分批,避免锁)
INSERT INTO orders SELECT * FROM orders_old
WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';
-- ...
-- 5. 查询自动 partition pruning
EXPLAIN ANALYZE
SELECT ... FROM orders WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';
-- 只扫一个分区,8s → 1.5s
优化 3:物化视图
-- 报表场景:数据每天更新一次,用物化视图缓存
CREATE MATERIALIZED VIEW mv_daily_gmv AS
SELECT
DATE(o.created_at) AS day,
oi.product_category,
COUNT(*) AS orders,
SUM(o.total_amount) AS gmv
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
WHERE o.status = 'paid'
GROUP BY 1, 2;
-- 加唯一索引(REFRESH CONCURRENTLY 需要)
CREATE UNIQUE INDEX idx_mv_daily_gmv_pk
ON mv_daily_gmv (day, product_category);
-- 每天凌晨刷新(不阻塞查询)
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_daily_gmv;
-- 查询直接打物化视图
SELECT * FROM mv_daily_gmv
WHERE day >= '2024-01-01';
-- 800ms → 50ms(直接读预计算结果)
优化 4:查询改写
-- 改写 1:OR 改 UNION ALL
-- 慢
SELECT * FROM orders
WHERE user_id = 123 OR status = 'paid';
-- 走不了索引
-- 快
SELECT * FROM orders WHERE user_id = 123
UNION ALL
SELECT * FROM orders WHERE status = 'paid' AND user_id != 123;
-- 两个分别走索引
-- 改写 2:子查询 → JOIN
-- 慢
SELECT * FROM orders
WHERE id IN (SELECT order_id FROM order_items WHERE product_id = 999);
-- 大数据量下走 hash subquery,慢
-- 快
SELECT DISTINCT o.* FROM orders o
JOIN order_items oi ON oi.order_id = o.id
WHERE oi.product_id = 999;
-- 改写 3:DISTINCT → GROUP BY
-- 慢
SELECT DISTINCT user_id FROM orders WHERE created_at >= ...;
-- 排序去重
-- 快
SELECT user_id FROM orders WHERE created_at >= ... GROUP BY user_id;
-- hash 去重(PG 14+ 优化器一般会自动转,但显式写更稳)
-- 改写 4:LIMIT + OFFSET 大偏移
-- 慢:OFFSET 100000 要扫前 10w 行
SELECT * FROM orders ORDER BY id DESC LIMIT 20 OFFSET 100000;
-- 快:keyset pagination
SELECT * FROM orders WHERE id < 12345 ORDER BY id DESC LIMIT 20;
优化 5:VACUUM / ANALYZE
-- 检查表膨胀
SELECT
schemaname || '.' || tablename AS table,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
n_dead_tup,
n_live_tup,
round(n_dead_tup * 100.0 / NULLIF(n_live_tup, 0), 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;
-- 死元组超过 20% 必须 VACUUM
VACUUM ANALYZE orders;
-- 严重膨胀:VACUUM FULL(注意:会锁表)
-- 不要直接做,用 pg_repack 在线重建
$ pg_repack -d mydb -t orders -j 4
-- autovacuum 调优
-- postgresql.conf
autovacuum_vacuum_scale_factor = 0.05 -- 5% 触发(默认 20% 太高)
autovacuum_analyze_scale_factor = 0.02
autovacuum_max_workers = 6
autovacuum_naptime = 30s
maintenance_work_mem = 2GB -- VACUUM 用更多内存
-- 表级覆盖:大表调小阈值
ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.02);
ALTER TABLE orders SET (autovacuum_analyze_scale_factor = 0.01);
优化 6:参数调优
# postgresql.conf 关键参数(64GB 机器)
shared_buffers = 16GB # 总内存 25%
effective_cache_size = 48GB # 75%(给 query planner 估计)
maintenance_work_mem = 2GB # VACUUM / CREATE INDEX 用
work_mem = 64MB # 单 sort/hash 用(注意:并发会乘)
max_connections = 200
random_page_cost = 1.1 # SSD 设 1.1(机械盘是 4)
effective_io_concurrency = 200 # SSD 高并发 I/O
wal_buffers = 64MB
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
max_wal_size = 8GB
min_wal_size = 2GB
# 查询规划
default_statistics_target = 200 # 默认 100,大表设 500-1000
constraint_exclusion = partition # 分区表必开
# 并行查询
max_parallel_workers = 8
max_parallel_workers_per_gather = 4
parallel_setup_cost = 100
parallel_tuple_cost = 0.01
优化效果
指标 优化前 优化后 变化
=========================================================
报表查询 P50 8s 150ms -98%
报表查询 P99 30s 800ms -97%
慢查询总数(>1s)/天 12000 600 -95%
死元组比例 35% 3% -91%
表膨胀率 50% 5% -90%
索引命中率 60% 98% +63%
shared_buffer 命中 75% 96% +28%
读 IOPS 5000 800 -84%
CPU 85% 25% -71%
业务影响:
- 报表系统可用率从 80% 升到 99.9%
- 数据分析效率提升:运营自助分析变 SQL 实时可跑
- 节省机器:之前要 32C,现在 16C 够
监控告警
# Prometheus + postgres_exporter
- alert: PgSlowQuery
expr: rate(pg_stat_statements_total_exec_time_seconds[5m]) > 100
for: 5m
annotations:
summary: "PG 慢查询激增"
- alert: PgConnectionHigh
expr: pg_stat_activity_count > 150
for: 2m
- alert: PgReplicationLag
expr: pg_replication_lag_seconds > 30
for: 1m
- alert: PgTableBloat
expr: pg_table_bloat_ratio > 0.3
for: 1h
- alert: PgIndexBloat
expr: pg_index_bloat_ratio > 0.5
for: 1h
- alert: PgAutovacuumLag
expr: time() - pg_stat_user_tables_last_autovacuum_seconds > 86400 * 7
for: 1h
annotations:
summary: "表 1 周没 autovacuum,可能膨胀严重"
避坑清单
- postgresql.conf 必开 log_min_duration_statement 和 pg_stat_statements
- auto_explain 抓慢查询执行计划,EXPLAIN ANALYZE 复盘
- 大表必分区,partition pruning 是省钱关键
- 复合索引覆盖 WHERE + ORDER BY + INCLUDE(避免回表)
- 报表场景上物化视图,定时 CONCURRENTLY 刷新
- autovacuum 阈值调低(5%),大表表级覆盖
- VACUUM FULL 锁表,生产用 pg_repack 在线
- SSD 把 random_page_cost 调到 1.1
- shared_buffers ~25%,effective_cache_size ~75%
- 慢查询监控全打通,Grafana 看趋势告警
总结
PostgreSQL 慢查询优化是个系统工程:索引 + 分区 + 物化视图 + VACUUM + 参数调优,缺一不可。这次 2TB 数据集优化把 P99 从 30s 干到 800ms,核心思路是"测量 → 诊断 → 改写 → 验证"循环。最大的认知改变:PostgreSQL 的优化空间在执行计划里,会看 EXPLAIN ANALYZE 比记 100 个索引技巧更有用。pg_stat_statements + auto_explain 是排查神器,生产必装。最后,数据库优化没有银弹,2 亿行表不可能秒级出报表,合理的预期是"用对工具(物化视图 / OLAP)解决合适场景",PG 不是万能,有时候上 ClickHouse 才是答案。
—— 别看了 · 2026