问题现象

##### 统一监控告警平台-DB: [触发]> `告警集群:` jq-prod-db-prometheus> `告警描述:` 数据库Oracle堵塞会话超过5分钟,会话ID:6055> `告警环境:` prod> `告警类型:` oracle_prod> `告警实例:` nbadb02> `告警IP:`   11.11.11.11> `实例别名:`  nbadb02> `当前数值:` 2> `告警级别:` 严重> `触发时间:` 2021-03-16 14:11:59> `持续时间:` 10s> `告警次数:` 1> `告警应用:`nba_oracle> `应用负责人:` 

处理过程

  • 登录11.11.11.11进入oracle,并设置列宽(如果使用plsql则不用设置)
[bzops@nbadb02 ~]$ sudo -i[root@nbadb02 ~]# su - oracle[oracle@nbadb02 ~]$ sqlplus / as sysdbaSQL*Plus: Release 11.2.0.4.0 Production on Tue Mar 2 15:05:36 2021Copyright (c) 1982, 2013, Oracle.  All rights reserved.Connected to:Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit ProductionWith the Partitioning, OLAP, Data Mining and Real Application Testing optionsSQL>SQL> col sample_time for a30col event for a30col program for a30col machine for a40set lines 200 pages 999
  • 查看当前的活动事务,发现会话60551573是14:07就已经开始了,到现在还没有结束,
    同时last_call_et是0,说明会话还在不停的执行sql中,并没有空闲
    其中6055占用的undo已经达到了11M,说明事务比较大
SQL> select a.inst_id inst,A.sid,a.serial#,a.blocking_session blk,a.last_call_et call,a.status,a.sql_id,a.PLSQL_ENTRY_OBJECT_ID plsql,PREV_SQL_ID,b.start_date,a.event,b.USED_UBLK*8/1024 m,a.seconds_in_wait wait,a.osuser,a.program,a.logon_time,b.xid,a.machine from gv$session a,gv$transaction b where a.taddr=b.addr and b.addr is not null and a.inst_id=b.inst_id order by 5 desc;    INST    SID SERIAL# BLK CALL    STATUS  SQL_ID  PLSQL   PREV_SQL_ID START_DATE  EVENT   M   WAIT    OSUSER  PROGRAM LOGON_TIME  XID MACHINE1   2   1545    7417        0   ACTIVE  crwj8kgsbsghz       crwj8kgsbsghz   2021/3/16 14:14:59  SQL*Net message to client   0.046875    0   nbauser JDBC Thin Client    2021/3/16 14:11:16  58001500481BE003    nba-service-daemon-58f676dcf6-bk8rp2   2   6059    34681       0   INACTIVE            2xuyhc81va9jk   2021/3/16 14:11:49  SQL*Net message from client 0.8984375   0   nbauser JDBC Thin Client    2021/3/16 14:03:07  F6000400896A0A01    nba-service-mq-6c654b54df-dzw623   2   6055    46555       0   INACTIVE            dxupr3kq3k911   2021/3/16 14:07:10  SQL*Net message from client 11.4609375  0   nbauser JDBC Thin Client    2021/3/16 14:06:26  3F001C004161E604    nba-service-web-64cd6dcc69-w45f24   2   1573    24181       0   INACTIVE            2uy2z5g8ky2ab   2021/3/16 14:07:01  SQL*Net message from client 1.578125    0   nbauser JDBC Thin Client    2021/3/16 13:07:09  D3000D00D01C6801    nba-service-mq-6c654b54df-pqgp6
  • 查看会话历史,可以看到很多会话都被6055阻塞了,印证了上面查的6055的事务比较大
SQL> select sample_time,session_id as sid,sql_id,event,plsql_entry_object_id as plsql,sql_plan_hash_value as plan,sql_plan_line_id as line,sql_exec_id as execid,blocking_session blk,program as program,xid,user_id ,machine from gv$active_session_history where blocking_session is not null ;    SAMPLE_TIME SID SQL_ID  EVENT   PLSQL   PLAN    LINE    EXECID  BLK PROGRAM XID USER_ID MACHINE1   16-MAR-21 02.14.26.440 PM   1731    08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49065086    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-w45f22   16-MAR-21 02.14.26.440 PM   3742    08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49064614    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-dtkz63   16-MAR-21 02.14.26.440 PM   1773    08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49064803    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-dtkz64   16-MAR-21 02.14.26.440 PM   6108    08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49064714    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-64v4t5   16-MAR-21 02.14.26.440 PM   7261    08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49065133    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-5h6zf6   16-MAR-21 02.14.26.440 PM   9541    08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49064892    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-wcctn7   16-MAR-21 02.14.26.440 PM   10690   08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49064975    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-64v4t8   16-MAR-21 02.14.26.440 PM   11724   08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49065049    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-76hs89   16-MAR-21 02.14.25.440 PM   1773    08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49064803    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-dtkz610  16-MAR-21 02.14.25.440 PM   1731    08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49065086    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-w45f211  16-MAR-21 02.14.25.440 PM   3742    08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49064614    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-dtkz612  16-MAR-21 02.14.25.440 PM   6108    08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49064714    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-64v4t13  16-MAR-21 02.14.25.440 PM   7261    08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49065133    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-5h6zf14  16-MAR-21 02.14.25.440 PM   9541    08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49064892    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-wcctn15  16-MAR-21 02.14.25.440 PM   10690   08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49064975    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-64v4t16  16-MAR-21 02.14.25.440 PM   11724   08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49065049    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-76hs817  16-MAR-21 02.14.24.430 PM   9541    08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49064892    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-wcctn18  16-MAR-21 02.14.24.430 PM   11724   08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49065049    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-76hs819  16-MAR-21 02.14.24.430 PM   10690   08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49064975    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-64v4t20  16-MAR-21 02.14.24.430 PM   7261    08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49065133    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-5h6zf21  16-MAR-21 02.14.24.430 PM   6108    08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49064714    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-64v4t22  16-MAR-21 02.14.24.430 PM   3742    08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49064614    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-dtkz623  16-MAR-21 02.14.24.430 PM   1773    08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49064803    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-dtkz624  16-MAR-21 02.14.24.430 PM   1731    08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49065086    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-w45f225  16-MAR-21 02.14.23.430 PM   10690   08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49064975    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-64v4t26  16-MAR-21 02.14.23.430 PM   1731    08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49065086    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-w45f227  16-MAR-21 02.14.23.430 PM   11724   08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49065049    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-76hs828  16-MAR-21 02.14.23.430 PM   1773    08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49064803    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-dtkz629  16-MAR-21 02.14.23.430 PM   3742    08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49064614    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-dtkz630  16-MAR-21 02.14.23.430 PM   6108    08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49064714    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-64v4t31  16-MAR-21 02.14.23.430 PM   7261    08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49065133    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-5h6zf32  16-MAR-21 02.14.23.430 PM   9541    08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49064892    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-wcctn33  16-MAR-21 02.14.22.400 PM   10690   08qkr18cnk9rk   enq: TX - row lock contention       585082728   1   49064975    6055    JDBC Thin Client        101 nba-service-web-64cd6dcc69-64v4t................
  • 既然知道了阻塞者是6055,同时也不仅6055一个会话到现在没结束,会话也不是空闲中,
    接下来只能找出6055正在跑的sql发给开发评估这个事务是不是正常
    oracle中可以通过sql_trace来记录会话一段时间内都跑的什么sql
    (如果sql执行时间大于一秒,其实通过活动会话历史v$active_session_history也可以定位,但是6055last_call_et是0,说明执行的sql都是小于一秒的)
SQL> exec dbms_system.set_sql_trace_in_session(6055,46555,true);PL/SQL procedure successfully completed.间隔1分钟SQL>  exec dbms_system.set_sql_trace_in_session(6055,46555,false);PL/SQL procedure successfully completed.
  • 收集好sql_trace后,进入oracle的trace目录,按照时间排序可以看到最新的ora_开头的日志
-rw-r----- 1 oracle asmadmin  74K Mar 16 14:15 nba2_ora_149894.trm-rw-r----- 1 oracle asmadmin 2.3M Mar 16 14:15 nba2_ora_149894.trc
  • 通过tkprof工具格式化一下
[oracle@nbadb02 trace]$ tkprof nba2_ora_149894.trcoutput = bk.logTKPROF: Release 11.2.0.4.0 - Development on Tue Mar 16 14:16:08 2021Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.[oracle@nbadb02 trace]$ sz bk.log
  • 过滤掉bk.log中的sys用户的递归sql,可以看到6055执行的sql逻辑
SQL ID: dzns8cts4u1a4 Plan Hash: 383260619update tab1 set col1=:1 , col2=:2 , col3=  :3 , col4=:4 , col5=:5 , col6=:6 , col7=:7 , col8=:8 ,   col9=:9 , col10=:10 , col11=:11 , col12=:12 , col13=:13  where ID=:14  and VERSION=:15 call     count       cpu    elapsed       disk      query    current        rows------- ------  -------- ---------- ---------- ---------- ----------  ----------Parse     1878      0.03       0.05          0          0          0           0Execute   1878      0.28       0.46        202       9475      17465        1878Fetch        0      0.00       0.00          0          0          0           0------- ------  -------- ---------- ---------- ---------- ----------  ----------total     3756      0.31       0.51        202       9475      17465        1878....SQL ID: dxupr3kq3k911 Plan Hash: 743517995select sysinterfa0_.ID as ID400_, sysinterfa0_.BRAND_ID as BRAND2_400_,   sysinterfa0_.CONTENT as CONTENT400_, sysinterfa0_.CREATE_TIME as   CREATE4_400_, sysinterfa0_.ERROR_COUNT as ERROR5_400_, sysinterfa0_.OP_TIME   as OP6_400_, sysinterfa0_.REMARK as REMARK400_, sysinterfa0_.SHOP_ID as   SHOP8_400_, sysinterfa0_.SLIP_CODE as SLIP9_400_, sysinterfa0_.SOURCE_SYS   as SOURCE10_400_, sysinterfa0_.STATUS as STATUS400_,   sysinterfa0_.TARGET_SYS as TARGET12_400_, sysinterfa0_.TYPE as TYPE400_,   sysinterfa0_.VERSION as VERSION400_ from tab2 sysinterfa0_ where sysinterfa0_.SLIP_CODE=:1  and   sysinterfa0_.TYPE=:2  and sysinterfa0_.SOURCE_SYS=:3  and   sysinterfa0_.TARGET_SYS=:4  and sysinterfa0_.BRAND_ID=:5 call     count       cpu    elapsed       disk      query    current        rows------- ------  -------- ---------- ---------- ---------- ----------  ----------Parse     1913      0.03       0.04          0          0          0           0Execute   1913      0.02       0.03          0          0          0           0Fetch     1913      0.04       0.09         55       9854          0        1878------- ------  -------- ---------- ---------- ---------- ----------  ----------total     5739      0.10       0.17         55       9854          0        1878....                        select shop.id                            from tab1 sku,                                 tab2 pro,                                 tab3 br,                                 tab4 shop                            where sku.product_id = pro.id                                 and  pro.brand_id =  br.brand_id                                 and  br.shop_id =  shop.id                                 and  sku.code = :1                                  and  shop.IS_SYNC_BMS = 1call     count       cpu    elapsed       disk      query    current        rows------- ------  -------- ---------- ---------- ---------- ----------  ----------Parse      382      0.00       0.00          0          0          0           0Execute    382      0.00       0.00          0          0          0           0Fetch      382      0.11       0.14          1      15280          0           0------- ------  -------- ---------- ---------- ---------- ----------  ----------total     1146      0.12       0.16          1      15280          0           0.....

解决办法

  • 把以上sql发给开发分析后,反馈这个事务逻辑有优化空间,现在在做批量库存的操作,有阻塞可以忽略,未来会优化逻辑
©著作权归作者所有:来自51CTO博客作者loganjin的原创作品,如需转载,请注明出处,否则将追究法律责任

更多相关文章

  1. 企业微信接收 Amazon CloudWatch 告警
  2. Prometheus+Alertmanager配置邮件报警
  3. Zabbix监控ELK异常日志告警
  4. java使用 Cookie 和 JWT 身份验证
  5. Nginx的Upstream监控及告警
  6. 初试kafka监控及管理利器之kafka-eagle
  7. Prometheus配置企业微信告警
  8. 介绍一款好用的终端工具 Screen
  9. 第14章 0301-实战会话控制技术,(项目实战:主页登录、用户注册 + 整

随机推荐

  1. Android中丰富多彩的onTouch事件
  2. [Android]解决控件重叠、覆盖的问题
  3. Android图像处理
  4. android布局ui
  5. Android:EditText 常用属性
  6. ubuntu下无法更新android sdk
  7. 四、 Android之手机屏幕朝向
  8. Android 常用编程技巧
  9. 相对布局RelativeLayout
  10. 扫描二维码自动识别手机系统(Android/IOS)