We have a DB2 V9.7 database with daily online backups to TSM.
For the last few days the TSM backup has been failing with rc -50:
-050 E: DSM_RC_TCPIP_FAILURE Session rejected: TCP/IP connection failure
and in db2diag.log I see "Vendor error: rc = 17 returned from function sqluvend.", which, as far as I understand, means that the commit function failed.
The backup starts at 3:00 am. It used to finish successfully in about 3 hours, but for the last few days it has been failing after about 9.5 hours, at practically the same point in time each day.
"db2 list utilities show detail" reports that the backup reaches 90% progress, and after that nothing more happens.
Regular archive log backups complete successfully, and they use the same TSM server, so I can't see how this could be a network problem.
I tried increasing CommTimeOut from 20 000 to 50 000 seconds, but it made no difference. Logically, 20 000 should be more than enough, since successful backups used to finish within 3 hours.
I would be very grateful for any suggestions.
Regards,
Zura
Here is the db2diag.log:
2014-07-18-13.44.54.216500+240 E547308110A348 LEVEL: Error
PID : 3891368 TID : 1 PROC : db2vend
INSTANCE: db2inst1 NODE : 000
EDUID : 1
FUNCTION: DB2 UDB, database utilities, sqluvend, probe:1536
DATA #1 : TSM RC, PD_DB2_TYPE_TSM_RC, 4 bytes
TSM RC=0xFFFFFFCE=-50 -- see TSM API Reference for meaning.
2014-07-18-13.44.54.242924+240 I547308459A747 LEVEL: Error
PID : 3100900 TID : 26499 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000
EDUID : 26499 EDUNAME: db2med.27760.0 (CDRS) 0
FUNCTION: DB2 UDB, database utilities, sqluMapVend2MediaRCWithLog, probe:655
DATA #1 : String, 135 bytes
Vendor error: rc = 17 returned from function sqluvend.
Return_code structure from vendor library /home/db2inst1/sqllib/adsm/libtsm.a:
DATA #2 : Hexdump, 48 bytes
0x07000001DD16CA10 : FFFF FFCE 3135 3336 202D 3530 0000 0000 ....1536 -50....
0x07000001DD16CA20 : 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x07000001DD16CA30 : 0000 0000 0000 0000 0000 0000 0000 0000 ................
2014-07-18-13.44.54.243295+240 I547309207A374 LEVEL: Error
PID : 3100900 TID : 26499 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000
EDUID : 26499 EDUNAME: db2med.27760.0 (CDRS) 0
FUNCTION: DB2 UDB, database utilities, sqluMapVend2MediaRCWithLog, probe:695
MESSAGE : Error in vendor support code at line: 1536 rc: -50
2014-07-18-13.44.54.243565+240 E547309582A351 LEVEL: Error
PID : 3100900 TID : 26499 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000
EDUID : 26499 EDUNAME: db2med.27760.0 (CDRS) 0
FUNCTION: DB2 UDB, database utilities, sqluMCCloseSequence, probe:1007
MESSAGE : Media controller -- Generic error
2014-07-18-13.44.54.243792+240 E547309934A443 LEVEL: Error
PID : 3100900 TID : 26499 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000
EDUID : 26499 EDUNAME: db2med.27760.0 (CDRS) 0
FUNCTION: DB2 UDB, database utilities, sqluMCCloseSequence, probe:1007
MESSAGE : SQL2033N An error occurred while accessing TSM during the processing
of a database utility. TSM reason code: "".
2014-07-18-13.44.54.244939+240 E547310378A337 LEVEL: Error
PID : 3100900 TID : 26499 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000
EDUID : 26499 EDUNAME: db2med.27760.0 (CDRS) 0
FUNCTION: DB2 UDB, database utilities, sqluMCCloseSequence, probe:1007
DATA #1 : String, 3 bytes
-50
2014-07-18-13.44.54.245406+240 E547310716A558 LEVEL: Severe
PID : 3100900 TID : 27760 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : CDRS
APPHDL : 0-12920 APPID: *LOCAL.db2inst1.140717230445
AUTHID : DB2INST1
EDUID : 27760 EDUNAME: db2agent (CDRS) 0
FUNCTION: DB2 UDB, database utilities, sqlubHandleTSMError, probe:1179
DATA #1 : Sqlcode, PD_TYPE_SQLCODE, 4 bytes
-2033
DATA #2 : Hexdump, 8 bytes
0x0700000386AB4828 : FFFF F80F 2D35 3000 ....-50.
2014-07-18-13.44.54.245778+240 E547311275A966 LEVEL: Severe
PID : 3100900 TID : 27760 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : CDRS
APPHDL : 0-12920 APPID: *LOCAL.db2inst1.140717230445
AUTHID : DB2INST1
EDUID : 27760 EDUNAME: db2agent (CDRS) 0
FUNCTION: DB2 UDB, database utilities, sqlubHandleTSMError, probe:1179
MESSAGE : SQL2033N An error occurred while accessing TSM during the processing
of a database utility. TSM reason code: "".
DATA #1 : SQLCA, PD_DB2_TYPE_SQLCA, 136 bytes
sqlcaid : SQLCA sqlcabc: 136 sqlcode: -2033 sqlerrml: 3
sqlerrmc: -50
sqlerrp : sqlubHan
sqlerrd : (1) 0x00000000 (2) 0x00000000 (3) 0x00000000
(4) 0x00000000 (5) 0x00000000 (6) 0x00000000
sqlwarn : (1) (2) (3) (4) (5) (6)
(7) (8) (9) (10) (11)
sqlstate:
2014-07-18-13.45.00.392866+240 E547312242A425 LEVEL: Severe
PID : 3100900 TID : 27760 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : CDRS
APPHDL : 0-12920 APPID: *LOCAL.db2inst1.140717230445
AUTHID : DB2INST1
EDUID : 27760 EDUNAME: db2agent (CDRS) 0
FUNCTION: DB2 UDB, database utilities, sqlubcka, probe:843
MESSAGE : Backup terminated.
2014-07-18-13.45.00.472903+240 I547312668A822 LEVEL: Error
PID : 2609256 TID : 259 PROC : db2dasstm
INSTANCE: db2inst1 NODE : 000
EDUID : 259
FUNCTION: DB2 Tools, DB2 administration server, handleRunCmd, probe:85
DATA #1 : signed integer, 4 bytes
4
CALLSTCK:
[0] 0x0900000009718314 pdOSSeLoggingCallback + 0x34
[1] 0x0900000000BB2424 oss_log__FP9OSSLogFacUiN32UlN26iPPc + 0x1C4
[2] 0x0900000000BB2210 ossLog + 0xD0
[3] 0x0000000100003AA4 handleRunCmd__FP17stmRequestContext + 0xA44
[4] 0x00000001000074BC db2dasStmRequestDispatchThread + 0x31C
[5] 0x090000000AFD67D4 db2dasThreadMain + 0x114
[6] 0x0900000000726D50 _pthread_body + 0xF0
[7] 0xFFFFFFFFFFFFFFFC ?unknown + 0xFFFFFFFF
[8] 0x0000000000000000 ?unknown + 0x0
[9] 0x0000000000000000 ?unknown + 0x0
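To double-check that the hexdumps in the db2diag.log entries above carry the same codes as the messages, here is a small Python sketch that decodes them. The helper name decode_vendor_rc is my own, and it assumes that the first 4 bytes of each dump are a big-endian signed 32-bit integer (the server is AIX) and that the rest is NUL-padded ASCII text. This is only how I read the dumps, not a documented layout.

```python
import struct

def decode_vendor_rc(hex_bytes: str):
    """Split a hexdump into a signed 32-bit code and its trailing ASCII text.

    Assumption: bytes 0-3 are a big-endian signed integer, the remainder is
    NUL-padded ASCII. The helper name and layout are illustrative only.
    """
    raw = bytes.fromhex(hex_bytes)  # fromhex() skips the spaces between groups
    (code,) = struct.unpack(">i", raw[:4])
    text = raw[4:].rstrip(b"\x00").decode("ascii")
    return code, text

# First line of the 48-byte "Return_code structure" dump (probe:655).
vendor_rc = decode_vendor_rc("FFFF FFCE 3135 3336 202D 3530 0000 0000")

# The 8-byte dump logged by sqlubHandleTSMError (probe:1179).
sqlcode = decode_vendor_rc("FFFF F80F 2D35 3000")

print(vendor_rc)  # (-50, '1536 -50')
print(sqlcode)    # (-2033, '-50')
```

So both dumps agree with the text messages: TSM rc -50 reported at vendor code line 1536, surfaced by DB2 as SQL2033N with reason code -50.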
And this is the TSM server activity log (actlog):
07/18/14 13:41:52 ANR0514I Session 1155 closed volume MIA162L4. (SESSION:
1155)
07/18/14 13:41:52 ANR0481W Session 1155 for node DB2 (DB2/AIX64) terminated
- client did not respond within 20000 seconds. (SESSION:
1155)
07/18/14 13:41:57 ANR1341I Scratch volume MIA153L4 has been deleted from
storage pool TAPE_POOL. (SESSION: 1155)
07/18/14 13:41:58 ANR8336I Verifying label of LTO volume MIA162L4 in drive
DRIVE1 (/dev/rmt2). (SESSION: 1155)
07/18/14 13:42:54 ANR8468I LTO volume MIA162L4 dismounted from drive DRIVE1
(/dev/rmt2) in library TS3500. (SESSION: 1155)
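To put the ANR0481W message in the activity log above on a timeline, here is a short Python sketch of the arithmetic. It assumes that the 20000-second timeout is counted from the last time the server heard from the client, that the backup started at exactly 3:00 am, and that the client and server clocks agree. None of this is confirmed, so treat the result as a rough estimate.

```python
from datetime import datetime, timedelta

# Values taken from the post and the actlog; the 3:00 am start is approximate.
backup_start = datetime(2014, 7, 18, 3, 0, 0)
terminated = datetime(2014, 7, 18, 13, 41, 52)  # ANR0481W timestamp

old_timeout = timedelta(seconds=20000)
new_timeout = timedelta(seconds=50000)

# Assumption: the timer starts at the client's last communication.
last_response = terminated - old_timeout
active_for = last_response - backup_start

print(old_timeout)    # 5:33:20
print(new_timeout)    # 13:53:20
print(last_response)  # 2014-07-18 08:08:32
print(active_for)     # 5:08:32
```

Under these assumptions the session sent its last data at about 08:08, roughly 5 hours after the backup started, and then stayed silent until the server dropped it at 13:41.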