Purpose

The purpose of this document is to assist with enqueue server (ENSA) failover issues involving the Enqueue Replication Server (ERS). For the new enqueue server to take over the replication table from the ERS, the following preconditions must be met.

1- There is an active ERS in the system's landscape.
2- The new ASCS is started on the active ERS host.
3- The enqueue replication table is stored in shared memory, and this shared memory still exists when the new ASCS is started.

This page shows how to check whether the failover was successful, and how to check the preconditions above if the failover failed.

Please review the links in the "Related Documents" section for background information if needed.

How to check whether the failover was successful

The following URL provides detailed steps for testing the enqueue failover. If the tests described there pass, the enqueue failover finished successfully.
http://help.sap.com/saphelp_nw74/helpdata/en/47/e8ee429bd54231e10000000a421937/content.htm
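
The steps in that URL are the official test procedure. As an additional illustration only (a sketch, not part of the official procedure), a test failover can often be triggered by killing the enqueue server process on the current ASCS host, assuming your HA software monitors this process and then restarts the ASCS instance on the active ERS host:

# Sketch only: trigger a test failover by killing the enqueue server process.
# Assumption: the HA software monitors the process en.sap<SID>_ASCS<inst-no>
# and restarts the ASCS instance on the active ERS host.
# Run as <sid>adm on the current ASCS host:
ps -ef | grep en.sap<SID>_ASCS<inst-no> | grep -v grep     # find the enqueue server PID
kill -9 <enqueue-server-pid>                               # the HA software performs the failover
# Afterwards check dev_enqsrv on the new ASCS host, see the traces below.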

From the trace's point of view, if the failover finishes successfully, the following entries are written to dev_enqsrv of the newly started ASCS instance.
The trace also shows that after the restarted enqueue server has taken over the locks, the shared memory segment used to store the replication table is deleted.

        # /usr/sap/<SID>/ASCS<inst-no>/work/dev_enqsrv 

---------------------------------------------------
trc file: "dev_enqsrv", trc level: 1, release: "742"
---------------------------------------------------
......

[Thr 139696690013984] Mon Mar 9 16:41:19 2015
[Thr 139696690013984] CGROUPS: changing prio of pid 20376 to high
[Thr 139696690013984] CGROUPS: disabled
[Thr 139696690013984] EnqInitCleanupServer: Shm of enqueue table (rc = 3) does not exist, nothing to clean up
[Thr 139696690013984] InitReplHooks: Tracing enqueue and replication statistics is switched off.
[Thr 139696690013984] initialize_global: Enqueue server started with replication functionality
[Thr 139696690013984] Enqueue: EnqMemStartupAction Utc=1425890479, attach_only=0
[Thr 139696690013984] EnqLockTableSizeCalculate: session quota = 100%
[Thr 139696690013984] EnqLockTableCreate: create lock table (size = 65536000)
[Thr 139696690013984] EnqLockTableMapToLocalContext: enque/use_pfclock2 = FALSE
[Thr 139696690013984] ShadowTable:attach: ShmCreate(,SHM_ATTACH,) -> 7f0d704cf000
[Thr 139696690013984] EnRepClass::getReplicaData: found old replication table with the following data:
[Thr 139696690013984] Line size:744, Line count: 56415, Failover Count: 0
[Thr 139696690013984] EnqId: 1425890266/16826, last stamp: 1/425890432/8000
[Thr 139696690013984] Byte order tags: int:1159934994 char:Z
[Thr 139696690013984] Enqueue checkpointing: start restoring entries. Utc=1425890479
[Thr 139696690013984] Delete replication table which was attached by the enqueue server
[Thr 139696690013984] ShadowTable:destroy: ShmCleanup( SHM_ENQ_REP_SHADOW_TBL)
[Thr 139696690013984] enque/backup_file disabled in enserver environment

 

[Thr 139696690013984] Mon Mar 9 16:41:20 2015
[Thr 139696690013984] ***LOG GEZ=> Server start [encllog.cpp 550]
[Thr 139696690013984] Enqueue server start with instance number 01

 

Otherwise, if the new enqueue server does not get the locks from the ERS, errors like the following are written to dev_enqsrv of the newly started ASCS instance.

        # /usr/sap/<SID>/ASCS<inst-no>/work/dev_enqsrv
        [Thr 140694977873696] Listen successful on port/service sapdp01

        
        [Thr 140694977873696] Mon Mar  9 16:37:45 2015
        [Thr 140694977873696] CGROUPS: changing prio of pid 16826 to high
        [Thr 140694977873696] CGROUPS: disabled
        [Thr 140694977873696] EnqInitCleanupServer: Shm of enqueue table (rc = 3) does not exist, nothing to clean up
        [Thr 140694977873696] InitReplHooks: Tracing enqueue and replication statistics is switched off.
        [Thr 140694977873696] initialize_global: Enqueue server started with replication functionality
        
        [Thr 140694977873696] Mon Mar  9 16:37:46 2015
        [Thr 140694977873696] Enqueue: EnqMemStartupAction Utc=1425890266, attach_only=0
        [Thr 140694977873696] EnqLockTableSizeCalculate: session quota = 100%
        [Thr 140694977873696] EnqLockTableCreate: create lock table (size = 65536000)
        [Thr 140694977873696] EnqLockTableMapToLocalContext: enque/use_pfclock2 = FALSE
        [Thr 140694977873696] ShadowTable:attach: ShmCreate - pool doesn't exist
        [Thr 140694977873696] EnqRepRestoreFromReplica: failed to attach to old replication table: rc=-1
        [Thr 140694977873696] enque/backup_file disabled in enserver environment
        
        [Thr 140694977873696] Mon Mar  9 16:37:47 2015
        [Thr 140694977873696] ***LOG GEZ=> Server start [encllog.cpp  550]
        [Thr 140694977873696] Enqueue server start with instance number 01
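
As a quick check of the two cases above, you can grep dev_enqsrv on the newly started ASCS host for the marker lines shown in the traces (the exact wording may vary slightly between kernel patch levels):

# Quick check of dev_enqsrv after the failover
cd /usr/sap/<SID>/ASCS<inst-no>/work
grep "found old replication table" dev_enqsrv                 # successful takeover of the replication table
grep "failed to attach to old replication table" dev_enqsrv   # takeover failed, replication table not found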

 

One more remark: the trace line "ShadowTable:attach: ShmCreate ..." in dev_enqsrv is only written when an ERS is installed. If there is no ERS, dev_enqsrv looks like the following.

        # /usr/sap/<SID>/ASCS<inst-no>/work/dev_enqsrv

---------------------------------------------------
trc file: "dev_enqsrv", trc level: 1, release: "742"
---------------------------------------------------
sysno 01
sid QSB
systemid 390 (AMD/Intel x86_64 with Linux)
relno 7420
patchlevel 0
patchno 116
intno 20020600
make multithreaded, Unicode, 64 bit, debug
pid 17632

[Thr 140227397859104] Fri Aug 7 09:40:14 2015
[Thr 140227397859104] profile /usr/sap/QSB/SYS/profile/<SID>_ASCS<inst-no>_<your-hostname>
[Thr 140227397859104] hostname <your-hostname>
[Thr 140227397859104] Listen successful on port/service sapdp01

[Thr 140227397859104] Fri Aug 7 09:40:15 2015
[Thr 140227397859104] CGROUPS: changing prio of pid 17632 to high
[Thr 140227397859104] CGROUPS: disabled
[Thr 140227397859104] EnqInitCleanupServer: Shm of enqueue table (rc = 3) does not exist, nothing to clean up
[Thr 140227397859104] initialize_global: Enqueue server started WITHOUT replication functionality
[Thr 140227397859104] Enqueue: EnqMemStartupAction Utc=1438933215, attach_only=0
[Thr 140227397859104] EnqLockTableSizeCalculate: session quota = 100%
[Thr 140227397859104] EnqLockTableCreate: create lock table (size = 65536000)
[Thr 140227397859104] EnqLockTableMapToLocalContext: enque/use_pfclock2 = FALSE

[Thr 140227397859104] Fri Aug 7 09:40:16 2015
[Thr 140227397859104] ***LOG GEZ=> Server start [encllog.cpp 550]
[Thr 140227397859104] Enqueue server start with instance number 01

 

How to check the preconditions

Precondition 01: Check whether there is an active ERS in the system's landscape

Before the failover, you should see the following traces in dev_enqrepl in the old ASCS work folder. These traces mean that there is an active ERS performing the backup (replication) work.
Otherwise, the enqueue locks cannot be retrieved during the failover.

# /usr/sap/<SID>/ASCS<inst-no>/work/dev_enqrepl
---------------------------------------------------
trc file: "dev_enqrepl", trc level: 1, release: "742"
---------------------------------------------------
......
[Thr 139695737961888] Mon Mar  9 16:41:20 2015
[Thr 139695737961888] profile    /usr/sap/SM2/SYS/profile/<SID>_ASCS<inst-no>_<your-hostname>
[Thr 139695737961888] hostname   <your-hostname>
[Thr 139695737961888] IOListener::Listen: listen on port 50116 (addr 0.0.0.0)
[Thr 139695737961888] will sleep maximal 333 ms while waiting for response
[Thr 139695737961888] will sleep max 5000 ms when idle
[Thr 139695737961888] will wait maximal 1000 ms for input

[Thr 139695737961888] Mon Mar  9 16:41:40 2015
[Thr 139695737961888] A newly started replication server has connected
[Thr 139695737961888] ***LOG GEZ=> repl. activ [encllog.cpp  550]
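
A quick way to verify this precondition on the old ASCS host is to grep dev_enqrepl for the marker lines shown above:

# On the (old) ASCS host: check that an ERS has connected and replication is active
cd /usr/sap/<SID>/ASCS<inst-no>/work
grep -E "replication server has connected|repl. activ" dev_enqrepl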


Precondition 02: Check the active ERS Host

From the ERS's point of view, if the ERS connects to the ASCS successfully, the following traces are written to dev_enrepsrv in the ERS work folder.

# /usr/sap/<SID>/ERS<inst-no>/work/dev_enrepsrv
---------------------------------------------------
trc file: "dev_enrepsrv", trc level: 1, release: "742"
---------------------------------------------------
......
[Thr 140349019186976] Mon Mar  9 16:41:40 2015
[Thr 140349019186976] Replication server start with instance number 01
[Thr 140349019186976] Enqueue server on host <your-hostname>, IP-addr <your-ip-address>, port 50116
[Thr 140349019186976] ShadowTable:create: ShmCreate(,SHM_CREATE,len=43101208) -> 7fa57da05000
[Thr 140349019186976] Connected to Enqueue Server and created repl. table with 56415 lines
[Thr 140349019186976] EnStateTransferEnqToRep::process: transaction stamp (end_trans): 1 425890500 1000!
[Thr 140349019186976] EnStateTransferEnqToRep::process: Transaction has finished:4, process request and loop over fragments

 

You can have one or more ERS instances in your landscape, but only one ERS instance is active at any time. If the ASCS fails over to a host where the ERS is not active, the locks cannot be retrieved. The failover behavior is controlled by your HA software.

On a non-active ERS host, you will see the following traces in dev_enrepsrv of the ERS instance.

# /usr/sap/<SID>/ERS<inst-no>/work/dev_enrepsrv
......
[Thr 140231835920160] Fri Jul 31 17:09:07 2015
[Thr 140231835920160] Replication server start with instance number 01
[Thr 140231835920160] Enqueue server on host <ASCS-hostname>, IP-addr <ASCS-IP-Address>, port 50116
[Thr 140231835920160] ***LOG Q0I=> NiPConnect2: <ASCS-IP-Address>:50116: connect (111: Connection refused) [/source/bas/742_COR/src/base/ni/nixxi.cpp 3324]
[Thr 140231835920160] *** ERROR => NiPConnect2: SiPeekPendConn failed for hdl 1/sock 9
    (SI_ECONN_REFUSE/111; I4; ST; <ASCS-IP-Address>:50116) [nixxi.cpp    3324]
[Thr 140231835920160] *** ERROR => EncNiConnect: unable to connect (NIECONN_REFUSED). See SAP note 1943531 [encomi.c     442]
[Thr 140231835920160] *** ERROR => RepServer: main: no connection to Enqueue Server (rc=-7; ENC_ERR_REFUSED) => try again in 400 ms (see SAP note 1943531) [enrepserv.cp 776] 

 

If the non-active ERS becomes the active one, the following traces appear after the NIECONN_REFUSED errors.

# /usr/sap/<SID>/ERS<inst-no>/work/dev_enrepsrv
......
[Thr 140231835920160] Fri Jul 31 17:09:33 2015
[Thr 140231835920160] ***LOG Q0I=> NiPConnect2: <ASCS-IP-Address>:50116: connect (111: Connection refused) [/source/bas/742_COR/src/base/ni/nixxi.cpp 3324]
[Thr 140231835920160] *** ERROR => NiPConnect2: SiPeekPendConn failed for hdl 7/sock 9
    (SI_ECONN_REFUSE/111; I4; ST; <ASCS-IP-Address>:50116) [nixxi.cpp    3324]
[Thr 140231835920160] *** ERROR => EncNiConnect: unable to connect (NIECONN_REFUSED). See SAP note 1943531 [encomi.c     442]
[Thr 140231835920160] *** ERROR => RepServer: main: no connection to Enqueue Server (rc=-7; ENC_ERR_REFUSED) => try again in 20000 ms (see SAP note 1943531) [enrepserv.cp 776]

[Thr 140231835920160] Fri Jul 31 17:09:53 2015
[Thr 140231835920160] ShadowTable:create: ShmCreate(,SHM_CREATE,len=43101208) -> 7f8a34f3f000
[Thr 140231835920160] Connected to Enqueue Server and created repl. table with 56415 lines
[Thr 140231835920160] EnStateTransferEnqToRep::process: transaction stamp (end_trans): 1 438333793 0!
[Thr 140231835920160] EnStateTransferEnqToRep::process: Transaction has finished:4, process request and loop over fragments
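
To identify which ERS host is currently the active one, you can check on each ERS host whether the ERS instance is running and whether its dev_enrepsrv shows a successful connection. A minimal sketch based on the marker lines above (output formats may differ slightly in your release):

# Run on each ERS host as <sid>adm:
sapcontrol -nr <ERS-inst-no> -function GetProcessList        # is the ERS instance running?
cd /usr/sap/<SID>/ERS<inst-no>/work
grep "Connected to Enqueue Server" dev_enrepsrv              # found on the active ERS
grep "NIECONN_REFUSED" dev_enrepsrv                          # the non-active ERS keeps retrying the connection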

Precondition 03: Check whether the shared memory holding the enqueue replication table exists

The replication table is saved in shared memory. The shared memory key is 66, under the ASCS instance number.
Assuming your ASCS instance number is 01, run showipc <ASCS-inst-no> on the active ERS host; the shared memory segment with key 66 for the enqueue replication table is shown as follows.

>  showipc 01 | grep 66
OsKey:    10166 0x000027b6 Shared Memory Key: 66 Size:  43101208       41.1 MB  Att: 1   Owner: sm2adm  Perms: 740
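
If showipc is not available, the same segment can also be located with the standard Linux ipcs command, using the OS key reported by showipc (0x000027b6 / 10166 in this example; the actual key depends on your instance number):

# Alternative check with the OS tool ipcs
ipcs -m | grep -i 27b6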

 

The relevant instance number can also be checked in dev_enrepsrv in the ERS work folder, as follows.

# /usr/sap/<SID>/ERS<inst-no>/work/dev_enrepsrv
---------------------------------------------------
trc file: "dev_enrepsrv", trc level: 1, release: "742"
---------------------------------------------------
......
[Thr 140349019186976] Mon Mar  9 16:41:40 2015
[Thr 140349019186976] Replication server start with instance number 01
[Thr 140349019186976] Enqueue server on host <your-hostname>, IP-addr <your-ip-address>, port 50116
[Thr 140349019186976] ShadowTable:create: ShmCreate(,SHM_CREATE,len=43101208) -> 7fa57da05000
[Thr 140349019186976] Connected to Enqueue Server and created repl. table with 56415 lines
[Thr 140349019186976] EnStateTransferEnqToRep::process: transaction stamp (end_trans): 1 425890500 1000!
[Thr 140349019186976] EnStateTransferEnqToRep::process: Transaction has finished:4, process request and loop over fragments
 

 

If the replication table is deleted, the locks cannot be retrieved.

Below are some common issues that can destroy the shared memory before the ENSA connects to it. Please avoid them during the failover; a simple safety check is sketched after the list.

        1 - Executing cleanipc against the ASCS instance number on the active ERS host.
             For example, during a failover from host A to host B (the active ERS host), running cleanipc <ASCS-inst-no> on host B deletes the replication table in memory.
        2 - Shutting down the ERS too early on Windows.
              Check the "Replication and Failover" section of the NW 7.4 online help for more information.
        3 - Running the ipcrm command on Linux.
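
As a simple safety check before any IPC cleanup on the active ERS host, you can verify whether the replication table segment still exists for the ASCS instance number (a sketch based on the showipc output shown above):

# Check for the replication table (shared memory key 66 of the ASCS instance number)
showipc <ASCS-inst-no> | grep "Shared Memory Key: 66" && echo "Replication table still present - do not remove it"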

Additional remarks

01: Where to start the ERS?

It is a common situation that customers keep an ERS up and running on the same host where the ENSA is running. This does not make sense: it does not cause problems, but that ERS consumes resources, so the ERS on the host where the ENSA is running should be kept down.

On the other hand, the host where the ENSA is running may become unusable because of some error. In such a case the new ENSA can only be started on another host. This is another good reason to run the ERS, which backs up the enqueue locks, on a different host.

02: When the new ENSA starts and takes over the locks from the ERS, the ERS is shut down or restarted

Whether the ERS is shut down or restarted depends on the settings in your ERS profile. Below are two examples.

# /usr/sap/<SID>/ERS<ERS-inst-no>/profile/<SID>_ERS<ERS-inst-no>_<ERS-hostname>

# The ERS will be restarted by the following setting.
Restart_Program_00 = local $(_ER) pf=$(_PFL) NR=$(SCSID)

# The ERS will be shut down by the following setting.
Start_Program_00 = local $(_ER) pf=$(_PFL) NR=$(SCSID)

03: How does the ERS instance know the ASCS instance number?

The ASCS instance number is configured in the ERS profile. Below is an excerpt of an ERS profile which assumes that the ASCS instance number is 01.

# /usr/sap/<SID>/ERS<ERS-inst-no>/profile/<SID>_ERS<ERS-inst-no>_<ERS-hostname>

SCSID = 01
enque/serverinst = $(SCSID)
Restart_Program_00 = local $(_ER) pf=$(_PFL) NR=$(SCSID)

04: About the OS environment variable SAP_NI_CACHE_DISABLED

During a failover, a network change also takes place. If you find that an SAP process has cached outdated network information, you can set the OS environment variable SAP_NI_CACHE_DISABLED=1 to disable the cache. See the following note for more information.

SAP Note 1425520 - Disable NI cache for host and service names  
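
A minimal example of setting the variable, assuming a bash login environment for the <sid>adm user (how the environment is set up may differ in your installation and HA framework):

# Disable the NI host and service name cache for processes started from this environment
export SAP_NI_CACHE_DISABLED=1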

Related Documents

Please also review the following SAP Online Help for more information.

High Availability with the Standalone Enqueue Server

Replication and Failover

Monitoring the Lock Table at Failover

 
