SYMPTOMS
The alert message looks like this: EM Event: Warning:TEST_PDB1 – Total global cache block lost is 15.
Host=db-test01.local
Target type=Database Instance
Target name=TEST_PDB1
Categories=Error
Message=Total global cache block lost is 15.
Severity=Warning
Event reported time=Sep 04, 2020 2:30:09 AM GMT
Operating System=Linux
Platform=x86_64
Associated Incident Id=348883
Associated Incident Status=New
Associated Incident Owner=
Associated Incident Acknowledged By Owner=No
Associated Incident Priority=None
Associated Incident Escalation Level=0
Event Type=Metric Alert
Event name=rac_global_cache:lost
Metric Group=Global Cache Statistics
Metric=Global Cache Blocks Lost
Metric value=15
Key Value=
Rule Name=ROOT_NOTIFICATION_RULE,ALL TARGET EVENTS
Rule Owner=SYSMAN
Update Details:
Total global cache block lost is 15.
DIAGNOSE
Your mailbox is full of messages like EM Incident: Warning:New: – Total global cache block lost is 15.
The OMS repository database shows a high number of Global Cache Blocks Lost alerts sent for the target instance:
SELECT TRUNC(COLLECTION_TIMESTAMP) "RECEIVED AT",
COUNT(COLLECTION_TIMESTAMP) "ALERTS"
FROM
MGMT_VIEW.MGMT$ALERT_NOTIF_LOG
WHERE
METRIC_NAME='rac_global_cache' AND
METRIC_COLUMN='lost' AND
COLUMN_LABEL = 'Global Cache Blocks Lost' AND
TARGET_NAME = '&INSTANCE_NAME'
GROUP BY
TRUNC(COLLECTION_TIMESTAMP)
ORDER BY 1;
Enter value for instance_name: TEST_PDB1
old 7: TARGET_NAME = '&INSTANCE_NAME'
new 7: TARGET_NAME = 'TEST_PDB1'
RECEIVED AT            ALERTS
------------------ ----------
29-AUG-20                 192
30-AUG-20                 355
31-AUG-20                 355
01-SEP-20                 355
02-SEP-20                 355
03-SEP-20                 355
04-SEP-20                 325
05-SEP-20                  90
8 rows selected.
From the output it’s seen that 355 related alert messages were generated on most days for the database instance TEST_PDB1.
Yet during the day there are few (or even zero) new lost blocks for the database instance:
SET PAGES 999
SET LINES 300
COL MESSAGE FOR A60
SELECT TO_CHAR(COLLECTION_TIMESTAMP, 'HH24:MI DD-MM-YYYY') ALERTED_AT,
MESSAGE
FROM
MGMT_VIEW.MGMT$ALERT_NOTIF_LOG
WHERE
TRUNC(COLLECTION_TIMESTAMP) = TO_DATE('&DDMMYYYY', 'DD-MM-YYYY') AND
TARGET_NAME = '&INSTANCE_NAME'
ORDER BY COLLECTION_TIMESTAMP;
Enter value for ddmmyyyy: 04-09-2020
old 7: TRUNC(COLLECTION_TIMESTAMP) = TO_DATE('&DDMMYYYY', 'DD-MM-YYYY') AND
new 7: TRUNC(COLLECTION_TIMESTAMP) = TO_DATE('04-09-2020', 'DD-MM-YYYY') AND
Enter value for instance_name: TEST_PDB1
old 8: TARGET_NAME = '&INSTANCE_NAME'
new 8: TARGET_NAME = 'TEST_PDB1'
ALERTED_AT MESSAGE
---------------- ------------------------------------------------------------
00:00 04-09-2020 Total global cache block lost is 15.
00:00 04-09-2020 Total global cache block lost is 15.
00:05 04-09-2020 Total global cache block lost is 15.
...
23:50 04-09-2020 Total global cache block lost is 15.
23:50 04-09-2020 Total global cache block lost is 15.
23:55 04-09-2020 Total global cache block lost is 15.
23:55 04-09-2020 Total global cache block lost is 15.
202 rows selected.
In this example no new blocks were lost during the day (on 04-SEP-20 between 00:00 and 23:55 the reported total stays at 15), yet I received alerts throughout the day.
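Because each message carries the cumulative total, comparing the totals across a day's messages shows whether any new blocks were lost. A minimal Python sketch, assuming the messages were exported in time order from the MESSAGE column above and that the instance was not restarted in between; the helper name `new_lost_blocks` is hypothetical:

```python
import re

# Matches the EM message text seen above, e.g. "Total global cache block lost is 15."
TOTAL_RE = re.compile(r"Total global cache block lost is (\d+)\.")


def new_lost_blocks(messages):
    """Return how much the reported cumulative total grew across the messages.

    Assumes the messages are in time order and the instance was not restarted
    in between (a restart resets the underlying counter).
    Returns None when no message matches the pattern.
    """
    totals = []
    for text in messages:
        match = TOTAL_RE.search(text)
        if match:
            totals.append(int(match.group(1)))
    if not totals:
        return None
    return totals[-1] - totals[0]


day = [
    "Total global cache block lost is 15.",  # 00:00
    "Total global cache block lost is 15.",  # 00:05
    "Total global cache block lost is 15.",  # 23:55
]
print(new_lost_blocks(day))  # 0: alerts all day, but no new lost blocks
```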
SOLUTION
First, this note is not about how to troubleshoot and resolve lost block issues; for that, follow Doc ID 563566.1 and Doc ID 2296681.1. This note explains why OEM keeps sending the alerts even when no blocks have been lost for the last hours (days, weeks and so on).
Well, the Global Cache Blocks Lost alert is based on the Global Cache Blocks Lost metric in Enterprise Manager.
By default the metric has the following thresholds: 1 lost block for WARNING and 3 lost blocks for CRITICAL. To get the number of lost blocks, the metric reads the gc blocks lost statistic from the V$SYSSTAT (GV$SYSSTAT) view of the target instance:
SET PAGES 999
SET LINES 300
COL NAME FOR A20
COL VALUE FOR 999999
SELECT NAME, VALUE FROM V$SYSSTAT WHERE NAME='gc blocks lost';
NAME VALUE
-------------------- -------
gc blocks lost 15
NOTE: V$SYSSTAT is based on GV$SYSSTAT, filtered to the current instance:
SET LINES 300
SET PAGES 999
COL VIEW_DEFINITION FOR A50 WORD_WRAPPED
SELECT VIEW_DEFINITION FROM V$FIXED_VIEW_DEFINITION WHERE VIEW_NAME='V$SYSSTAT';
VIEW_DEFINITION
--------------------------------------------------
select STATISTIC# , NAME , CLASS , VALUE, STAT_ID, CON_ID from GV$SYSSTAT where inst_id = USERENV('Instance')
V$SYSSTAT holds the number of lost blocks (statistic name gc blocks lost) accumulated since instance startup. This means the value of the gc blocks lost statistic can only increase: once it has grown, it is not reset until the next instance restart.
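Since the statistic is cumulative, the number of blocks lost in a given period is the difference between two snapshots of gc blocks lost. A minimal Python sketch of that calculation, assuming the snapshots are taken at regular intervals (for example with the V$SYSSTAT query above) and treating a drop in the value as an instance restart; the function name `lost_per_interval` is hypothetical:

```python
def lost_per_interval(snapshots):
    """Convert cumulative 'gc blocks lost' snapshots into per-interval counts.

    snapshots: cumulative values in time order.
    If a value is lower than the previous one, the instance is assumed to have
    been restarted (the counter started again from 0), so the new value is
    taken as the count for that interval.
    """
    deltas = []
    for previous, current in zip(snapshots, snapshots[1:]):
        if current >= previous:
            deltas.append(current - previous)
        else:
            deltas.append(current)  # assumed restart: counter was reset
    return deltas


print(lost_per_interval([15, 15, 15, 15]))  # [0, 0, 0]
print(lost_per_interval([15, 17, 17, 2]))   # [2, 0, 2]
```

With the restart assumption, losses that happened before the restart within the same interval are not counted.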
When the number of lost blocks (gc blocks lost) exceeds a threshold of the Global Cache Blocks Lost metric (1 block for warning or 3 blocks for critical), OEM starts sending alerts. Because the statistic never drops back below 1 or 3 until the next instance restart, its value stays above the thresholds from then on.
That’s why OEM keeps sending Global Cache Blocks Lost alerts every 5 minutes once the threshold has been exceeded.
Even if no blocks are lost for a long period of time (hours, days, weeks), you will still receive Global Cache Blocks Lost messages.
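The effect can be reproduced with a simplified model. The Python sketch below is not OEM's implementation: it assumes one evaluation every 5 minutes, the default warning threshold of 1, and a "greater than" comparison (with a value stuck at 15, "greater than or equal" gives the same result). It compares the raw cumulative counter with the threshold, as described above, and, for contrast, a hypothetical delta-based check that looks only at blocks lost since the previous collection:

```python
COLLECTION_INTERVAL_MIN = 5   # assumed evaluation interval, as seen in the alert log
WARNING_THRESHOLD = 1         # default warning threshold described above


def alerts_on_cumulative(samples, threshold):
    """Count evaluations where the cumulative counter exceeds the threshold."""
    return sum(1 for value in samples if value > threshold)


def alerts_on_delta(samples, threshold):
    """Hypothetical alternative: count intervals whose new losses exceed the threshold."""
    return sum(1 for prev, curr in zip(samples, samples[1:]) if curr - prev > threshold)


collections_per_day = 24 * 60 // COLLECTION_INTERVAL_MIN  # 288
quiet_day = [15] * collections_per_day  # counter stuck at 15: no new lost blocks

print(alerts_on_cumulative(quiet_day, WARNING_THRESHOLD))  # 288
print(alerts_on_delta(quiet_day, WARNING_THRESHOLD))       # 0
```

The model yields one alert per collection, 288 a day; the repository above logs more than that per day, so the sketch illustrates the mechanism rather than OEM's exact notification count.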
If you are sure the cluster instance has no problem with losing blocks (gc blocks lost), disable the Global Cache Blocks Lost metric; this stops the alerts from flooding your mailbox. To do so, clear the thresholds for the metric.
REFERENCES
Troubleshooting gc block lost and Poor Network Performance in a RAC Environment (Doc ID 563566.1)
WAITEVENT: “gc current/cr block lost” Reference Note (Doc ID 2296681.1)
Tuning Inter-Instance Performance in RAC and OPS (Doc ID 181489.1)
EM 13c: How to disable “Global Cache Blocks Lost Metric” Using EMCLI (Doc ID 2543134.1)
False increase of ‘Global Cache Blocks Lost’ or ‘gc blocks lost’ after upgrade to 12c (Doc ID 2096299.1)
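NOTE: You can find the ratio of lost blocks to blocks served (current plus CR) with the following query against the target instance. With the integer format set for the RATIO column, a very small ratio is displayed as 0.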
SET PAGES 999
SET LINES 300
COL RATIO FOR 99999999
SELECT A.INST_ID "INSTANCE",
A.VALUE "GC BLOCKS LOST",
B.VALUE "GC CUR BLOCKS SERVED",
C.VALUE "GC CR BLOCKS SERVED",
A.VALUE/NULLIF(B.VALUE+C.VALUE, 0) RATIO
FROM GV$SYSSTAT A, GV$SYSSTAT B, GV$SYSSTAT C
WHERE A.NAME='gc blocks lost' AND
B.NAME='gc current blocks served' AND
C.NAME='gc cr blocks served' AND
B.INST_ID = A.INST_ID AND
C.INST_ID = A.INST_ID;
  INSTANCE GC BLOCKS LOST GC CUR BLOCKS SERVED GC CR BLOCKS SERVED     RATIO
---------- -------------- -------------------- ------------------- ---------
         1             15             32576274               42979         0
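As a worked example with the values from the output above, the ratio is roughly 4.6e-07, i.e. under one lost block per million blocks served, which is why the integer column format prints 0. A small Python check; the function name `lost_block_ratio` is hypothetical, and it returns None when no blocks were served to avoid dividing by zero:

```python
def lost_block_ratio(lost, current_served, cr_served):
    """Ratio of lost blocks to blocks served (current + CR); None if nothing was served."""
    served = current_served + cr_served
    return lost / served if served else None


ratio = lost_block_ratio(15, 32576274, 42979)
print(f"{ratio:.2e}")                    # 4.60e-07
print(f"{ratio * 1e6:.2f} per million")  # 0.46 per million
```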
Database : 12.2.0.1.0, 19.8.0.0.0
OEM : 13.4.0.0.0