Tuesday, March 29, 2011

Where to look when there is a node eviction or CRS issue on 10g/11gR1

Posted by Naveen Kumar on Tuesday, March 29, 2011

1. Look at the cssd.log files on both nodes; usually the second node has more information if the first node was evicted. Also take a look at the crsd.log file.
2. The evicted node will have a core dump file generated and system reboot information.
3. Find out whether the node rebooted and whether CRS or something else caused it; check the system reboot time.
4. If you see “Polling” keywords with decreasing percentage values in the cssd.log file, it could be because the node is so busy that cssd cannot get scheduled to do its network heartbeat.
5. If you see “Diskpingout” or something related to -DISK-, then the eviction is because of a disk timeout.
6. After determining whether it is a network or a disk issue, start going in depth.
7. Now it’s time to collect NMON/OSW/RDA reports to confirm and justify whether it was a disk issue or a network issue.
8. If the reports show heavy memory contention or paging, then it’s time to collect an AWR report to see what load/SQL was running during that period.
9. If the network was the issue, check whether any NICs were down or whether link switching has happened, and check that the private interconnect is working between both nodes.
10. Sometimes eviction can also be due to an OS error where the system is in a halted state for a while, memory overcommitment, or 100% CPU usage.
11. Check the OS/system log files to get more information.
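
The first pass over the cssd log (steps 4 to 6) can be sketched as a small Python script. This is only a rough illustration, not an Oracle tool: it uses the two keywords mentioned above (“Polling” for network, “Diskpingout”/-DISK- for disk), and the exact message wording varies by version, so the patterns and the function names (`classify_line`, `summarize`, `likely_cause`) are assumptions to adjust for your environment.

```python
import re
from collections import Counter

# Keywords come from the checklist above; real log wording differs between
# versions, so these patterns are assumptions, not an official message list.
NETWORK_PATTERN = re.compile(r"polling", re.IGNORECASE)
# "disk" also covers "Diskpingout" and other -DISK- related messages.
DISK_PATTERN = re.compile(r"disk", re.IGNORECASE)


def classify_line(line):
    """Return 'network', 'disk', or None for a single log line."""
    if NETWORK_PATTERN.search(line):
        return "network"
    if DISK_PATTERN.search(line):
        return "disk"
    return None


def summarize(lines):
    """Count network-related and disk-related lines in an iterable of lines."""
    counts = Counter()
    for line in lines:
        kind = classify_line(line)
        if kind is not None:
            counts[kind] += 1
    return counts


def likely_cause(counts):
    """Pick the category with more hits; 'unknown' on a tie or no hits."""
    network, disk = counts["network"], counts["disk"]
    if network == disk:
        return "unknown"
    return "network" if network > disk else "disk"
```

To use it, open the cssd log from the surviving node and pass the file handle to `summarize`, then pass the result to `likely_cause`. The answer is only a hint about where to dig first; the NMON/OSW/RDA reports are still needed to confirm it.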


still in progress...
