Where are things?
Stock Confluence is installed at /srv/wiki under its own user account 'wiki'.
To bounce the server
ssh lettuce.jenkins-ci.org
then
# Restart Confluence
sudo docker restart confluence
# Restart Confluence cache
sudo docker restart confluence-cache
TODOs
- Apache runs the Auto Export plugin to divert much of the load to statically generated HTML without bothering Confluence. This is something we can look into. Example of static page and its corresponding Confluence page.
- Performance tuning guide from Atlassian
Incident Records
There's a ticket filed with Atlassian support for the recent outages, so let's keep records of when and how Confluence failed (newer ones first).
See How to do a post-mortem analysis for what data to collect before relaunching a new instance.
March 16th 18:49 PT
Upgraded JDK to 6u24 since investigation in CSP-58700 seems to indicate that there have been 7 JDK crashes while JIT-ing the exact same method. This KB article appears spot on.
March 16th afternoon PT
With the help of OSUOSL, the VM now has 2.5GB of memory. I've changed the JVM parameters to "-Xmx768m -XX:MaxPermSize=256m"; previously they were 512m and 192m respectively.
March 16th (1st time)
JVM crash on out of memory error (full report):
# A fatal error has been detected by the Java Runtime Environment:
#
# java.lang.OutOfMemoryError: requested 536870928 bytes for Chunk::new. Out of swap space?
#
# Internal Error (allocation.cpp:215), pid=19777, tid=140363900471040
# Error: Chunk::new
#
# JRE version: 6.0_22-b04
# Java VM: Java HotSpot(TM) 64-Bit Server VM (17.1-b03 mixed mode linux-amd64 )
# An error report file with more information is saved as:
# /srv/wiki/confluence-3.4.7-std/logs/hs_err_pid19777.log
It appears that the JVM crashed when it was trying to reallocate the oldgen from 300MB-ish to 500MB-ish because the kernel didn't have enough swap space to underwrite the new allocation.
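As a sanity check on the sizes involved, the request in the crash report can be converted from bytes to MB. This is only a minimal Python sketch of the arithmetic; the helper name to_mb is mine and not part of any JVM tooling.

```python
MIB = 1024 * 1024  # bytes per MB, the unit the JVM tools report in


def to_mb(n_bytes):
    """Convert a byte count to MB (1 MB = 1024 * 1024 bytes)."""
    return n_bytes / MIB


# Figure copied from the crash report:
# "requested 536870928 bytes for Chunk::new"
requested = 536870928

print(to_mb(requested))        # size of the failed request in MB
print(requested - 512 * MIB)   # bytes left over beyond an even 512MB
```

The failed request works out to a single block of 512MB plus 16 bytes, which matches the "500MB-ish" figure above.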
March 16th (2nd time)
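Unresponsive JVM. "jmap -heap" reported that eden, the old generation, and the perm generation were all essentially 100% full relative to their current capacity (the survivor spaces were empty). Presumably the JVM went into the excessive GC mode, although I couldn't confirm it.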
Attaching to process ID 1423, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 17.1-b03

using thread-local object allocation.
Parallel GC with 2 thread(s)

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 469762048 (448.0MB)
   NewSize          = 1310720 (1.25MB)
   MaxNewSize       = 17592186044415 MB
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 205520896 (196.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 118554624 (113.0625MB)
   used     = 118554624 (113.0625MB)
   free     = 0 (0.0MB)
   100.0% used
From Space:
   capacity = 13762560 (13.125MB)
   used     = 0 (0.0MB)
   free     = 13762560 (13.125MB)
   0.0% used
To Space:
   capacity = 19005440 (18.125MB)
   used     = 0 (0.0MB)
   free     = 19005440 (18.125MB)
   0.0% used
PS Old Generation
   capacity = 313196544 (298.6875MB)
   used     = 313196480 (298.68743896484375MB)
   free     = 64 (6.103515625E-5MB)
   99.99997956554718% used
PS Perm Generation
   capacity = 132775936 (126.625MB)
   used     = 132711288 (126.56334686279297MB)
   free     = 64648 (0.06165313720703125MB)
   99.95131045432811% used
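To avoid eyeballing this output next time, something like the following Python sketch could pull the capacity/used pairs out of saved "jmap -heap" output and list the spaces that are effectively full. It assumes the usual multi-line layout, where a space name is followed by "capacity = N" and "used = N" lines. The function name full_spaces and the 99% threshold are my own choices, not anything jmap provides.

```python
import re


def full_spaces(jmap_text, threshold=99.0):
    """Return {space name: percent used} for spaces at or above threshold."""
    result = {}
    name = None
    capacity = None
    for line in jmap_text.splitlines():
        line = line.strip()
        m = re.match(r"(capacity|used)\s*=\s*(\d+)", line)
        if m is None:
            # Lines without "=" that are not "N% used" summaries are
            # treated as the name of the space that follows.
            if line and "=" not in line and not line.endswith("% used"):
                name = line.rstrip(":")
            continue
        if m.group(1) == "capacity":
            capacity = int(m.group(2))
        elif capacity:  # a "used" line; skipped if capacity is missing or 0
            pct = 100.0 * int(m.group(2)) / capacity
            if pct >= threshold:
                result[name] = pct
            capacity = None  # do not reuse this capacity for the next space
    return result


# Excerpt of the figures from this incident.
sample = """\
PS Young Generation
Eden Space:
   capacity = 118554624 (113.0625MB)
   used     = 118554624 (113.0625MB)
   free     = 0 (0.0MB)
   100.0% used
From Space:
   capacity = 13762560 (13.125MB)
   used     = 0 (0.0MB)
   free     = 13762560 (13.125MB)
   0.0% used
PS Old Generation
   capacity = 313196544 (298.6875MB)
   used     = 313196480 (298.68743896484375MB)
   free     = 64 (6.103515625E-5MB)
   99.99997956554718% used
"""

for space, pct in full_spaces(sample).items():
    print(space, round(pct, 2))
```

On the excerpt above this reports Eden Space and PS Old Generation as full and leaves out the empty From Space, which is the same picture as in the incident.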
March 16th (3rd time)
Andrew restarted it. No details.