This content originally appeared on DEV Community and was authored by gaurbprajapati
JVM Crash During TestNG Suite Execution – Root Cause & Fix
Running large-scale UI automation suites can be tricky, especially when TestNG and Maven Surefire are involved. Recently, we hit a JVM crash during execution that took down an entire test suite. After a deep investigation with heap dumps, GC logs, and TestNG internals, here’s the full Root Cause Analysis (RCA) and how we solved it.
The Error
```text
[ERROR] org.apache.maven.plugins:maven-surefire-plugin:2.19.1:test failed:
The forked VM terminated without properly saying goodbye.
VM crash or System.exit called?
```
At first glance, this looks like a random JVM crash. But the heap dump revealed ~6 GB of memory retained by `org.testng.SuiteRunner`, which was holding thousands of `TestRunner` instances. Each of those kept entire test classes, WebDrivers, and PageObjects alive.
What is a Forked JVM?
Maven Surefire runs tests in a forked JVM, which is a separate Java process.
Why?
- Isolates tests from the main build
- Allows custom JVM args (`-Xmx`, GC options, heap dump on OOM, etc.)
- Enables parallel test execution
Flow:
- Maven spawns a new JVM
- Tests run inside this forked process
- JVM args are applied via `<argLine>` in `pom.xml`
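A quick way to confirm that the `<argLine>` flags reached the forked JVM is to print the JVM's own view of its startup arguments from inside a test. Below is a minimal sketch that uses only JDK management APIs and needs JDK 9 or later for `ProcessHandle`. The class name `ForkedJvmInfo` is hypothetical. Calling these methods from a `@BeforeSuite` hook, for example, would log them once per fork.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: reports what the *current* JVM was started with.
// Run from inside a test, it shows the forked JVM's settings, not Maven's.
class ForkedJvmInfo {

    /** Startup flags such as -Xmx or -XX:+HeapDumpOnOutOfMemoryError. */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** Max heap in MB (reflects -Xmx when set; huge if the JVM reports no limit). */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    /** Process id of this JVM; differs from the Maven process when tests are forked. */
    static long pid() {
        return ProcessHandle.current().pid();
    }

    public static void main(String[] args) {
        System.out.println("PID:      " + pid());
        System.out.println("Max heap: " + maxHeapMb() + " MB");
        System.out.println("JVM args: " + jvmArguments());
    }
}
```

If `-Xmx` from `<argLine>` does not show up in the printed arguments, the tests are not running with the heap settings you think they are.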
TestNG’s SuiteRunner Explained
`org.testng.SuiteRunner` is the heart of TestNG suite execution.
Responsibilities:
- Run the suite defined in `testng.xml`
- Manage `<test>` blocks via `TestRunner`
- Track all test classes & methods executed
- Aggregate results (pass/fail/skip)
- Feed data to reporters/listeners
Structure:
```text
SuiteRunner
└── List<TestRunner>
    └── Test Class Instance
        ├── WebDriver
        ├── Page Objects
        ├── Test Data
        └── Utilities
```
Why Memory Leaks Happen
- `SuiteRunner` → keeps strong refs to all `TestRunner`s
- `TestRunner` → holds `ITestContext`, `ITestResult`, and the test class instance
- Test class → holds WebDriver, Page Objects, Data Models
Until the suite ends, nothing is garbage-collected.
Result:
- 17,553 `TestRunner` objects alive
- Selenium WebDriver objects + DOM snapshots consuming a huge amount of memory
- GC can’t reclaim any of it, so the JVM crashes
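The retention chain can be reproduced without TestNG or Selenium. In this toy sketch, where all class names are hypothetical, a static list plays the role of `SuiteRunner` and a 1 MB array stands in for a WebDriver. A `WeakReference` shows whether the "driver" is still alive. While the list holds the test instance, the JVM is not allowed to collect it, however often GC runs.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Toy model of SuiteRunner -> TestRunner -> test instance -> WebDriver (all hypothetical).
class RetentionDemo {

    // Stand-in for a WebDriver plus its cached page state.
    static final class FakeDriver {
        final byte[] payload = new byte[1024 * 1024]; // ~1 MB
    }

    // Stand-in for a test class instance with a heavy field.
    static final class FakeTestInstance {
        FakeDriver driver = new FakeDriver();
    }

    // Plays the role of SuiteRunner: strong references to every finished test.
    static final List<FakeTestInstance> retainedByRunner = new ArrayList<>();

    /** Simulates one finished test and returns a weak handle to its driver. */
    static WeakReference<FakeDriver> runOneTest() {
        FakeTestInstance instance = new FakeTestInstance();
        retainedByRunner.add(instance); // the "runner" keeps the instance
        return new WeakReference<>(instance.driver);
    }

    /** Requests GC a few times; System.gc() is only a request, never a guarantee. */
    static void requestGc() {
        for (int i = 0; i < 5; i++) {
            System.gc();
        }
    }

    public static void main(String[] args) {
        WeakReference<FakeDriver> ref = runOneTest();
        requestGc();
        // Strongly reachable via retainedByRunner, so this always prints true.
        System.out.println("Driver alive while retained: " + (ref.get() != null));

        // Break the chain at the field level, as fixes 2 and 3 below do.
        retainedByRunner.forEach(t -> t.driver = null);
        requestGc();
        // Usually false on HotSpot, but collection is not guaranteed.
        System.out.println("Driver alive after nulling: " + (ref.get() != null));
    }
}
```

The runner still holds each test instance after the field is nulled. Only the heavy object behind the field becomes collectable, which is the whole point of the cleanup fixes.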
RCA Summary
| Factor | Detail |
|---|---|
| Error | Forked VM crash (Surefire "goodbye" error) |
| Cause | JVM ran out of memory due to references retained by `SuiteRunner` |
| Trigger | Large number of tests in a single suite |
| Leak Source | Strong references: `SuiteRunner` → `TestRunner` → test class |
| GC Impact | Objects never eligible for GC until the JVM exits |
| Result | Heap bloat, `OutOfMemoryError`, JVM crash |
Fixes & Mitigation
1. Move Heavy Fields to Method Scope
Instead of keeping page objects at class level:
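```java
// ❌ Problematic: the page object lives in a field, so it stays reachable
// for as long as TestNG retains this test instance
ExternalJobPage externalJobPage;

@BeforeMethod
public void setup() {
    externalJobPage = new ExternalJobPage(getDriver());
}
```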
Use method-level objects:
```java
// ✅ GC-friendly: the page object becomes unreachable as soon as the method returns
@Test
public void testSomething() {
    ExternalJobPage page = new ExternalJobPage(getDriver());
    page.verifyJobDetails();
}
```
2. Nullify References in Cleanup Hooks
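```java
@AfterMethod
public void clean() {
    // Drop this instance's references so the objects can be collected even
    // though TestNG keeps the test instance itself. Assumes the WebDriver
    // session is quit elsewhere; nulling only drops the Java reference.
    driver = null;
    pageObject = null;
    System.gc(); // only a hint; the JVM may ignore it (e.g. with -XX:+DisableExplicitGC)
}
```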
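Because `System.gc()` is only a request, it is worth measuring whether cleanup actually frees anything. Here is a small sketch using the JDK's `MemoryMXBean`. The `HeapUsage` helper and its `measure` method are hypothetical. In an `@AfterMethod` hook, one might wrap the nulling code in `measure(...)` and log the returned line.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports used heap around a cleanup action.
class HeapUsage {

    /** Currently used heap in MB, as reported by the platform MemoryMXBean. */
    static long usedHeapMb() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed() / (1024 * 1024);
    }

    /** Runs the cleanup and returns a log line with used heap before and after it. */
    static String measure(String label, Runnable cleanup) {
        long before = usedHeapMb();
        cleanup.run();
        long after = usedHeapMb();
        return String.format("%s: heap used %d MB -> %d MB", label, before, after);
    }

    public static void main(String[] args) {
        System.out.println(measure("after cleanup", System::gc));
    }
}
```

A single reading is noisy. The useful signal is the trend: if used heap keeps climbing from test to test even after cleanup, something is still holding references.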
3. Aggressive Field Nullification (Final Solution)
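```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Nulls every instance field of this test object, so that TestNG retaining the
// instance (SuiteRunner -> TestRunner -> test class) no longer pins its contents.
// Note: getDeclaredFields() only covers the concrete class, not its superclasses.
public void tearDown() {
    for (Field field : this.getClass().getDeclaredFields()) {
        // Skip AspectJ-generated fields, primitives (nothing to release),
        // and statics (shared state, not owned by this instance).
        if (field.getName().startsWith("ajc$")
                || field.getType().isPrimitive()
                || Modifier.isStatic(field.getModifiers())) {
            continue;
        }
        try {
            field.setAccessible(true);
            field.set(this, null);
        } catch (Exception e) {
            // One field that cannot be cleared should not stop the rest.
            log.error("Failed to clear field {}: {}", field.getName(), e.getMessage());
        }
    }
    log.info("Cleaned up instance for class {}", this.getClass().getName());
    System.gc(); // only a request to the JVM
}
```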
And ensure cleanup of test data:
```java
@AfterTest
public void clearTestData() {
    try {
        // TestDataContext is the framework's own shared test-data store
        if (TestDataContext.globalTestDataMapSize() > 1) {
            TestDataContext.clearData(testCasePath);
        }
        tearDown(); // null out this instance's fields
    } catch (Exception e) {
        log.error("Exception while clearing test data: {}", e.getMessage());
    }
}
```
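The reflective `tearDown()` only clears fields declared directly on the concrete test class. Heavy fields inherited from a base test class, a common place to keep the driver, would survive. Below is a standalone variant that also walks superclasses, written under the assumption that such base-class fields exist. `FieldNuller` and the example classes are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical utility: the tearDown() idea, extended to inherited fields.
final class FieldNuller {

    /** Nulls non-static, non-primitive instance fields up the class hierarchy; returns how many were set. */
    static int nullOutFields(Object target) {
        int cleared = 0;
        for (Class<?> c = target.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                if (field.getName().startsWith("ajc$")              // AspectJ-generated
                        || field.getType().isPrimitive()            // cannot be null
                        || Modifier.isStatic(field.getModifiers())) { // shared state
                    continue;
                }
                try {
                    field.setAccessible(true);
                    field.set(target, null);
                    cleared++;
                } catch (ReflectiveOperationException | RuntimeException e) {
                    // e.g. record fields or module restrictions: skip this field and continue
                }
            }
        }
        return cleared;
    }

    // Example types for the usage below (hypothetical).
    static class BaseTest { Object driver = new Object(); }
    static class LoginTest extends BaseTest {
        String page = "login";
        int retries = 3;
        static String shared = "kept";
    }

    public static void main(String[] args) {
        LoginTest t = new LoginTest();
        int n = nullOutFields(t);
        // prints "2 null null 3 kept": page and inherited driver cleared, int and static untouched
        System.out.println(n + " " + t.page + " " + t.driver + " " + t.retries + " " + LoginTest.shared);
    }
}
```

Nulling inherited fields means any later base-class hook that still expects the driver will see `null`. This step therefore belongs at the very end of the cleanup sequence.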
4. Split Large Suites
- Don’t run thousands of tests in one suite
- Break into smaller `testng.xml` files
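One way to act on this is to run each smaller suite in its own Maven invocation. Every suite then gets a fresh forked JVM, and nothing retained by one `SuiteRunner` survives into the next. The sketch below shells out to Maven once per suite file. It makes several assumptions:

- `mvn` is on the `PATH` (on Windows it may need to be `mvn.cmd`).
- It relies on Surefire's `surefire.suiteXmlFiles` user property.
- The POM does not hard-code `<suiteXmlFiles>`, because explicit plugin configuration takes precedence over a `-D` property.

The class name is hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical launcher: one `mvn test` per TestNG suite file, each in a fresh JVM.
final class SuiteSplitRunner {

    /** Builds one Maven command per suite file. */
    static List<List<String>> buildCommands(List<String> suiteFiles) {
        List<List<String>> commands = new ArrayList<>();
        for (String suite : suiteFiles) {
            commands.add(List.of("mvn", "test", "-Dsurefire.suiteXmlFiles=" + suite));
        }
        return commands;
    }

    /** Runs the suites one after another; returns how many exited non-zero. */
    static int runAll(List<String> suiteFiles) throws IOException, InterruptedException {
        int failures = 0;
        for (List<String> command : buildCommands(suiteFiles)) {
            Process process = new ProcessBuilder(command).inheritIO().start();
            if (process.waitFor() != 0) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) throws Exception {
        int failed = runAll(List.of(args));
        System.exit(failed == 0 ? 0 : 1);
    }
}
```

With hypothetical suite files, usage might look like `java SuiteSplitRunner.java suites/smoke.xml suites/regression-1.xml` (single-file launch, JDK 11+). Running the suites sequentially keeps peak memory at one suite's worth, at the cost of longer wall-clock time.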
5. Upgrade Tooling
- Use Maven Surefire 3.1.2+ (better fork handling)
- Use TestNG 7.x+ (memory fixes included)
6. Explore Dependency Injection (POC Needed)
Using DI (like Guice or Spring) could give test objects controlled lifecycles.
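The core idea can be sketched by hand before committing to Guice or Spring: a per-test scope that creates heavy objects on demand and releases them all when the test ends. The `TestScope` class below is a hypothetical illustration of that lifecycle, not the API of either DI framework.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical per-test scope: objects live exactly as long as the scope does.
final class TestScope implements AutoCloseable {
    // LinkedHashMap keeps creation order, so close() can run in reverse.
    private final Map<Class<?>, Object> instances = new LinkedHashMap<>();

    /** Returns this scope's single instance of the type, creating it on first use. */
    <T> T get(Class<T> type, Supplier<? extends T> factory) {
        return type.cast(instances.computeIfAbsent(type, key -> factory.get()));
    }

    int size() {
        return instances.size();
    }

    /** Drops every reference, then closes AutoCloseable instances in reverse creation order. */
    @Override
    public void close() {
        List<Object> created = new ArrayList<>(instances.values());
        instances.clear(); // nothing stays reachable from the scope, even if a close() fails
        for (int i = created.size() - 1; i >= 0; i--) {
            Object instance = created.get(i);
            if (instance instanceof AutoCloseable) {
                try {
                    ((AutoCloseable) instance).close();
                } catch (Exception e) {
                    // Keep closing the rest; a real framework would log this.
                }
            }
        }
    }
}
```

In a TestNG class, one might create a `TestScope` in `@BeforeMethod` and close it in `@AfterMethod`. The test instance then holds only the scope, which is empty after `close()`. TestNG retaining the instance would therefore no longer retain the driver or page objects.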
Key Takeaways
- `SuiteRunner` holds everything until the suite ends, so design your framework to release memory early.
- Avoid heavy class-level fields; use method scope instead.
- Nullify aggressively in `@AfterMethod`/`@AfterTest`.
- Split test suites; don’t overload a single JVM.
- Upgrade Surefire and TestNG; newer versions manage memory better.
With these changes, our suite stopped crashing and memory usage dropped drastically.
If you’re running large-scale TestNG suites with Selenium, check your heap dump once in a while. You might be surprised how much `SuiteRunner` is holding on to.
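You don't have to wait for an `OutOfMemoryError` to get a dump (for example via `-XX:+HeapDumpOnOutOfMemoryError` in `<argLine>`). A HotSpot JVM can write one on demand through `HotSpotDiagnosticMXBean`. Here is a minimal sketch. It relies on the `com.sun.management` extension, which is present in OpenJDK and Oracle builds but not guaranteed on every JVM. Calling it from an `@AfterSuite` hook, for instance, would capture what is still reachable once the suite finishes.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Writes an .hprof heap dump on demand from inside the running (forked) JVM.
final class HeapDumper {

    /**
     * Dumps the heap to the given file. With live=true only reachable objects are
     * written, which is exactly what a SuiteRunner-retention hunt needs.
     * The file must not already exist, and its name should end in ".hprof".
     */
    static void dump(Path file, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(file.toString(), live);
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("heap").resolve("suite.hprof");
        dump(out, true);
        System.out.println("Heap dump written to " + out + " (" + Files.size(out) + " bytes)");
    }
}
```

The resulting `.hprof` file opens in heap analyzers such as Eclipse MAT or VisualVM. There, the dominator tree shows which objects keep the most memory reachable.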