Lesson 5

JVM Internals and Garbage Collection

Introduction

Have you ever wondered what actually happens between writing System.out.println("Hello") and seeing text appear on your screen? Beneath the surface of every Java application lies a sophisticated virtual machine that loads your classes, manages memory, optimizes code at runtime, and cleans up after you — all without you writing a single line of memory management code. In this lesson, you'll pull back the curtain on the JVM's architecture, understand how garbage collection really works, and learn how to tune these systems for optimal performance.

The JVM Architecture: A Bird's-Eye View

The Java Virtual Machine (JVM) is not a single monolithic program — it's a carefully orchestrated collection of subsystems, each with a specific responsibility. Understanding these subsystems is the foundation for everything else in this lesson.

The Three Major Subsystems

  1. Class Loader Subsystem — Responsible for finding, loading, and verifying .class files. It follows a delegation model: the Bootstrap ClassLoader loads core Java classes (java.lang.*), the Extension ClassLoader (or Platform ClassLoader in Java 9+) loads extension libraries, and the Application ClassLoader loads your application's classes. This hierarchy exists for security — it prevents malicious code from replacing core Java classes.

  2. Runtime Data Areas — These are the memory regions the JVM allocates when it starts. The most important ones are:

    • Method Area (Metaspace): Stores class metadata, method bytecode, and the constant pool. Since Java 8, this lives in native memory, not the heap.
    • Heap: Where all object instances live. This is the region managed by garbage collection.
    • Java Stacks: Each thread gets its own stack, containing stack frames — one per method invocation. Each frame holds local variables, an operand stack, and a reference to the constant pool.
    • PC Registers: Each thread has a program counter tracking the current bytecode instruction.
    • Native Method Stacks: Used for native (non-Java) method calls.
  3. Execution Engine — Converts bytecode into machine instructions. It includes the interpreter (executes bytecode line by line), the JIT (Just-In-Time) compiler (compiles hot bytecode paths into native machine code for performance), and the garbage collector.
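The class loader delegation hierarchy described above can be observed directly from running code. A minimal sketch (the class name `ClassLoaderDemo` is illustrative); note that the JVM reports the Bootstrap ClassLoader as `null`:

```java
public class ClassLoaderDemo {
    public static void main(String[] args) {
        // Core classes (java.lang.*) are loaded by the Bootstrap loader,
        // which is implemented natively and reported as null
        System.out.println(String.class.getClassLoader()); // null

        // Your application's classes are loaded by the Application ClassLoader
        System.out.println(ClassLoaderDemo.class.getClassLoader());

        // The Application loader's parent is the Platform loader (Java 9+)
        System.out.println(ClassLoader.getSystemClassLoader().getParent());
    }
}
```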

The heap is shared across all threads, while stacks are thread-private. This distinction is critical for understanding both memory layout and thread safety.
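The heap-shared / stack-private split can be demonstrated with two threads: each thread's local variable lives in its own stack frames and is invisible to the other, while a heap object referenced by both is shared. A small sketch (class and field names are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedVsPrivate {
    // Lives on the shared heap: visible to every thread
    static final AtomicInteger shared = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            int local = 0; // lives in this thread's own stack frame
            for (int i = 0; i < 1000; i++) {
                local++;
                shared.incrementAndGet();
            }
            // Each thread sees only its own stack: local is always 1000
            System.out.println("local = " + local);
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Both threads updated the same heap object: 2000
        System.out.println("shared = " + shared.get());
    }
}
```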

A common misconception is that Java is "slow because it's interpreted." In reality, the JIT compiler identifies hot spots — frequently executed code paths — and compiles them to highly optimized native code, often rivaling C++ performance.

Exercise 1: Multiple Choice
Where is class metadata (method bytecode, constant pool) stored in a modern JVM (Java 8+)?

The Heap in Depth: Generational Memory Model

The heap isn't just one big pool of memory — it's carefully divided into regions based on a key empirical observation called the weak generational hypothesis: most objects die young. Think of it like a hospital triage system — new patients (objects) go to a fast-turnover area first, and only those that survive long enough get moved to longer-term care.

Heap Generations

The traditional heap layout (used by the default garbage collectors) divides memory into:

  • Young Generation: Where new objects are allocated. It's further divided into:

    • Eden Space: The birthplace of nearly all objects. When Eden fills up, a Minor GC is triggered.
    • Survivor Spaces (S0 and S1): Two equally sized spaces. After a Minor GC, surviving objects from Eden are copied to one Survivor space. Objects that survive multiple GC cycles in Survivor spaces are eventually promoted (or tenured) to the Old Generation.
  • Old Generation (Tenured): Home for long-lived objects. Collection here is called a Major GC (often conflated with a Full GC, which collects the entire heap) and is significantly more expensive — it typically causes longer application pauses.

How Minor GC Works (Stop-and-Copy)

  1. New objects fill up Eden.
  2. GC is triggered. All threads are paused (a stop-the-world event, though very brief for Minor GC).
  3. The collector traces references from GC roots (stack variables, static fields, JNI references) to find all reachable objects in the Young Generation.
  4. Reachable objects in Eden and the current active Survivor space are copied to the other Survivor space.
  5. Everything left behind is considered garbage — Eden and the old Survivor space are wiped clean in one shot.
  6. Objects that have survived enough cycles (controlled by -XX:MaxTenuringThreshold, default 15) are promoted to Old Generation.

This is extremely efficient because most objects in Eden are already dead by the time GC runs — so the collector only copies a small fraction of objects.
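Reachability from GC roots (step 3 above) can be probed with a weak reference: a strongly referenced object is never reclaimed, while an object held only weakly becomes eligible at the next collection. A sketch (class name is illustrative; `System.gc()` is only a hint, so the weak-reference outcome is not guaranteed on every JVM):

```java
import java.lang.ref.WeakReference;

public class YoungGenDemo {
    public static void main(String[] args) {
        // Strongly referenced: reachable from a GC root (this stack frame)
        byte[] longLived = new byte[1024];

        // Weakly referenced: eligible for collection at the next GC
        WeakReference<byte[]> shortLived = new WeakReference<>(new byte[1024]);

        System.gc(); // request a collection (a hint, not a guarantee)

        System.out.println("long-lived survives: " + (longLived != null));
        System.out.println("weak ref cleared:    " + (shortLived.get() == null));
    }
}
```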

Exercise 2: True or False
In the JVM's generational heap model, objects are initially allocated in the Old Generation and moved to the Young Generation if they are short-lived.

Garbage Collection Algorithms and Collectors

Not all garbage collectors work the same way. The JVM offers several collector implementations, each making different trade-offs between throughput (how much CPU time goes to your application vs. GC), latency (how long GC pauses last), and footprint (how much extra memory the collector needs).

Core GC Algorithms

Mark-and-Sweep: The collector first marks all reachable objects by traversing from GC roots, then sweeps through the heap freeing unmarked objects. Simple, but leaves memory fragmentation — free space is scattered in small gaps.

Mark-and-Compact: Like Mark-and-Sweep, but after marking, surviving objects are compacted to one end of the heap, eliminating fragmentation. More expensive, but allocation becomes a simple pointer bump.

Copying: Used in the Young Generation. Divides space in two; copies live objects to the empty half, then clears the old half entirely. Very fast when most objects are dead, but wastes half the space.
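The mark phase shared by these algorithms is essentially a graph traversal from the GC roots. A toy sketch over a simulated heap (all names are illustrative, and real collectors traverse actual object references, not integer ids):

```java
import java.util.*;

public class MarkSweepSketch {
    // Toy heap: each "object" is an id mapped to its outgoing references
    static Map<Integer, List<Integer>> heap = new HashMap<>();

    // Mark phase: depth-first traversal from the GC roots
    static Set<Integer> mark(List<Integer> roots) {
        Set<Integer> marked = new HashSet<>();
        Deque<Integer> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            int obj = stack.pop();
            if (marked.add(obj)) {
                stack.addAll(heap.getOrDefault(obj, List.of()));
            }
        }
        return marked;
    }

    // Sweep phase: everything unmarked is garbage and is freed
    static void sweep(Set<Integer> marked) {
        heap.keySet().retainAll(marked);
    }

    public static void main(String[] args) {
        heap.put(1, List.of(2)); // root -> 1 -> 2
        heap.put(2, List.of());
        heap.put(3, List.of(4)); // 3 and 4 are unreachable
        heap.put(4, List.of());

        sweep(mark(List.of(1)));
        System.out.println(heap.keySet()); // only 1 and 2 survive
    }
}
```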

Major Garbage Collectors

| Collector | Young Gen | Old Gen | Best For |
|---|---|---|---|
| Serial GC | Copying | Mark-Compact | Small apps, single-core |
| Parallel GC | Parallel Copying | Parallel Mark-Compact | Throughput-oriented batch jobs |
| G1 GC | Evacuation | Mixed Collection | Balanced latency/throughput (default since Java 9) |
| ZGC | Concurrent | Concurrent | Ultra-low latency (< 1ms pauses) |
| Shenandoah | Concurrent | Concurrent | Low latency (alternative to ZGC) |

G1 GC: The Modern Default

G1 (Garbage-First) breaks the heap into equal-sized regions (typically 1-32 MB each) rather than contiguous generations. Each region is tagged as Eden, Survivor, Old, or Humongous (for objects larger than half a region). G1 maintains a priority queue of regions by "garbage density" — it collects the regions with the most garbage first (hence the name), maximizing reclamation per unit of work.

G1 aims to meet a configurable pause-time target (default 200ms via -XX:MaxGCPauseMillis). It adjusts the number of regions collected per cycle to stay within this target. This makes G1 a "soft real-time" collector.

ZGC and Shenandoah: The Sub-Millisecond Frontier

ZGC (production-ready since Java 15) achieves pause times under 1 millisecond regardless of heap size — even multi-terabyte heaps. It does this using colored pointers (embedding GC metadata in unused bits of 64-bit pointers) and load barriers (small code snippets injected at every pointer load to fix up references concurrently). The trade-off is slightly lower throughput and higher CPU usage.

Exercise 3: Multiple Choice
You're building a financial trading platform where GC pause times must stay below 1ms, and the heap may grow to 64GB. Which garbage collector is the best fit?

JIT Compilation and Runtime Optimization

The JIT (Just-In-Time) compiler is arguably the JVM's most impressive subsystem — it's the reason Java can match or exceed the performance of statically compiled languages for long-running applications.

How JIT Compilation Works

When your application starts, bytecode is initially interpreted — the JVM reads each instruction and executes it. This is slow but starts instantly. Meanwhile, the JVM profiles your code, counting method invocations and loop iterations. When a method or loop body exceeds a threshold (typically ~10,000 invocations), it becomes a "hot spot" and is submitted for JIT compilation.

The JVM traditionally has two JIT compilers:

  • C1 (Client Compiler): Compiles quickly with basic optimizations. Used for methods that are warm but not yet hot.
  • C2 (Server Compiler): Takes longer to compile but produces highly optimized native code. Used for the hottest methods.

This is called tiered compilation (enabled by default since Java 8): code progresses through interpretation → C1-compiled → C2-compiled as it gets hotter.

Key JIT Optimizations

Inlining: The most important optimization. The JIT replaces a method call with the method's body directly, eliminating call overhead and enabling further optimizations. Small, frequently called methods benefit enormously.

Escape Analysis: The JIT analyzes whether an object "escapes" the current method (i.e., is returned or stored in a field). If it doesn't escape, the JIT can:

  • Allocate it on the stack instead of the heap (avoiding GC entirely) — called scalar replacement.
  • Eliminate synchronization on it (lock elision), since no other thread can access it.
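A shape of code where escape analysis can pay off: an object created, used, and discarded within one method. A sketch (names are illustrative; whether scalar replacement actually fires depends on the JIT and cannot be observed from the code itself):

```java
public class EscapeDemo {
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    // The Point never escapes this method (not returned, not stored in a
    // field), so after inlining the JIT may replace it with two scalar
    // locals — no heap allocation, nothing for the GC to track
    static double distSq(double x, double y) {
        Point p = new Point(x, y); // candidate for scalar replacement
        return p.x * p.x + p.y * p.y;
    }

    public static void main(String[] args) {
        double sum = 0;
        for (int i = 0; i < 1_000_000; i++) { // hot loop: triggers JIT compilation
            sum += distSq(i, i);
        }
        System.out.println(sum);
    }
}
```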

Loop Unrolling: Reduces loop overhead by duplicating the loop body multiple times per iteration.

Dead Code Elimination: Removes code paths that can never be reached based on runtime profiling data.

Critical insight: JIT optimizations are based on runtime profiling, not static analysis. This means the JIT can make optimizations that a static compiler like gcc cannot — for example, devirtualizing a polymorphic call if profiling shows only one implementation is ever used. However, it also means benchmarks must account for warmup time — the first few thousand iterations of any code path will be interpreted and much slower.
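The warmup caveat translates into a basic benchmarking discipline: exercise the code path past the JIT threshold before timing it. A minimal hand-rolled sketch (names are illustrative; for real measurements a harness like JMH is the better tool):

```java
public class WarmupDemo {
    static long work(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) acc += (long) i * i;
        return acc;
    }

    public static void main(String[] args) {
        // Warmup: let work() cross the JIT threshold so the timed run
        // measures compiled code, not the interpreter
        for (int i = 0; i < 20_000; i++) work(1_000);

        long start = System.nanoTime();
        long result = work(1_000_000);
        long elapsed = System.nanoTime() - start;
        System.out.println("result=" + result + " elapsedNs=" + elapsed);
    }
}
```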

Tuning GC and Diagnosing Memory Issues

Understanding JVM internals isn't just academic — it directly translates into practical skills for tuning production applications and diagnosing memory problems.

Essential JVM Flags

# Choose a collector
-XX:+UseG1GC              # G1 (default Java 9+)
-XX:+UseZGC               # ZGC (Java 15+)
-XX:+UseShenandoahGC      # Shenandoah

# Heap sizing
-Xms4g                    # Initial heap size
-Xmx4g                    # Maximum heap size (set equal to -Xms to avoid resizing)
-XX:NewRatio=2            # Old:Young ratio (Old = 2x Young)

# G1-specific tuning
-XX:MaxGCPauseMillis=100  # Target pause time (ms)
-XX:G1HeapRegionSize=16m  # Region size

# GC logging (Java 9+ unified logging)
-Xlog:gc*:file=gc.log:time,level,tags

Reading GC Logs

GC logs are your primary diagnostic tool. A typical G1 log entry looks like:

[2024-01-15T10:30:45.123+0000] GC(42) Pause Young (Normal) 
  (G1 Evacuation Pause) 1024M->256M(4096M) 12.345ms

This tells you: GC event #42, a Young generation pause, heap went from 1024MB to 256MB (out of 4096MB total), and it took 12.345ms. If you see frequent "Pause Full" events or pauses growing over time, you have a tuning problem.

Common Memory Problems

Memory Leak: Objects remain referenced even though they're logically "dead." Classic causes include:

  • Listeners/callbacks never unregistered
  • Static collections that grow forever
  • ThreadLocal variables not cleaned up in thread pools
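The "static collection that grows forever" pattern is worth seeing concretely: the entries stay strongly reachable from a GC root (the static field), so no collector can ever reclaim them. A sketch (class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class LeakDemo {
    // Classic leak: a static collection with additions but no eviction.
    // Everything in it is reachable from a GC root forever.
    static final List<byte[]> CACHE = new ArrayList<>();

    static void handleRequest(int id) {
        CACHE.add(new byte[1024]); // "cached" but never removed
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10_000; i++) handleRequest(i);
        // All 10,000 buffers are still live, no matter how often GC runs
        System.out.println("retained entries: " + CACHE.size());
    }
}
```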

GC Thrashing: The JVM spends more time collecting garbage than running your application. This happens when the heap is too small — GC runs constantly but reclaims very little. You'll see the heap usage sawtoothing near the maximum.

Premature Promotion: Objects are promoted to Old Generation before they die, causing more expensive Major GCs. This happens when Survivor spaces are too small or the tenuring threshold is too low.

Diagnostic Tools

  • jstat -gcutil <pid>: Real-time GC statistics (space utilization, GC counts, GC time)
  • jmap -dump:live,format=b,file=heap.hprof <pid>: Capture a heap dump
  • Eclipse MAT / VisualVM: Analyze heap dumps to find memory leaks
  • JFR (Java Flight Recorder): Low-overhead production profiling, including detailed GC telemetry
  • -XX:+HeapDumpOnOutOfMemoryError: Automatically capture a heap dump when OOM occurs — always enable this in production

Exercise 4: Fill in the Blank
When the JVM spends the majority of its time running garbage collection and reclaiming very little memory, this condition is known as GC ___.

Key Takeaways

  • The JVM divides memory into thread-private areas (stacks, PC registers) and shared areas (heap, Metaspace), and understanding this layout is essential for diagnosing both memory and concurrency issues.
  • The generational heap model exploits the weak generational hypothesis — most objects die young — making Minor GC in the Young Generation extremely fast by only copying the small number of survivors.
  • Choose your garbage collector based on your application's priorities: Parallel GC for maximum throughput, G1 GC for balanced latency/throughput, and ZGC/Shenandoah for sub-millisecond pause requirements.
  • The JIT compiler with tiered compilation, inlining, and escape analysis can eliminate heap allocations entirely and optimize polymorphic calls — but only after warmup, so always account for this in benchmarks.
  • Always run production JVMs with GC logging enabled, -XX:+HeapDumpOnOutOfMemoryError set, and -Xms equal to -Xmx to avoid heap resizing overhead — these cost almost nothing but save hours of debugging.

Check Your Understanding

The JVM uses a generational heap memory model that divides the heap into distinct regions — typically the Young Generation (with Eden and Survivor spaces) and the Old Generation — each collected differently based on the assumption that most objects are short-lived. Consider a Java web application that handles thousands of HTTP requests per second, where each request creates numerous temporary objects (request parsers, response builders, intermediate string buffers) as well as some long-lived objects (cached database connections, session data, configuration singletons).

**Exercise:** Explain step-by-step how such a web application's objects would flow through the generational garbage collection process. In your explanation, describe where newly created request-handling objects are initially allocated, what happens during a Minor GC when Eden fills up, how objects eventually get promoted to the Old Generation, and why this generational approach is more efficient than treating all objects the same. Also, identify which type of GC event (Minor vs. Major/Full) would be more concerning from a latency perspective for this web application, and explain why.

Go deeper
  • How does the Bootstrap ClassLoader load without Java code?
  • What happens when Metaspace runs out of native memory?
  • How do different GC algorithms compare in latency?
  • Can JIT compilation ever produce worse performance than interpreted?
  • How do weak references interact with garbage collection cycles?