Have you ever wondered what actually happens between writing System.out.println("Hello") and seeing text appear on your screen? Beneath the surface of every Java application lies a sophisticated virtual machine that loads your classes, manages memory, optimizes code at runtime, and cleans up after you — all without you writing a single line of memory management code. In this lesson, you'll pull back the curtain on the JVM's architecture, understand how garbage collection really works, and learn how to tune these systems for optimal performance.
The Java Virtual Machine (JVM) is not a single monolithic program — it's a carefully orchestrated collection of subsystems, each with a specific responsibility. Understanding these subsystems is the foundation for everything else in this lesson.
Class Loader Subsystem — Responsible for finding, loading, and verifying .class files. It follows a delegation model: the Bootstrap ClassLoader loads core Java classes (java.lang.*), the Extension ClassLoader (or Platform ClassLoader in Java 9+) loads extension libraries, and the Application ClassLoader loads your application's classes. This hierarchy exists for security — it prevents malicious code from replacing core Java classes.
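The delegation hierarchy is easy to observe directly. A minimal sketch (the class name LoaderDemo is illustrative):

```java
// Prints the loader responsible for each class. Core classes report null
// because the Bootstrap ClassLoader is implemented natively inside the JVM.
public class LoaderDemo {
    public static void main(String[] args) {
        // Bootstrap ClassLoader (represented as null)
        System.out.println(String.class.getClassLoader());
        // Platform ClassLoader (Java 9+) for modules like java.sql
        System.out.println(java.sql.Driver.class.getClassLoader());
        // Application ClassLoader for your own classes
        System.out.println(LoaderDemo.class.getClassLoader());
    }
}
```

Running this shows `null` for String, confirming that core classes are handled before any user-controllable loader ever sees them.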
Runtime Data Areas — These are the memory regions the JVM allocates when it starts. The most important ones are:

- The Heap: where all objects and arrays are allocated; shared by every thread.
- JVM Stacks: one per thread, holding a frame (local variables, operand stack, return address) for each method call.
- The Metaspace (which replaced PermGen in Java 8): stores class metadata such as method bytecode and field layouts.
- Program Counter (PC) Registers: one per thread, tracking the address of the bytecode instruction currently being executed.
- Native Method Stacks: support calls into native (JNI) code.
Execution Engine — Converts bytecode into machine instructions. It includes the interpreter (executes bytecode line by line), the JIT (Just-In-Time) compiler (compiles hot bytecode paths into native machine code for performance), and the garbage collector.
The heap is shared across all threads, while stacks are thread-private. This distinction is critical for understanding both memory layout and thread safety.
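A tiny example makes the distinction concrete: local variables live in each thread's private stack frames and need no locking, while state reachable from a shared field lives on the heap and must be synchronized. A minimal sketch (class and field names are illustrative):

```java
public class SharedVsLocal {
    // Heap state, reachable by every thread -> must be guarded by a lock
    static int sharedCounter = 0;

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            // 'local' lives in a stack frame private to the running thread:
            // no other thread can observe or corrupt it, so no lock is needed.
            int local = 0;
            for (int i = 0; i < 100_000; i++) {
                local++;
                synchronized (SharedVsLocal.class) {
                    sharedCounter++; // shared heap state: lock required
                }
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(sharedCounter); // 200000 with the lock in place
    }
}
```

Removing the synchronized block makes the increments race and the final count unpredictable, precisely because the counter lives on the shared heap.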
A common misconception is that Java is "slow because it's interpreted." In reality, the JIT compiler identifies hot spots — frequently executed code paths — and compiles them to highly optimized native code, often rivaling C++ performance.
The heap isn't just one big pool of memory — it's carefully divided into regions based on a key empirical observation called the weak generational hypothesis: most objects die young. Think of it like a hospital triage system — new patients (objects) go to a fast-turnover area first, and only those that survive long enough get moved to longer-term care.
The traditional heap layout (used by the default garbage collectors) divides memory into:
Young Generation: Where new objects are allocated. It's further divided into:

- Eden: where almost all new objects are first allocated.
- Two Survivor spaces (S0 and S1): objects that survive a Minor GC are copied between these, accumulating an "age" with each collection.
Old Generation (Tenured): Home for long-lived objects. GC here is called Major GC (or sometimes Full GC) and is significantly more expensive — it often causes longer application pauses.
During a Minor GC, live objects in Eden are copied into a Survivor space; objects that survive enough collections (controlled by -XX:MaxTenuringThreshold, default 15) are promoted to the Old Generation. This is extremely efficient because most objects in Eden are already dead by the time GC runs — so the collector only copies a small fraction of objects.
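The promotion bookkeeping can be mimicked with a toy model: each object surviving a Minor GC has its age incremented until it crosses the threshold. This is only an illustration of the mechanism, not real collector code (all names are hypothetical):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy simulation of tenuring: survivors age on each "minor GC" and are
// promoted once their age reaches the threshold (the real default is 15).
public class TenuringSim {
    static final int THRESHOLD = 15;
    static List<Integer> survivorAges = new ArrayList<>(); // ages of live objects
    static int promoted = 0;

    static void minorGc() {
        List<Integer> aged = new ArrayList<>();
        Iterator<Integer> it = survivorAges.iterator();
        while (it.hasNext()) {
            int age = it.next() + 1;
            it.remove();
            if (age >= THRESHOLD) promoted++; // moves to the Old Generation
            else aged.add(age);               // stays in a Survivor space
        }
        survivorAges.addAll(aged);
    }

    public static void main(String[] args) {
        survivorAges.add(0);                  // one object survives its first GC
        for (int i = 0; i < 15; i++) minorGc();
        System.out.println(promoted);         // 1: promoted after 15 minor GCs
    }
}
```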
Not all garbage collectors work the same way. The JVM offers several collector implementations, each making different trade-offs between throughput (how much CPU time goes to your application vs. GC), latency (how long GC pauses last), and footprint (how much extra memory the collector needs).
Mark-and-Sweep: The collector first marks all reachable objects by traversing from GC roots, then sweeps through the heap freeing unmarked objects. Simple, but leaves memory fragmentation — free space is scattered in small gaps.
Mark-and-Compact: Like Mark-and-Sweep, but after marking, surviving objects are compacted to one end of the heap, eliminating fragmentation. More expensive, but allocation becomes a simple pointer bump.
Copying: Used in the Young Generation. Divides space in two; copies live objects to the empty half, then clears the old half entirely. Very fast when most objects are dead, but wastes half the space.
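The "pointer bump" allocation that compacting and copying collectors enable can be sketched in a few lines. This toy allocator only illustrates the idea (all names are hypothetical):

```java
// Toy bump-pointer allocator over one semispace: allocating is just an
// addition plus a bounds check, which is why compacted heaps allocate so fast.
public class BumpAllocator {
    private final byte[] space;
    private int top = 0; // next free offset

    BumpAllocator(int size) { space = new byte[size]; }

    // Returns the start offset of the new "object", or -1 if the space is
    // exhausted (a real collector would trigger a GC / semispace flip here).
    int allocate(int size) {
        if (top + size > space.length) return -1;
        int addr = top;
        top += size;
        return addr;
    }

    public static void main(String[] args) {
        BumpAllocator eden = new BumpAllocator(1024);
        System.out.println(eden.allocate(100)); // 0
        System.out.println(eden.allocate(100)); // 100
        System.out.println(eden.allocate(900)); // -1: would overflow
    }
}
```

Contrast this with a fragmented free-list heap, where every allocation must search for a gap of the right size.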
| Collector | Young Gen | Old Gen | Best For |
|---|---|---|---|
| Serial GC | Copying | Mark-Compact | Small apps, single-core |
| Parallel GC | Parallel Copying | Parallel Mark-Compact | Throughput-oriented batch jobs |
| G1 GC | Evacuation | Mixed Collection | Balanced latency/throughput (default since Java 9) |
| ZGC | Concurrent | Concurrent | Ultra-low latency (< 1ms pauses) |
| Shenandoah | Concurrent | Concurrent | Low latency (alternative to ZGC) |
G1 (Garbage-First) breaks the heap into equal-sized regions (typically 1-32 MB each) rather than contiguous generations. Each region is tagged as Eden, Survivor, Old, or Humongous (for objects larger than half a region). G1 maintains a priority queue of regions by "garbage density" — it collects the regions with the most garbage first (hence the name), maximizing reclamation per unit of work.
G1 aims to meet a configurable pause-time target (default 200ms via -XX:MaxGCPauseMillis). It adjusts the number of regions collected per cycle to stay within this target. This makes G1 a "soft real-time" collector.
ZGC (production-ready since Java 15) achieves pause times under 1 millisecond regardless of heap size — even multi-terabyte heaps. It does this using colored pointers (embedding GC metadata in unused bits of 64-bit pointers) and load barriers (small code snippets injected at every pointer load to fix up references concurrently). The trade-off is slightly lower throughput and higher CPU usage.
The JIT (Just-In-Time) compiler is arguably the JVM's most impressive subsystem — it's the reason Java can match or exceed the performance of statically compiled languages for long-running applications.
When your application starts, bytecode is initially interpreted — the JVM reads each instruction and executes it. This is slow but starts instantly. Meanwhile, the JVM profiles your code, counting method invocations and loop iterations. When a method or loop body exceeds a threshold (typically ~10,000 invocations), it becomes a "hot spot" and is submitted for JIT compilation.
The JVM traditionally has two JIT compilers:

- C1 (the client compiler): compiles quickly with lightweight optimizations, giving an early speedup.
- C2 (the server compiler): compiles more slowly but applies aggressive optimizations for peak performance.
This is called tiered compilation (enabled by default since Java 8): code progresses through interpretation → C1-compiled → C2-compiled as it gets hotter.
Inlining: The most important optimization. The JIT replaces a method call with the method's body directly, eliminating call overhead and enabling further optimizations. Small, frequently called methods benefit enormously.
Escape Analysis: The JIT analyzes whether an object "escapes" the current method (i.e., is returned or stored in a field). If it doesn't escape, the JIT can:

- Allocate it on the stack or replace it with scalar values (scalar replacement), so it never touches the heap and never needs garbage collection.
- Eliminate synchronization on it (lock elision), since no other thread can ever observe it.
Loop Unrolling: Reduces loop overhead by duplicating the loop body multiple times per iteration.
Dead Code Elimination: Removes code paths that can never be reached based on runtime profiling data.
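A sketch of the kind of code escape analysis rewards: the Point below never escapes distSq, so after inlining the JIT may scalar-replace it and skip the heap allocation entirely (you can compare behavior by running with -XX:-DoEscapeAnalysis). Class and method names are illustrative:

```java
public class EscapeDemo {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // 'p' never leaves this method: it is not returned, stored in a field,
    // or passed elsewhere. The JIT can replace it with two scalar ints and
    // allocate nothing on the heap.
    static int distSq(int x, int y) {
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += distSq(i % 100, i % 50); // hot loop -> JIT-compiled
        }
        System.out.println(sum);
    }
}
```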
Critical insight: JIT optimizations are based on runtime profiling, not static analysis. This means the JIT can make optimizations that a static compiler like gcc cannot — for example, devirtualizing a polymorphic call if profiling shows only one implementation is ever used. However, it also means benchmarks must account for warmup time — the first few thousand iterations of any code path will be interpreted and much slower.
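The warmup effect is easy to demonstrate with a crude timing loop. This is deliberately naive (a serious benchmark should use a harness like JMH); it only shows that later rounds, once JIT compilation kicks in, typically report much smaller times than the first:

```java
public class WarmupDemo {
    // Arbitrary work: a sum of squares, hot enough to attract the JIT.
    static long work() {
        long sum = 0;
        for (int i = 0; i < 100_000; i++) sum += (long) i * i;
        return sum;
    }

    public static void main(String[] args) {
        for (int round = 0; round < 20; round++) {
            long start = System.nanoTime();
            long result = work();
            long micros = (System.nanoTime() - start) / 1_000;
            // Early rounds run interpreted or C1-compiled; later rounds are
            // C2-compiled and usually much faster.
            System.out.printf("round %2d: %6d us (result=%d)%n",
                              round, micros, result);
        }
    }
}
```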
Understanding JVM internals isn't just academic — it directly translates into practical skills for tuning production applications and diagnosing memory problems.
```shell
# Choose a collector
-XX:+UseG1GC              # G1 (default Java 9+)
-XX:+UseZGC               # ZGC (Java 15+)
-XX:+UseShenandoahGC      # Shenandoah

# Heap sizing
-Xms4g                    # Initial heap size
-Xmx4g                    # Maximum heap size (set equal to -Xms to avoid resizing)
-XX:NewRatio=2            # Old:Young ratio (Old = 2x Young)

# G1-specific tuning
-XX:MaxGCPauseMillis=100  # Target pause time (ms)
-XX:G1HeapRegionSize=16m  # Region size

# GC logging (Java 9+ unified logging)
-Xlog:gc*:file=gc.log:time,level,tags
```
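Putting the flags together, a typical launch might look like this (app.jar and the exact sizes are placeholders for your own application):

```shell
java -XX:+UseG1GC \
     -Xms4g -Xmx4g \
     -XX:MaxGCPauseMillis=100 \
     -Xlog:gc*:file=gc.log:time,level,tags \
     -XX:+HeapDumpOnOutOfMemoryError \
     -jar app.jar
```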
GC logs are your primary diagnostic tool. A typical G1 log entry looks like:
```
[2024-01-15T10:30:45.123+0000] GC(42) Pause Young (Normal) (G1 Evacuation Pause) 1024M->256M(4096M) 12.345ms
```
This tells you: GC event #42, a Young generation pause, heap went from 1024MB to 256MB (out of 4096MB total), and it took 12.345ms. If you see frequent "Pause Full" events or pauses growing over time, you have a tuning problem.
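Because the format is regular, the interesting numbers can be extracted mechanically. A minimal sketch with a hand-rolled regex (the pattern matches the example line above; real G1 logs contain many more entry shapes):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcLineParser {
    // Matches the heap-change suffix, e.g. "1024M->256M(4096M) 12.345ms"
    static final Pattern HEAP_CHANGE =
        Pattern.compile("(\\d+)M->(\\d+)M\\((\\d+)M\\)\\s+([\\d.]+)ms");

    public static void main(String[] args) {
        String line = "[2024-01-15T10:30:45.123+0000] GC(42) Pause Young (Normal) "
                    + "(G1 Evacuation Pause) 1024M->256M(4096M) 12.345ms";
        Matcher m = HEAP_CHANGE.matcher(line);
        if (m.find()) {
            int before = Integer.parseInt(m.group(1));
            int after  = Integer.parseInt(m.group(2));
            System.out.println("reclaimed " + (before - after) + "M in "
                               + m.group(4) + "ms"); // reclaimed 768M in 12.345ms
        }
    }
}
```

Feeding every line of gc.log through such a pattern gives you reclaimed-memory and pause-time series you can plot over time.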
Memory Leak: Objects remain referenced even though they're logically "dead." Classic causes include:

- Static collections (caches, registries) that grow without bound
- Listeners or callbacks that are registered but never removed
- ThreadLocal variables not cleaned up in thread pools

GC Thrashing: The JVM spends more time collecting garbage than running your application. This happens when the heap is too small — GC runs constantly but reclaims very little. You'll see the heap usage sawtoothing near the maximum.
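A minimal sketch of such a leak, using a static collection that is filled but never evicted (all names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal leak sketch: every "request" adds to a static list that is never
// evicted, so the byte arrays stay reachable from a GC root forever.
public class LeakDemo {
    static final List<byte[]> CACHE = new ArrayList<>();

    static void handleRequest() {
        CACHE.add(new byte[1024]); // "cached" but never removed
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10_000; i++) handleRequest();
        // ~10 MB is now pinned; a heap dump would show LeakDemo.CACHE
        // as the dominating reference holding it all alive.
        System.out.println(CACHE.size()); // 10000
    }
}
```

In a heap dump analyzer, leaks like this show up as one retained-size-dominating root, which is exactly why automatic dumps on OOM are so valuable.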
Premature Promotion: Objects are promoted to Old Generation before they die, causing more expensive Major GCs. This happens when Survivor spaces are too small or the tenuring threshold is too low.
- jstat -gcutil <pid>: Real-time GC statistics (space utilization, GC counts, GC time)
- jmap -dump:live,format=b,file=heap.hprof <pid>: Capture a heap dump
- -XX:+HeapDumpOnOutOfMemoryError: Automatically capture a heap dump when OOM occurs — always enable this in production

As a baseline, run production JVMs with GC logging enabled, -XX:+HeapDumpOnOutOfMemoryError set, and -Xms equal to -Xmx to avoid heap resizing overhead — these cost almost nothing but save hours of debugging.

The JVM uses a generational heap memory model that divides the heap into distinct regions — typically the Young Generation (with Eden and Survivor spaces) and the Old Generation — each collected differently based on the assumption that most objects are short-lived. Consider a Java web application that handles thousands of HTTP requests per second, where each request creates numerous temporary objects (request parsers, response builders, intermediate string buffers) as well as some long-lived objects (cached database connections, session data, configuration singletons).

**Exercise:** Explain step-by-step how such a web application's objects would flow through the generational garbage collection process. In your explanation, describe where newly created request-handling objects are initially allocated, what happens during a Minor GC when Eden fills up, how objects eventually get promoted to the Old Generation, and why this generational approach is more efficient than treating all objects the same. Also, identify which type of GC event (Minor vs. Major/Full) would be more concerning from a latency perspective for this web application, and explain why.