Table of Contents
Click chapter to expand sections
List of Algorithms | xvii | ||
List of Figures | xxi | ||
List of Tables | xxv | ||
Preface | xxvii | ||
Acknowledgements | xxxiii | ||
Authors | xxxv | ||
1 | Introduction | 1 | |
1.1 | Explicit deallocation | 2 | |
1.2 | Automatic dynamic memory management | 4 | |
1.3 | Comparing garbage collection algorithms | 6 | |
Safety | 6 | ||
Throughput and cycles consumed | 6 | ||
Completeness and promptness | 7 | ||
Pause time and latency | 7 | ||
Space overhead | 8 | ||
Energy use | 8 | ||
Optimisations for specific languages | 9 | ||
Scalability and portability | 10 | ||
1.4 | A performance disadvantage? | 10 | |
1.5 | Experimental methodology | 11 | |
1.6 | Terminology and notation | 12 | |
The heap | 13 | ||
The mutator and the collector | 13 | ||
The mutator roots | 14 | ||
References, fields and addresses | 14 | ||
Liveness, correctness and reachability | 15 | ||
Pseudocode | 15 | ||
The allocator | 15 | ||
Mutator read and write operations | 16 | ||
Atomic operations | 16 | ||
Sets, multisets, sequences and tuples | 17 | ||
2 | Mark-sweep garbage collection | 19 | |
2.1 | The mark-sweep algorithm | 20 | |
2.2 | The tricolour abstraction | 23 | |
2.3 | Improving mark-sweep | 24 | |
2.4 | Bitmap marking | 24 | |
2.5 | Lazy sweeping | 27 | |
2.6 | Cache misses in the marking loop | 29 | |
2.7 | Issues to consider | 31 | |
Mutator overhead | 31 | ||
Throughput | 32 | ||
Space usage | 32 | ||
To move or not to move? | 32 | ||
3 | Mark-compact garbage collection | 35 | |
3.1 | Two-finger compaction | 36 | |
3.2 | The Lisp 2 algorithm | 38 | |
3.3 | Threaded compaction | 40 | |
3.4 | One-pass algorithms | 42 | |
3.5 | Issues to consider | 45 | |
Is compaction necessary? | 45 | ||
Throughput costs of compaction | 45 | ||
Long-lived data | 45 | ||
Locality | 46 | ||
Limitations of mark-compact algorithms | 46 | ||
4 | Copying garbage collection | 47 | |
4.1 | Semispace copying collection | 47 | |
Work-list implementations | 48 | ||
An example | 50 | ||
4.2 | Traversal order and locality | 50 | |
4.3 | Issues to consider | 57 | |
Allocation | 57 | ||
Space and locality | 58 | ||
Moving objects | 59 | ||
5 | Reference counting | 61 | |
5.1 | Advantages and disadvantages of reference counting | 62 | |
5.2 | Improving efficiency | 64 | |
5.3 | Deferred reference counting | 65 | |
5.4 | Coalesced reference counting | 68 | |
5.5 | Cyclic reference counting | 71 | |
5.6 | Limited-field reference counting | 77 | |
5.7 | Issues to consider | 78 | |
The environment | 78 | ||
Advanced solutions | 78 | ||
6 | Comparing garbage collectors | 81 | |
6.1 | Throughput | 81 | |
6.2 | Pause time and latency | 82 | |
6.3 | Space | 82 | |
6.4 | Implementation | 83 | |
6.5 | Adaptive systems | 84 | |
6.6 | A unified theory of garbage collection | 85 | |
Abstract garbage collection | 85 | ||
Tracing garbage collection | 85 | ||
Reference counting garbage collection | 87 | ||
7 | Allocation | 91 | |
7.1 | Sequential allocation | 91 | |
7.2 | Free-list allocation | 93 | |
First-fit allocation | 93 | ||
Next-fit allocation | 93 | ||
Best-fit allocation | 94 | ||
Speeding free-list allocation | 95 | ||
7.3 | Fragmentation | 97 | |
7.4 | Segregated-fits allocation | 98 | |
Fragmentation | 99 | ||
Populating size classes | 100 | ||
7.5 | Combining segregated-fits with first-, best- and next-fit | 101 | |
7.6 | Additional considerations | 101 | |
Alignment | 101 | ||
Size constraints | 102 | ||
Boundary tags | 102 | ||
Heap parsability | 103 | ||
Locality | 104 | ||
Wilderness preservation | 105 | ||
Crossing maps | 105 | ||
7.7 | Allocation in concurrent systems | 105 | |
7.8 | Issues to consider | 106 | |
8 | Partitioning the heap | 109 | |
8.1 | Terminology | 109 | |
8.2 | Why to partition | 109 | |
Partitioning by mobility | 110 | ||
Partitioning by size | 110 | ||
Partitioning for space | 110 | ||
Partitioning by kind | 111 | ||
Partitioning for yield | 111 | ||
Partitioning for responsiveness | 112 | ||
Partitioning for locality | 112 | ||
Partitioning by thread | 113 | ||
Partitioning by availability | 113 | ||
Partitioning by mutability | 114 | ||
8.3 | How to partition | 114 | |
8.4 | When to partition | 116 | |
9 | Generational garbage collection | 117 | |
9.1 | Example | 118 | |
9.2 | Measuring time | 119 | |
9.3 | Generational hypotheses | 119 | |
9.4 | Generations and heap layout | 120 | |
9.5 | Multiple generations | 121 | |
9.6 | Age recording | 122 | |
En masse promotion | 122 | ||
Aging semispaces | 124 | ||
Survivor spaces and flexibility | 126 | ||
9.7 | Adapting to program behaviour | 127 | |
Appel-style garbage collection | 127 | ||
Feedback-controlled promotion | 129 | ||
9.8 | Inter-generational pointers | 130 | |
Remembered sets | 130 | ||
Pointer direction | 131 | ||
9.9 | Space management | 132 | |
9.10 | Older-first garbage collection | 133 | |
9.11 | Beltway | 136 | |
9.12 | Analytic support for generational collection | 138 | |
9.13 | Issues to consider | 139 | |
9.14 | Abstract generational garbage collection | 141 | |
10 | Other partitioned schemes | 143 | |
10.1 | Large object spaces | 143 | |
The Treadmill garbage collector | 144 | ||
Moving objects with operating system support | 145 | ||
Pointer-free objects | 146 | ||
10.2 | Topological collectors | 146 | |
Mature Object Space garbage collection | 146 | ||
Connectivity-based garbage collection | 149 | ||
Thread-local garbage collection | 150 | ||
Stack allocation | 155 | ||
Region inferencing | 156 | ||
10.3 | Hybrid mark-sweep, copying collectors | 157 | |
10.4 | Garbage-First: collecting young regions | 158 | |
10.5 | Trading space and time | 161 | |
10.6 | Mark-region collection: immix | 162 | |
10.7 | Copying collection in a constrained memory space | 164 | |
10.8 | Bookmarking garbage collection | 166 | |
10.9 | Ulterior reference counting | 167 | |
10.10 | Issues to consider | 168 | |
11 | Run-time interface | 171 | |
11.1 | Interface to allocation | 171 | |
Speeding allocation | 174 | ||
Zeroing | 175 | ||
11.2 | Finding pointers | 176 | |
Conservative pointer finding | 176 | ||
Accurate pointer finding using tagged values | 179 | ||
Accurate pointer finding in objects | 180 | ||
Accurate pointer finding in global roots | 181 | ||
Accurate pointer finding in stacks and registers | 182 | ||
Accurate pointer finding in code | 192 | ||
Handling interior pointers | 193 | ||
Handling derived pointers | 194 | ||
11.3 | Object tables | 195 | |
11.4 | References from external code | 196 | |
11.5 | Stack barriers | 197 | |
11.6 | GC safe-points and mutator suspension | 198 | |
11.7 | Garbage collecting code | 201 | |
11.8 | Read and write barriers | 202 | |
Engineering | 202 | ||
Precision of write barriers | 203 | ||
Hash tables | 205 | ||
Sequential store buffers | 206 | ||
Overflow action | 208 | ||
Cards and card tables | 208 | ||
Crossing maps | 210 | ||
Summarising cards | 212 | ||
Hardware and virtual memory techniques | 213 | ||
Write barrier mechanisms: in summary | 214 | ||
Chunked lists | 214 | ||
11.9 | Managing address space | 215 | |
11.10 | Applications of virtual memory page protection | 217 | |
Double mapping | 217 | ||
Applications of no-access pages | 218 | ||
11.11 | Choosing heap size | 220 | |
11.12 | Issues to consider | 222 | |
12 | Language-specific concerns | 225 | |
12.1 | Finalisation | 225 | |
When do finalisers run? | 226 | ||
Which thread runs a finaliser? | 227 | ||
Can finalisers run concurrently with each other? | 228 | ||
Can finalisers access the object that became unreachable? | 228 | ||
When are finalised objects reclaimed? | 229 | ||
What happens if there is an error in a finaliser? | 229 | ||
Is there any guaranteed order to finalisation? | 229 | ||
The finalisation race problem | 230 | ||
Finalisers and locks | 231 | ||
Finalisation and weak references | 231 | ||
Finalisation in particular languages | 232 | ||
For further study | 233 | ||
12.2 | Weak references | 233 | |
Additional motivations | 235 | ||
Weak references in Java: the short story | 235 | ||
Weak references in Java: the full story | 236 | ||
Race in weak pointer clearing | 239 | ||
Weak pointers in other languages | 240 | ||
12.3 | Changing object layout | 241 | |
12.4 | Issues to consider | 243 | |
13 | Concurrency preliminaries | 245 | |
13.1 | Hardware | 245 | |
Processors and threads | 245 | ||
Interconnect | 246 | ||
Memory | 247 | ||
Caches | 247 | ||
Coherence | 247 | ||
Cache coherence performance example: spin locks | 248 | ||
13.2 | Hardware memory consistency | 250 | |
Fences and happens-before | 251 | ||
Consistency models | 252 | ||
13.3 | Hardware primitives | 253 | |
Compare-and-swap | 253 | ||
Load-linked/store-conditionally | 254 | ||
Atomic arithmetic primitives | 255 | ||
Test then test-and-set | 257 | ||
More powerful primitives | 257 | ||
Overheads of atomic primitives | 258 | ||
13.4 | Progress guarantees | 259 | |
Progress guarantees and concurrent collection | 260 | ||
13.5 | Notation used for concurrent algorithms | 261 | |
13.6 | Mutual exclusion | 262 | |
13.7 | Work sharing and termination detection | 264 | |
Rendezvous barriers | 269 | ||
13.8 | Concurrent data structures | 269 | |
Concurrent stacks | 272 | ||
Concurrent queue implemented with singly linked list | 272 | ||
Concurrent queue implemented with array | 276 | ||
Concurrent deque for work stealing | 281 | ||
13.9 | Transactional memory | 282 | |
Using transactional memory to help implement collection | 286 | ||
Supporting transactional memory in the presence of garbage collection | 288 | ||
13.10 | Issues to consider | 289 | |
14 | Parallel garbage collection | 291 | |
14.1 | Is there sufficient work to parallelise? | 292 | |
14.2 | Load balancing | 293 | |
14.3 | Synchronisation | 294 | |
14.4 | Taxonomy | 295 | |
14.5 | Parallel marking | 295 | |
14.6 | Parallel copying | 304 | |
Processor-centric techniques | 305 | ||
Memory-centric techniques | 310 | ||
14.7 | Parallel sweeping | 315 | |
14.8 | Parallel compaction | 316 | |
14.9 | Garbage collection on the GPU? | 319 | |
GPU background | 319 | ||
Heap reference graphs | 321 | ||
Marking on the GPU | 322 | ||
A problem not yet solved | 324 | ||
14.10 | Issues to consider | 324 | |
Terminology | 324 | ||
Is parallel collection worthwhile? | 324 | ||
Strategies for balancing loads | 325 | ||
Managing tracing | 325 | ||
Low-level synchronisation | 327 | ||
Sweeping and compaction | 327 | ||
Termination | 328 | ||
15 | Concurrent garbage collection | 329 | |
15.1 | Correctness of concurrent collection | 331 | |
The tricolour abstraction, revisited | 331 | ||
The lost object problem | 332 | ||
The strong and weak tricolour invariants | 334 | ||
Precision | 335 | ||
Mutator colour | 335 | ||
Allocation colour | 336 | ||
Pointer tagging | 336 | ||
Incremental update solutions | 336 | ||
Snapshot-at-the-beginning solutions | 337 | ||
15.2 | Barrier techniques for concurrent collection | 337 | |
Grey mutator techniques | 337 | ||
Black mutator techniques | 339 | ||
Completeness of barrier techniques | 340 | ||
Load barriers | 341 | ||
Concurrent write barrier mechanisms | 342 | ||
One-level card tables | 343 | ||
Two-level card tables | 343 | ||
Reducing work | 343 | ||
Eliminating read barriers | 344 | ||
15.3 | Ragged phase changes | 344 | |
15.4 | Issues to consider | 346 | |
16 | Concurrent mark-sweep | 349 | |
16.1 | Initialisation | 349 | |
16.2 | Termination | 351 | |
16.3 | Allocation | 352 | |
16.4 | Concurrent marking and sweeping | 353 | |
16.5 | Garbage-First: collecting young and old regions | 354 | |
16.6 | On-the-fly marking | 357 | |
Write barriers for on-the-fly collection | 357 | ||
Doligez-Leroy-Gonthier | 358 | ||
Doligez-Leroy-Gonthier for Java | 359 | ||
Sliding views | 360 | ||
16.7 | Abstract concurrent collection | 360 | |
The collector wavefront | 361 | ||
Adding origins | 363 | ||
Mutator barriers | 363 | ||
Precision | 363 | ||
Instantiating collectors | 364 | ||
16.8 | Issues to consider | 364 | |
17 | Concurrent copying and compaction | 367 | |
17.1 | Mostly-concurrent copying | 367 | |
Baker's algorithm | 368 | ||
Mostly-concurrent, mostly-copying collection | 370 | ||
17.2 | Brooks's indirection barrier | 370 | |
17.3 | Self-erasing read barriers | 371 | |
17.4 | Replication copying | 372 | |
Multi-version copying | 372 | ||
Extensions to avoid copy-on-write | 374 | ||
Sapphire | 376 | ||
Transactional Sapphire | 376 | ||
Platinum | 383 | ||
17.5 | Concurrent compaction | 384 | |
Compressor | 384 | ||
Pauseless and C4 | 386 | ||
Collie | 395 | ||
ZGC | 396 | ||
Shenandoah | 401 | ||
Reducing stop-the-world time in Shenandoah and ZGC | 403 | ||
17.6 | Issues to consider | 404 | |
18 | Concurrent reference counting | 405 | |
18.1 | LXR: Latency-critical ImmiX with Reference counting | 405 | |
18.2 | Simple reference counting revisited | 409 | |
18.3 | Biased reference counting | 411 | |
18.4 | Buffered reference counting | 412 | |
18.5 | Concurrent, cyclic reference counting | 413 | |
18.6 | Taking a snapshot of the heap | 414 | |
18.7 | Sliding views reference counting | 416 | |
Age-oriented collection | 416 | ||
Sliding views cycle reclamation | 419 | ||
Memory consistency | 420 | ||
18.8 | Issues to consider | 420 | |
Safe memory reclamation | 421 | ||
19 | Real-time garbage collection | 423 | |
19.1 | Real-time systems | 423 | |
19.2 | Scheduling real-time collection | 424 | |
19.3 | Work-based real-time collection | 425 | |
Parallel, concurrent replication | 425 | ||
Uneven work and its impact on work-based scheduling | 432 | ||
19.4 | Slack-based real-time collection | 434 | |
Scheduling the collector work | 436 | ||
Execution overheads | 438 | ||
Programmer input | 439 | ||
19.5 | Time-based real-time collection: Metronome | 439 | |
Mutator utilisation | 439 | ||
Supporting predictability | 441 | ||
Analysis | 443 | ||
Robustness | 447 | ||
19.6 | Combining scheduling approaches: Tax-and-Spend | 448 | |
Tax-and-Spend scheduling | 448 | ||
Tax-and-Spend prerequisites | 450 | ||
19.7 | Controlling fragmentation | 452 | |
Incremental compaction in Metronome | 452 | ||
Incremental replication on uniprocessors | 453 | ||
Stopless: lock-free garbage collection | 454 | ||
Staccato: best-effort compaction with mutator wait-freedom | 455 | ||
Chicken: best-effort compaction with mutator wait-freedom for x86 | 458 | ||
Clover: guaranteed compaction with probabilistic mutator lock-freedom | 458 | ||
Stopless versus Chicken versus Clover | 459 | ||
Fragmented allocation | 461 | ||
19.8 | Issues to consider | 463 | |
20 | Energy-aware garbage collection | 465 | |
20.1 | Issues to consider | 469 | |
21 | Persistence and garbage collection | 471 | |
21.1 | Persistence: concepts, issues and implementation | 471 | |
Providing durability | 472 | ||
Issues raised by persistence | 473 | ||
21.2 | Impacts of persistence on garbage collection | 475 | |
21.3 | Barriers in support of persistence | 476 | |
Software barriers | 476 | ||
Hardware barriers | 477 | ||
Software versus hardware barriers | 478 | ||
21.4 | Implemented collectors for persistent heaps | 478 | |
Persistent reference counting | 478 | ||
Persistent mark-sweep and mark-compact | 479 | ||
Persistent copying | 480 | ||
21.5 | Issues to consider | 481 | |
Glossary | 485 | ||
Bibliography | 503 | ||
Index | 551 |