Table of Contents

List of Algorithms
List of Figures
List of Tables
Preface
Acknowledgements
Authors
1 Introduction
1.1 Explicit deallocation
1.2 Automatic dynamic memory management
1.3 Comparing garbage collection algorithms
    Safety
    Throughput and cycles consumed
    Completeness and promptness
    Pause time and latency
    Space overhead
    Energy use
    Optimisations for specific languages
    Scalability and portability
1.4 A performance disadvantage?
1.5 Experimental methodology
1.6 Terminology and notation
    The heap
    The mutator and the collector
    The mutator roots
    References, fields and addresses
    Liveness, correctness and reachability
    Pseudocode
    The allocator
    Mutator read and write operations
    Atomic operations
    Sets, multisets, sequences and tuples
2 Mark-sweep garbage collection
2.1 The mark-sweep algorithm
2.2 The tricolour abstraction
2.3 Improving mark-sweep
2.4 Bitmap marking
2.5 Lazy sweeping
2.6 Cache misses in the marking loop
2.7 Issues to consider
    Mutator overhead
    Throughput
    Space usage
    To move or not to move?
3 Mark-compact garbage collection
3.1 Two-finger compaction
3.2 The Lisp 2 algorithm
3.3 Threaded compaction
3.4 One-pass algorithms
3.5 Issues to consider
    Is compaction necessary?
    Throughput costs of compaction
    Long-lived data
    Locality
    Limitations of mark-compact algorithms
4 Copying garbage collection
4.1 Semispace copying collection
    Work-list implementations
    An example
4.2 Traversal order and locality
4.3 Issues to consider
    Allocation
    Space and locality
    Moving objects
5 Reference counting
5.1 Advantages and disadvantages of reference counting
5.2 Improving efficiency
5.3 Deferred reference counting
5.4 Coalesced reference counting
5.5 Cyclic reference counting
5.6 Limited-field reference counting
5.7 Issues to consider
    The environment
    Advanced solutions
6 Comparing garbage collectors
6.1 Throughput
6.2 Pause time and latency
6.3 Space
6.4 Implementation
6.5 Adaptive systems
6.6 A unified theory of garbage collection
    Abstract garbage collection
    Tracing garbage collection
    Reference counting garbage collection
7 Allocation
7.1 Sequential allocation
7.2 Free-list allocation
    First-fit allocation
    Next-fit allocation
    Best-fit allocation
    Speeding free-list allocation
7.3 Fragmentation
7.4 Segregated-fits allocation
    Fragmentation
    Populating size classes
7.5 Combining segregated-fits with first-, best- and next-fit
7.6 Additional considerations
    Alignment
    Size constraints
    Boundary tags
    Heap parsability
    Locality
    Wilderness preservation
    Crossing maps
7.7 Allocation in concurrent systems
7.8 Issues to consider
8 Partitioning the heap
8.1 Terminology
8.2 Why to partition
    Partitioning by mobility
    Partitioning by size
    Partitioning for space
    Partitioning by kind
    Partitioning for yield
    Partitioning for responsiveness
    Partitioning for locality
    Partitioning by thread
    Partitioning by availability
    Partitioning by mutability
8.3 How to partition
8.4 When to partition
9 Generational garbage collection
9.1 Example
9.2 Measuring time
9.3 Generational hypotheses
9.4 Generations and heap layout
9.5 Multiple generations
9.6 Age recording
    En masse promotion
    Aging semispaces
    Survivor spaces and flexibility
9.7 Adapting to program behaviour
    Appel-style garbage collection
    Feedback-controlled promotion
9.8 Inter-generational pointers
    Remembered sets
    Pointer direction
9.9 Space management
9.10 Older-first garbage collection
9.11 Beltway
9.12 Analytic support for generational collection
9.13 Issues to consider
9.14 Abstract generational garbage collection
10 Other partitioned schemes
10.1 Large object spaces
    The Treadmill garbage collector
    Moving objects with operating system support
    Pointer-free objects
10.2 Topological collectors
    Mature Object Space garbage collection
    Connectivity-based garbage collection
    Thread-local garbage collection
    Stack allocation
    Region inferencing
10.3 Hybrid mark-sweep, copying collectors
10.4 Garbage-First: collecting young regions
10.5 Trading space and time
10.6 Mark-region collection: immix
10.7 Copying collection in a constrained memory space
10.8 Bookmarking garbage collection
10.9 Ulterior reference counting
10.10 Issues to consider
11 Run-time interface
11.1 Interface to allocation
    Speeding allocation
    Zeroing
11.2 Finding pointers
    Conservative pointer finding
    Accurate pointer finding using tagged values
    Accurate pointer finding in objects
    Accurate pointer finding in global roots
    Accurate pointer finding in stacks and registers
    Accurate pointer finding in code
    Handling interior pointers
    Handling derived pointers
11.3 Object tables
11.4 References from external code
11.5 Stack barriers
11.6 GC safe-points and mutator suspension
11.7 Garbage collecting code
11.8 Read and write barriers
    Engineering
    Precision of write barriers
    Hash tables
    Sequential store buffers
    Overflow action
    Cards and card tables
    Crossing maps
    Summarising cards
    Hardware and virtual memory techniques
    Write barrier mechanisms: in summary
    Chunked lists
11.9 Managing address space
11.10 Applications of virtual memory page protection
    Double mapping
    Applications of no-access pages
11.11 Choosing heap size
11.12 Issues to consider
12 Language-specific concerns
12.1 Finalisation
    When do finalisers run?
    Which thread runs a finaliser?
    Can finalisers run concurrently with each other?
    Can finalisers access the object that became unreachable?
    When are finalised objects reclaimed?
    What happens if there is an error in a finaliser?
    Is there any guaranteed order to finalisation?
    The finalisation race problem
    Finalisers and locks
    Finalisation and weak references
    Finalisation in particular languages
    For further study
12.2 Weak references
    Additional motivations
    Weak references in Java: the short story
    Weak references in Java: the full story
    Race in weak pointer clearing
    Weak pointers in other languages
12.3 Changing object layout
12.4 Issues to consider
13 Concurrency preliminaries
13.1 Hardware
    Processors and threads
    Interconnect
    Memory
    Caches
    Coherence
    Cache coherence performance example: spin locks
13.2 Hardware memory consistency
    Fences and happens-before
    Consistency models
13.3 Hardware primitives
    Compare-and-swap
    Load-linked/store-conditionally
    Atomic arithmetic primitives
    Test then test-and-set
    More powerful primitives
    Overheads of atomic primitives
13.4 Progress guarantees
    Progress guarantees and concurrent collection
13.5 Notation used for concurrent algorithms
13.6 Mutual exclusion
13.7 Work sharing and termination detection
    Rendezvous barriers
13.8 Concurrent data structures
    Concurrent stacks
    Concurrent queue implemented with singly linked list
    Concurrent queue implemented with array
    Concurrent deque for work stealing
13.9 Transactional memory
    Using transactional memory to help implement collection
    Supporting transactional memory in the presence of garbage collection
13.10 Issues to consider
14 Parallel garbage collection
14.1 Is there sufficient work to parallelise?
14.2 Load balancing
14.3 Synchronisation
14.4 Taxonomy
14.5 Parallel marking
14.6 Parallel copying
    Processor-centric techniques
    Memory-centric techniques
14.7 Parallel sweeping
14.8 Parallel compaction
14.9 Garbage collection on the GPU?
    GPU background
    Heap reference graphs
    Marking on the GPU
    A problem not yet solved
14.10 Issues to consider
    Terminology
    Is parallel collection worthwhile?
    Strategies for balancing loads
    Managing tracing
    Low-level synchronisation
    Sweeping and compaction
    Termination
15 Concurrent garbage collection
15.1 Correctness of concurrent collection
    The tricolour abstraction, revisited
    The lost object problem
    The strong and weak tricolour invariants
    Precision
    Mutator colour
    Allocation colour
    Pointer tagging
    Incremental update solutions
    Snapshot-at-the-beginning solutions
15.2 Barrier techniques for concurrent collection
    Grey mutator techniques
    Black mutator techniques
    Completeness of barrier techniques
    Load barriers
    Concurrent write barrier mechanisms
    One-level card tables
    Two-level card tables
    Reducing work
    Eliminating read barriers
15.3 Ragged phase changes
15.4 Issues to consider
16 Concurrent mark-sweep
16.1 Initialisation
16.2 Termination
16.3 Allocation
16.4 Concurrent marking and sweeping
16.5 Garbage-First: collecting young and old regions
16.6 On-the-fly marking
    Write barriers for on-the-fly collection
    Doligez-Leroy-Gonthier
    Doligez-Leroy-Gonthier for Java
    Sliding views
16.7 Abstract concurrent collection
    The collector wavefront
    Adding origins
    Mutator barriers
    Precision
    Instantiating collectors
16.8 Issues to consider
17 Concurrent copying and compaction
17.1 Mostly-concurrent copying
    Baker's algorithm
    Mostly-concurrent, mostly-copying collection
17.2 Brooks's indirection barrier
17.3 Self-erasing read barriers
17.4 Replication copying
    Multi-version copying
    Extensions to avoid copy-on-write
    Sapphire
    Transactional Sapphire
    Platinum
17.5 Concurrent compaction
    Compressor
    Pauseless and C4
    Collie
    ZGC
    Shenandoah
    Reducing stop-the-world time in Shenandoah and ZGC
17.6 Issues to consider
18 Concurrent reference counting
18.1 LXR: Latency-critical ImmiX with Reference counting
18.2 Simple reference counting revisited
18.3 Biased reference counting
18.4 Buffered reference counting
18.5 Concurrent, cyclic reference counting
18.6 Taking a snapshot of the heap
18.7 Sliding views reference counting
    Age-oriented collection
    Sliding views cycle reclamation
    Memory consistency
18.8 Issues to consider
    Safe memory reclamation
19 Real-time garbage collection
19.1 Real-time systems
19.2 Scheduling real-time collection
19.3 Work-based real-time collection
    Parallel, concurrent replication
    Uneven work and its impact on work-based scheduling
19.4 Slack-based real-time collection
    Scheduling the collector work
    Execution overheads
    Programmer input
19.5 Time-based real-time collection: Metronome
    Mutator utilisation
    Supporting predictability
    Analysis
    Robustness
19.6 Combining scheduling approaches: Tax-and-Spend
    Tax-and-Spend scheduling
    Tax-and-Spend prerequisites
19.7 Controlling fragmentation
    Incremental compaction in Metronome
    Incremental replication on uniprocessors
    Stopless: lock-free garbage collection
    Staccato: best-effort compaction with mutator wait-freedom
    Chicken: best-effort compaction with mutator wait-freedom for x86
    Clover: guaranteed compaction with probabilistic mutator lock-freedom
    Stopless versus Chicken versus Clover
    Fragmented allocation
19.8 Issues to consider
20 Energy-aware garbage collection
20.1 Issues to consider
21 Persistence and garbage collection
21.1 Persistence: concepts, issues and implementation
    Providing durability
    Issues raised by persistence
21.2 Impacts of persistence on garbage collection
21.3 Barriers in support of persistence
    Software barriers
    Hardware barriers
    Software versus hardware barriers
21.4 Implemented collectors for persistent heaps
    Persistent reference counting
    Persistent mark-sweep and mark-compact
    Persistent copying
21.5 Issues to consider
Glossary
Bibliography
Index