The Garbage Collection Handbook
The art of automatic memory management
Richard Jones, Antony Hosking, Eliot Moss
Chapman and Hall/CRC, ISBN 978-1420082791

Table of Contents

List of Algorithms
List of Figures
List of Tables
Preface
Acknowledgements
Authors
1 Introduction
1.1 Explicit deallocation
1.2 Automatic dynamic memory management
1.3 Comparing garbage collection algorithms
Safety
Throughput
Completeness and promptness
Pause time
Space overhead
Optimisations for specific languages
Scalability and portability
1.4 A performance disadvantage?
1.5 Experimental methodology
1.6 Terminology and notation
The heap
The mutator and the collector
The mutator roots
References, fields and addresses
Liveness, correctness and reachability
Pseudo-code
The allocator
Mutator read and write operations
Atomic operations
Sets, multisets, sequences and tuples
2 Mark-sweep garbage collection
2.1 The mark-sweep algorithm
2.2 The tricolour abstraction
2.3 Improving mark-sweep
2.4 Bitmap marking
2.5 Lazy sweeping
2.6 Cache misses in the marking loop
2.7 Issues to consider
Mutator overhead
Throughput
Space usage
To move or not to move?
3 Mark-compact garbage collection
3.1 Two-finger compaction
3.2 The Lisp 2 algorithm
3.3 Threaded compaction
3.4 One-pass algorithms
3.5 Issues to consider
Is compaction necessary?
Throughput costs of compaction
Long-lived data
Locality
Limitations of mark-compact algorithms
4 Copying garbage collection
4.1 Semispace copying collection
Work list implementations
An example
4.2 Traversal order and locality
4.3 Issues to consider
Allocation
Space and locality
Moving objects
5 Reference counting
5.1 Advantages and disadvantages of reference counting
5.2 Improving efficiency
5.3 Deferred reference counting
5.4 Coalesced reference counting
5.5 Cyclic reference counting
5.6 Limited-field reference counting
5.7 Issues to consider
The environment
Advanced solutions
6 Comparing garbage collectors
6.1 Throughput
6.2 Pause time
6.3 Space
6.4 Implementation
6.5 Adaptive systems
6.6 A unified theory of garbage collection
Abstract garbage collection
Tracing garbage collection
Reference counting garbage collection
7 Allocation
7.1 Sequential allocation
7.2 Free-list allocation
First-fit allocation
Next-fit allocation
Best-fit allocation
Speeding free-list allocation
7.3 Fragmentation
7.4 Segregated-fits allocation
Fragmentation
Populating size classes
7.5 Combining segregated-fits with first-, best- and next-fit
7.6 Additional considerations
Alignment
Size constraints
Boundary tags
Heap parsability
Locality
Wilderness preservation
Crossing maps
7.7 Allocation in concurrent systems
7.8 Issues to consider
8 Partitioning the heap
8.1 Terminology
8.2 Why to partition
Partitioning by mobility
Partitioning by size
Partitioning for space
Partitioning by kind
Partitioning for yield
Partitioning to reduce pause time
Partitioning for locality
Partitioning by thread
Partitioning by availability
Partitioning by mutability
8.3 How to partition
8.4 When to partition
9 Generational garbage collection
9.1 Example
9.2 Measuring time
9.3 Generational hypotheses
9.4 Generations and heap layout
9.5 Multiple generations
9.6 Age recording
En masse promotion
Aging semispaces
Survivor spaces and flexibility
9.7 Adapting to program behaviour
Appel-style garbage collection
Feedback controlled promotion
9.8 Inter-generational pointers
Remembered sets
Pointer direction
9.9 Space management
9.10 Older-first garbage collection
9.11 Beltway
9.12 Analytic support for generational collection
9.13 Issues to consider
9.14 Abstract generational garbage collection
10 Other partitioned schemes
10.1 Large object spaces
The Treadmill garbage collector
Moving objects with operating system support
Pointer-free objects
10.2 Topological collectors
Mature object space garbage collection
Connectivity-based garbage collection
Thread-local garbage collection
Stack allocation
Region inferencing
10.3 Hybrid mark-sweep, copying collectors
Garbage-First
Immix and others
Copying collection in a constrained memory space
10.4 Bookmarking garbage collection
10.5 Ulterior reference counting
10.6 Issues to consider
11 Run-time interface
11.1 Interface to allocation
Speeding allocation
Zeroing
11.2 Finding pointers
Conservative pointer finding
Accurate pointer finding using tagged values
Accurate pointer finding in objects
Accurate pointer finding in global roots
Accurate pointer finding in stacks and registers
Accurate pointer finding in code
Handling interior pointers
Handling derived pointers
11.3 Object tables
11.4 References from external code
11.5 Stack barriers
11.6 GC-safe points and mutator suspension
11.7 Garbage collecting code
11.8 Read and write barriers
Engineering
Precision of write barriers
Hash tables
Sequential store buffers
Overflow action
Card tables
Crossing maps
Summarising cards
Hardware and virtual memory techniques
Write barrier mechanisms: in summary
Chunked lists
11.9 Managing address space
11.10 Applications of virtual memory page protection
Double mapping
Applications of no-access pages
11.11 Choosing heap size
11.12 Issues to consider
12 Language-specific concerns
12.1 Finalisation
When do finalisers run?
Which thread runs a finaliser?
Can finalisers run concurrently with each other?
Can finalisers access the object that became unreachable?
When are finalised objects reclaimed?
What happens if there is an error in a finaliser?
Is there any guaranteed order to finalisation?
The finalisation race problem
Finalisers and locks
Finalisation in particular languages
For further study
12.2 Weak references
Additional motivations
Supporting multiple pointer strengths
Using Phantom objects to control finalisation order
Race in weak pointer clearing
Notification of weak pointer clearing
Weak pointers in other languages
12.3 Issues to consider
13 Concurrency preliminaries
13.1 Hardware
Processors and threads
Interconnect
Memory
Caches
Coherence
Cache coherence performance example: spin locks
13.2 Hardware memory consistency
Fences and happens-before
Consistency models
13.3 Hardware primitives
Compare-and-swap
Load-linked/store-conditionally
Atomic arithmetic primitives
Test then test-and-set
More powerful primitives
Overheads of atomic primitives
13.4 Progress guarantees
Progress guarantees and concurrent collection
13.5 Notation used for concurrent algorithms
13.6 Mutual exclusion
13.7 Work sharing and termination detection
Rendezvous barriers
13.8 Concurrent data structures
Concurrent stacks
Concurrent queue implemented with singly linked list
Concurrent queue implemented with array
A concurrent deque for work stealing
13.9 Transactional memory
What is transactional memory?
Using transactional memory to help implement collection
Supporting transactional memory in the presence of garbage collection
13.10 Issues to consider
14 Parallel garbage collection
14.1 Is there sufficient work to parallelise?
14.2 Load balancing
14.3 Synchronisation
14.4 Taxonomy
14.5 Parallel marking
Processor-centric techniques
14.6 Parallel copying
Processor-centric techniques
Memory-centric techniques
14.7 Parallel sweeping
14.8 Parallel compaction
14.9 Issues to consider
Terminology
Is parallel collection worthwhile?
Strategies for balancing loads
Managing tracing
Low-level synchronisation
Sweeping and compaction
Termination
15 Concurrent garbage collection
15.1 Correctness of concurrent collection
The tricolour abstraction, revisited
The lost object problem
The strong and weak tricolour invariants
Precision
Mutator colour
Allocation colour
Incremental update solutions
Snapshot-at-the-beginning solutions
15.2 Barrier techniques for concurrent collection
Grey mutator techniques
Black mutator techniques
Completeness of barrier techniques
Concurrent write barrier mechanisms
One-level card tables
Two-level card tables
Reducing work
15.3 Issues to consider
16 Concurrent mark-sweep
16.1 Initialisation
16.2 Termination
16.3 Allocation
16.4 Concurrent marking and sweeping
16.5 On-the-fly marking
Write barriers for on-the-fly collection
Doligez-Leroy-Gonthier
Doligez-Leroy-Gonthier for Java
Sliding views
16.6 Abstract concurrent collection
The collector wavefront
Adding origins
Mutator barriers
Precision
Instantiating collectors
16.7 Issues to consider
17 Concurrent copying & compaction
17.1 Mostly-concurrent copying: Baker's algorithm
Mostly-concurrent, mostly-copying collection
17.2 Brooks's indirection barrier
17.3 Self-erasing read barriers
17.4 Replication copying
17.5 Multi-version copying
Extensions to avoid copy-on-write
17.6 Sapphire
Collector phases
Merging phases
Volatile fields
17.7 Concurrent compaction
Compressor
Pauseless
17.8 Issues to consider
18 Concurrent reference counting
18.1 Simple reference counting revisited
18.2 Buffered reference counting
18.3 Concurrent, cyclic reference counting
18.4 Taking a snapshot of the heap
18.5 Sliding views reference counting
Age-oriented collection
The algorithm
Sliding views cycle reclamation
Memory consistency
18.6 Issues to consider
19 Real-time garbage collection
19.1 Real-time systems
19.2 Scheduling real-time collection
19.3 Work-based real-time collection
Parallel, concurrent replication
Uneven work and its impact on work-based scheduling
19.4 Slack-based real-time collection
Scheduling the collector work
Execution overheads
Programmer input
19.5 Time-based real-time collection: Metronome
Mutator utilisation
Supporting predictability
Analysis
Robustness
19.6 Combining scheduling approaches: Tax-and-Spend
Tax-and-Spend scheduling
Tax-and-Spend prerequisites
19.7 Controlling fragmentation
Incremental compaction in Metronome
Incremental replication on uniprocessors
Stopless: lock-free garbage collection
Staccato: best-effort compaction with mutator wait-freedom
Chicken: best-effort compaction with mutator wait-freedom for x86
Clover: guaranteed compaction with probabilistic mutator lock-freedom
Stopless versus Chicken versus Clover
Fragmented allocation
19.8 Issues to consider
Glossary
Bibliography
Index