The Garbage Collection Handbook

Click chapter to expand sections

	List of Algorithms		xvii
	List of Figures		xxi
	List of Tables		xxv
	Preface		xxvii
	Acknowledgements		xxxiii
	Authors		xxxv
1	Introduction		1
	1.1	Explicit deallocation	2
	1.2	Automatic dynamic memory management	4
	1.3	Comparing garbage collection algorithms	6
		Safety	6
		Throughput and cycles consumed	6
		Completeness and promptness	7
		Pause time and latency	7
		Space overhead	8
		Energy use	8
		Optimisations for specific languages	9
		Scalability and portability	10
	1.4	A performance disadvantage?	10
	1.5	Experimental methodology	11
	1.6	Terminology and notation	12
		The heap	13
		The mutator and the collector	13
		The mutator roots	14
		References, fields and addresses	14
		Liveness, correctness and reachability	15
		Pseudocode	15
		The allocator	15
		Mutator read and write operations	16
		Atomic operations	16
		Sets, multisets, sequences and tuples	17
2	Mark-sweep garbage collection		19
	2.1	The mark-sweep algorithm	20
	2.2	The tricolour abstraction	23
	2.3	Improving mark-sweep	24
	2.4	Bitmap marking	24
	2.5	Lazy sweeping	27
	2.6	Cache misses in the marking loop	29
	2.7	Issues to consider	31
		Mutator overhead	31
		Throughput	32
		Space usage	32
		To move or not to move?	32
3	Mark-compact garbage collection		35
	3.1	Two-finger compaction	36
	3.2	The Lisp 2 algorithm	38
	3.3	Threaded compaction	40
	3.4	One-pass algorithms	42
	3.5	Issues to consider	45
		Is compaction necessary?	45
		Throughput costs of compaction	45
		Long-lived data	45
		Locality	46
		Limitations of mark-compact algorithms	46
4	Copying garbage collection		47
	4.1	Semispace copying collection	47
		Work-list implementations	48
		An example	50
	4.2	Traversal order and locality	50
	4.3	Issues to consider	57
		Allocation	57
		Space and locality	58
		Moving objects	59
5	Reference counting		61
	5.1	Advantages and disadvantages of reference counting	62
	5.2	Improving efficiency	64
	5.3	Deferred reference counting	65
	5.4	Coalesced reference counting	68
	5.5	Cyclic reference counting	71
	5.6	Limited-field reference counting	77
	5.7	Issues to consider	78
		The environment	78
		Advanced solutions	78
6	Comparing garbage collectors		81
	6.1	Throughput	81
	6.2	Pause time and latency	82
	6.3	Space	82
	6.4	Implementation	83
	6.5	Adaptive systems	84
	6.6	A unified theory of garbage collection	85
		Abstract garbage collection	85
		Tracing garbage collection	85
		Reference counting garbage collection	87
7	Allocation		91
	7.1	Sequential allocation	91
	7.2	Free-list allocation	93
		First-fit allocation	93
		Next-fit allocation	93
		Best-fit allocation	94
		Speeding free-list allocation	95
	7.3	Fragmentation	97
	7.4	Segregated-fits allocation	98
		Fragmentation	99
		Populating size classes	100
	7.5	Combining segregated-fits with first-, best- and next-fit	101
	7.6	Additional considerations	101
		Alignment	101
		Size constraints	102
		Boundary tags	102
		Heap parsability	103
		Locality	104
		Wilderness preservation	105
		Crossing maps	105
	7.7	Allocation in concurrent systems	105
	7.8	Issues to consider	106
8	Partitioning the heap		109
	8.1	Terminology	109
	8.2	Why to partition	109
		Partitioning by mobility	110
		Partitioning by size	110
		Partitioning for space	110
		Partitioning by kind	111
		Partitioning for yield	111
		Partitioning for responsiveness	112
		Partitioning for locality	112
		Partitioning by thread	113
		Partitioning by availability	113
		Partitioning by mutability	114
	8.3	How to partition	114
	8.4	When to partition	116
9	Generational garbage collection		117
	9.1	Example	118
	9.2	Measuring time	119
	9.3	Generational hypotheses	119
	9.4	Generations and heap layout	120
	9.5	Multiple generations	121
	9.6	Age recording	122
		En masse promotion	122
		Aging semispaces	124
		Survivor spaces and flexibility	126
	9.7	Adapting to program behaviour	127
		Appel-style garbage collection	127
		Feedback-controlled promotion	129
	9.8	Inter-generational pointers	130
		Remembered sets	130
		Pointer direction	131
	9.9	Space management	132
	9.10	Older-first garbage collection	133
	9.11	Beltway	136
	9.12	Analytic support for generational collection	138
	9.13	Issues to consider	139
	9.14	Abstract generational garbage collection	141
10	Other partitioned schemes		143
	10.1	Large object spaces	143
		The Treadmill garbage collector	144
		Moving objects with operating system support	145
		Pointer-free objects	146
	10.2	Topological collectors	146
		Mature Object Space garbage collection	146
		Connectivity-based garbage collection	149
		Thread-local garbage collection	150
		Stack allocation	155
		Region inferencing	156
	10.3	Hybrid mark-sweep, copying collectors	157
	10.4	Garbage-First: collecting young regions	158
	10.5	Trading space and time	161
	10.6	Mark-region collection: immix	162
	10.7	Copying collection in a constrained memory space	164
	10.8	Bookmarking garbage collection	166
	10.9	Ulterior reference counting	167
	10.10	Issues to consider	168
11	Run-time interface		171
	11.1	Interface to allocation	171
		Speeding allocation	174
		Zeroing	175
	11.2	Finding pointers	176
		Conservative pointer finding	176
		Accurate pointer finding using tagged values	179
		Accurate pointer finding in objects	180
		Accurate pointer finding in global roots	181
		Accurate pointer finding in stacks and registers	182
		Accurate pointer finding in code	192
		Handling interior pointers	193
		Handling derived pointers	194
	11.3	Object tables	195
	11.4	References from external code	196
	11.5	Stack barriers	197
	11.6	GC safe-points and mutator suspension	198
	11.7	Garbage collecting code	201
	11.8	Read and write barriers	202
		Engineering	202
		Precision of write barriers	203
		Hash tables	205
		Sequential store buffers	206
		Overflow action	208
		Cards and card tables	208
		Crossing maps	210
		Summarising cards	212
		Hardware and virtual memory techniques	213
		Write barrier mechanisms: in summary	214
		Chunked lists	214
	11.9	Managing address space	215
	11.10	Applications of virtual memory page protection	217
		Double mapping	217
		Applications of no-access pages	218
	11.11	Choosing heap size	220
	11.12	Issues to consider	222
12	Language-specific concerns		225
	12.1	Finalisation	225
		When do finalisers run?	226
		Which thread runs a finaliser?	227
		Can finalisers run concurrently with each other?	228
		Can finalisers access the object that became unreachable?	228
		When are finalised objects reclaimed?	229
		What happens if there is an error in a finaliser?	229
		Is there any guaranteed order to finalisation?	229
		The finalisation race problem	230
		Finalisers and locks	231
		Finalisation and weak references	231
		Finalisation in particular languages	232
		For further study	233
	12.2	Weak references	233
		Additional motivations	235
		Weak references in Java: the short story	235
		Weak references in Java: the full story	236
		Race in weak pointer clearing	239
		Weak pointers in other languages	240
	12.3	Changing object layout	241
	12.4	Issues to consider	243
13	Concurrency preliminaries		245
	13.1	Hardware	245
		Processors and threads	245
		Interconnect	246
		Memory	247
		Caches	247
		Coherence	247
		Cache coherence performance example: spin locks	248
	13.2	Hardware memory consistency	250
		Fences and happens-before	251
		Consistency models	252
	13.3	Hardware primitives	253
		Compare-and-swap	253
		Load-linked/store-conditionally	254
		Atomic arithmetic primitives	255
		Test then test-and-set	257
		More powerful primitives	257
		Overheads of atomic primitives	258
	13.4	Progress guarantees	259
		Progress guarantees and concurrent collection	260
	13.5	Notation used for concurrent algorithms	261
	13.6	Mutual exclusion	262
	13.7	Work sharing and termination detection	264
		Rendezvous barriers	269
	13.8	Concurrent data structures	269
		Concurrent stacks	272
		Concurrent queue implemented with singly linked list	272
		Concurrent queue implemented with array	276
		Concurrent deque for work stealing	281
	13.9	Transactional memory	282
		Using transactional memory to help implement collection	286
		Supporting transactional memory in the presence of garbage collection	288
	13.10	Issues to consider	289
14	Parallel garbage collection		291
	14.1	Is there sufficient work to parallelise?	292
	14.2	Load balancing	293
	14.3	Synchronisation	294
	14.4	Taxonomy	295
	14.5	Parallel marking	295
	14.6	Parallel copying	304
		Processor-centric techniques	305
		Memory-centric techniques	310
	14.7	Parallel sweeping	315
	14.8	Parallel compaction	316
	14.9	Garbage collection on the GPU?	319
		GPU background	319
		Heap reference graphs	321
		Marking on the GPU	322
		A problem not yet solved	324
	14.10	Issues to consider	324
		Terminology	324
		Is parallel collection worthwhile?	324
		Strategies for balancing loads	325
		Managing tracing	325
		Low-level synchronisation	327
		Sweeping and compaction	327
		Termination	328
15	Concurrent garbage collection		329
	15.1	Correctness of concurrent collection	331
		The tricolour abstraction, revisited	331
		The lost object problem	332
		The strong and weak tricolour invariants	334
		Precision	335
		Mutator colour	335
		Allocation colour	336
		Pointer tagging	336
		Incremental update solutions	336
		Snapshot-at-the-beginning solutions	337
	15.2	Barrier techniques for concurrent collection	337
		Grey mutator techniques	337
		Black mutator techniques	339
		Completeness of barrier techniques	340
		Load barriers	341
		Concurrent write barrier mechanisms	342
		One-level card tables	343
		Two-level card tables	343
		Reducing work	343
		Eliminating read barriers	344
	15.3	Ragged phase changes	344
	15.4	Issues to consider	346
16	Concurrent mark-sweep		349
	16.1	Initialisation	349
	16.2	Termination	351
	16.3	Allocation	352
	16.4	Concurrent marking and sweeping	353
	16.5	Garbage-First: collecting young and old regions	354
	16.6	On-the-fly marking	357
		Write barriers for on-the-fly collection	357
		Doligez-Leroy-Gonthier	358
		Doligez-Leroy-Gonthier for Java	359
		Sliding views	360
	16.7	Abstract concurrent collection	360
		The collector wavefront	361
		Adding origins	363
		Mutator barriers	363
		Precision	363
		Instantiating collectors	364
	16.8	Issues to consider	364
17	Concurrent copying and compaction		367
	17.1	Mostly-concurrent copying	367
		Baker's algorithm	368
		Mostly-concurrent, mostly-copying collection	370
	17.2	Brooks's indirection barrier	370
	17.3	Self-erasing read barriers	371
	17.4	Replication copying	372
		Multi-version copying	372
		Extensions to avoid copy-on-write	374
		Sapphire	376
		Transactional Sapphire	376
		Platinum	383
	17.5	Concurrent compaction	384
		Compressor	384
		Pauseless and C4	386
		Collie	395
		ZGC	396
		Shenandoah	401
		Reducing stop-the-world time in Shenandoah and ZGC	403
	17.6	Issues to consider	404
18	Concurrent reference counting		405
	18.1	LXR: Latency-critical ImmiX with Reference counting	405
	18.2	Simple reference counting revisited	409
	18.3	Biased reference counting	411
	18.4	Buffered reference counting	412
	18.5	Concurrent, cyclic reference counting	413
	18.6	Taking a snapshot of the heap	414
	18.7	Sliding views reference counting	416
		Age-oriented collection	416
		Sliding views cycle reclamation	419
		Memory consistency	420
	18.8	Issues to consider	420
		Safe memory reclamation	421
19	Real-time garbage collection		423
	19.1	Real-time systems	423
	19.2	Scheduling real-time collection	424
	19.3	Work-based real-time collection	425
		Parallel, concurrent replication	425
		Uneven work and its impact on work-based scheduling	432
	19.4	Slack-based real-time collection	434
		Scheduling the collector work	436
		Execution overheads	438
		Programmer input	439
	19.5	Time-based real-time collection: Metronome	439
		Mutator utilisation	439
		Supporting predictability	441
		Analysis	443
		Robustness	447
	19.6	Combining scheduling approaches: Tax-and-Spend	448
		Tax-and-Spend scheduling	448
		Tax-and-Spend prerequisites	450
	19.7	Controlling fragmentation	452
		Incremental compaction in Metronome	452
		Incremental replication on uniprocessors	453
		Stopless: lock-free garbage collection	454
		Staccato: best-effort compaction with mutator wait-freedom	455
		Chicken: best-effort compaction with mutator wait-freedom for x86	458
		Clover: guaranteed compaction with probabilistic mutator lock-freedom	458
		Stopless versus Chicken versus Clover	459
		Fragmented allocation	461
	19.8	Issues to consider	463
20	Energy-aware garbage collection		465
	20.1	Issues to consider	469
21	Persistence and garbage collection		471
	21.1	Persistence: concepts, issues and implementation	471
		Providing durability	472
		Issues raised by persistence	473
	21.2	Impacts of persistence on garbage collection	475
	21.3	Barriers in support of persistence	476
		Software barriers	476
		Hardware barriers	477
		Software versus hardware barriers	478
	21.4	Implemented collectors for persistent heaps	478
		Persistent reference counting	478
		Persistent mark-sweep and mark-compact	479
		Persistent copying	480
	21.5	Issues to consider	481
	Glossary		485
	Bibliography		503
	Index		551

Table of Contents