JIT: Add global search option to 3-opt layout #110089

amanasifkhalid · 2024-11-22T18:47:12Z

Follow-up to #103450, part of #107749. This is enabled for now just to get a CI run in; I intend to disable this by default since it's prohibitively expensive to run for most cases. In a follow-up, we can explore turning this on for sufficiently small regions, and/or when we think the current block layout is close to optimal.

dotnet-policy-service · 2024-11-22T18:47:48Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

amanasifkhalid · 2024-11-25T16:57:30Z

/azp run runtime-coreclr outerloop, Fuzzlyn, Antigen

azure-pipelines · 2024-11-25T16:57:55Z

Azure Pipelines successfully started running 3 pipeline(s).

amanasifkhalid · 2024-11-25T20:04:02Z

cc @dotnet/jit-contrib, @AndyAyersMS PTAL. I plan to disable this before merging it in, but the diffs show this is expensive TP-wise, but not nearly as expensive as I initially found it to be (I suspect my first prototype was doing something naive with the cost model computation). I don't think we'll have much trouble enabling this for some production scenarios. Antigen and Fuzzlyn failures are unrelated. Thanks!

AndyAyersMS

LGTM.

What sort of score improvements do you see from this? Maybe surface the score as a jit metric so you can easily compare score changes across many methods?

amanasifkhalid · 2024-11-26T19:11:33Z

One notable source of diffs has to do with the movement of backward jumps. The greedy implementation takes a pseudo 4-opt approach to backward jumps by considering partitioning before the destination block, before the source block, and directly after the source block. The global implementation won't consider such partition shapes, and thus isn't considering plenty of backward jump moves that the greedy implementation does try. For the diffs I looked at, the greedy approach sometimes finds a less costly layout according to the model (though the layout itself looks far less canonical). This divergence in behavior between these approaches raises two concerns for me:

If the greedy variant is sometimes outperforming the global variant in optimizing cost, then the global variant is failing to serve as a benchmark.
If we plan to use the global and greedy variants together (for example, if the region being reordered is sufficiently small to use the global variant), then the decision to use the global variant should only improve the layout (or achieve the same layout) compared to using the greedy version.

I think we should try to resolve this divergence in behavior around loop backedges, though I don't have an ideal solution yet. One thing we can try is to not consider backedges at all in the greedy variant, and introduce a new pass after 3-opt that uses the loop data structures we computed for the RPO traversal to move backedges around, though adding yet another phase (even if it replaces something like fgMoveHotJumps) isn't ideal to me. Another thing we can try is detect if we have loops in the region being reordered, and if so, switch the global variant to do 4-opt. It's probably too expensive to do 4-opt all the time, though maybe we can get away with it for sufficiently small regions?

amanasifkhalid · 2024-11-26T22:57:17Z

What sort of score improvements do you see from this? Maybe surface the score as a jit metric so you can easily compare score changes across many methods?

Here's a distribution of cost improvements (as a percentage of the greedy layout's cost) for churned methods in libraries.pmi:

A negative percentage indicates the global layout produced a worse score. This data suggests the global variant's inability to do 4-opt on backedges is a significant limitation. I don't think there's much utility in merging in a global variant that underperforms this often, so I'm going to try 4-opt in this PR to get a sense of the expense/churn.

amanasifkhalid added 2 commits November 22, 2024 13:37

Add global 3-opt impl

fc4a2ef

Add switch

a145ad2

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 22, 2024

dotnet-policy-service bot assigned amanasifkhalid Nov 22, 2024

amanasifkhalid added 2 commits November 22, 2024 14:04

Make switch available in release

679f0c0

Don't reorder nested try regions

48dc176

amanasifkhalid added 2 commits November 26, 2024 13:44

Skip updating ordinals on last iteration

6ef6c2b

Style

35890a6

AndyAyersMS approved these changes Nov 26, 2024

View reviewed changes

Disable global 3-opt by default

c4ad79e

amanasifkhalid marked this pull request as draft November 26, 2024 22:57

amanasifkhalid added 2 commits November 26, 2024 17:58

Add layout cost metric

fbae168

Try 4-opt

87a8348

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JIT: Add global search option to 3-opt layout #110089

JIT: Add global search option to 3-opt layout #110089

amanasifkhalid commented Nov 22, 2024

dotnet-policy-service bot commented Nov 22, 2024

amanasifkhalid commented Nov 25, 2024

azure-pipelines bot commented Nov 25, 2024

amanasifkhalid commented Nov 25, 2024

AndyAyersMS left a comment

amanasifkhalid commented Nov 26, 2024

amanasifkhalid commented Nov 26, 2024

JIT: Add global search option to 3-opt layout #110089

Are you sure you want to change the base?

JIT: Add global search option to 3-opt layout #110089

Conversation

amanasifkhalid commented Nov 22, 2024

dotnet-policy-service bot commented Nov 22, 2024

amanasifkhalid commented Nov 25, 2024

azure-pipelines bot commented Nov 25, 2024

amanasifkhalid commented Nov 25, 2024

AndyAyersMS left a comment

Choose a reason for hiding this comment

amanasifkhalid commented Nov 26, 2024

amanasifkhalid commented Nov 26, 2024