Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Add global search option to 3-opt layout #110089

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

amanasifkhalid
Copy link
Member

Follow-up to #103450, part of #107749. This is enabled for now just to get a CI run in; I intend to disable this by default since it's prohibitively expensive to run for most cases. In a follow-up, we can explore turning this on for sufficiently small regions, and/or when we think the current block layout is close to optimal.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 22, 2024
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@amanasifkhalid
Copy link
Member Author

/azp run runtime-coreclr outerloop, Fuzzlyn, Antigen

Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@amanasifkhalid
Copy link
Member Author

cc @dotnet/jit-contrib, @AndyAyersMS PTAL. I plan to disable this before merging it in, but the diffs show this is expensive TP-wise, but not nearly as expensive as I initially found it to be (I suspect my first prototype was doing something naive with the cost model computation). I don't think we'll have much trouble enabling this for some production scenarios. Antigen and Fuzzlyn failures are unrelated. Thanks!

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

What sort of score improvements do you see from this? Maybe surface the score as a jit metric so you can easily compare score changes across many methods?

@amanasifkhalid
Copy link
Member Author

One notable source of diffs has to do with the movement of backward jumps. The greedy implementation takes a pseudo 4-opt approach to backward jumps by considering partitioning before the destination block, before the source block, and directly after the source block. The global implementation won't consider such partition shapes, and thus isn't considering plenty of backward jump moves that the greedy implementation does try. For the diffs I looked at, the greedy approach sometimes finds a less costly layout according to the model (though the layout itself looks far less canonical). This divergence in behavior between these approaches raises two concerns for me:

  • If the greedy variant is sometimes outperforming the global variant in optimizing cost, then the global variant is failing to serve as a benchmark.
  • If we plan to use the global and greedy variants together (for example, if the region being reordered is sufficiently small to use the global variant), then the decision to use the global variant should only improve the layout (or achieve the same layout) compared to using the greedy version.

I think we should try to resolve this divergence in behavior around loop backedges, though I don't have an ideal solution yet. One thing we can try is to not consider backedges at all in the greedy variant, and introduce a new pass after 3-opt that uses the loop data structures we computed for the RPO traversal to move backedges around, though adding yet another phase (even if it replaces something like fgMoveHotJumps) isn't ideal to me. Another thing we can try is detect if we have loops in the region being reordered, and if so, switch the global variant to do 4-opt. It's probably too expensive to do 4-opt all the time, though maybe we can get away with it for sufficiently small regions?

@amanasifkhalid
Copy link
Member Author

What sort of score improvements do you see from this? Maybe surface the score as a jit metric so you can easily compare score changes across many methods?

Here's a distribution of cost improvements (as a percentage of the greedy layout's cost) for churned methods in libraries.pmi:
image

A negative percentage indicates the global layout produced a worse score. This data suggests the global variant's inability to do 4-opt on backedges is a significant limitation. I don't think there's much utility in merging in a global variant that underperforms this often, so I'm going to try 4-opt in this PR to get a sense of the expense/churn.

@amanasifkhalid amanasifkhalid marked this pull request as draft November 26, 2024 22:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants