Microsoft.NET.Publish.Tests.dll.5 is timing out a lot, imbalance with other partitions #44895

Closed
akoeplinger opened this issue Nov 15, 2024 · 4 comments · Fixed by #44973
Labels
untriaged Request triage from a team member

Comments

@akoeplinger
Member

This work item is one of the most frequent to time out: #40074

A quick Kusto query shows that many passing runs are already close to the default 45-minute timeout:

image

These are Windows runs from the last 14 days, with the x axis showing the execution time. Note that the bucket on the left contains runs that had test failures, which still count as Status==Pass in Helix terms.

The other partitions of Microsoft.NET.Publish.Tests.dll are all in the 10-15 minute range, so we probably got unlucky and the long-running tests all ended up in the .5 partition.

Bumping the timeout or increasing the number of partitions would be options. Are there other things we could do?

/cc @marcpopMSFT @baronfel

@marcpopMSFT
Member

Thanks for calling this out. We split tests by test class. A recent run had the .5 partition executing these test classes:
-class "Microsoft.NET.Publish.Tests.GivenThatWeWantToPublishWithoutConflicts" -class "Microsoft.NET.Publish.Tests.GivenThatWeWantToRunILLink"

The first has 3 tests and took 22 seconds when running locally. The second has 230 tests. My local run is still going, but the first 60 tests took 9 minutes. My recommendation is to split that test class up, since turning it into several separate classes should be a fairly simple change and is likely the fastest fix.

Another option is to modify our test infrastructure to split by test method instead. We'd have to figure out how that code works and then make sure we don't run into any CLI length limitations. I'll poke around a little and see if I can find an easy way to do this.

@agocke any chance you could have someone quickly look into splitting the GivenThatWeWantToRunILLink class up, as that's probably the fastest solution here?

@akoeplinger were there any others in your Kusto data that are near the limit? I can try doing a similar analysis there.
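
To illustrate why rebalancing alone may not help while one class dominates the run time, here is a minimal sketch of a duration-aware assignment of test classes to partitions. It is not how the dotnet/sdk infrastructure assigns partitions. The function names, class names, and durations are hypothetical. Only the `-class "..."` argument shape is taken from the run quoted above.

```python
import heapq


def partition_by_duration(durations, partition_count):
    """Greedy longest-first assignment (hypothetical helper).

    Each unit (a test class here) goes to the partition that currently has
    the smallest total duration.
    """
    heap = [(0.0, index) for index in range(partition_count)]
    heapq.heapify(heap)
    partitions = [[] for _ in range(partition_count)]
    # Longest units first; ties broken by name so the result is deterministic.
    ordered = sorted(durations.items(), key=lambda item: (-item[1], item[0]))
    for name, seconds in ordered:
        load, index = heapq.heappop(heap)
        partitions[index].append(name)
        heapq.heappush(heap, (load + seconds, index))
    return partitions


def class_filter_args(names):
    """Build the -class arguments for one partition, in the shape quoted above."""
    return " ".join(f'-class "{name}"' for name in names)


# Hypothetical durations in seconds; not measured data.
example = {"ClassA": 1800, "ClassB": 600, "ClassC": 500, "ClassD": 22}
for partition in partition_by_duration(example, 2):
    args = class_filter_args(partition)
    print(len(args), args)
```

With these made-up numbers the long class ends up alone in its partition and still sets the overall run time, which is why splitting the class itself (or splitting by method) is needed before any balancing can help. Printing the argument length is a crude way to watch for the CLI length concern mentioned above.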

@marcpopMSFT
Member

FWIW, we could also delete all of the .NET 5/6/7-targeting versions of the linker tests and make them all target .NET 8+.

@akoeplinger
Member Author

akoeplinger commented Nov 20, 2024

I ran some more queries and it looks like only the ILLink tests are approaching problematic durations.

The query I used is below. Note that Microsoft.NET.Publish.Tests.dll.7 contains the ILLink tests for macOS/Linux and Microsoft.NET.Publish.Tests.dll.5 contains them for Windows.

I opened #44973 to split up GivenThatWeWantToRunILLink

Jobs
| where Repository == "dotnet/sdk" and Queued > ago(14d)
| project JobId
| join kind=inner WorkItems on JobId
| where Status == "Pass"
| extend Duration=Finished-Started
| summarize MedianDuration=percentile(Duration, 50) by FriendlyName,QueueName
| order by MedianDuration desc
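
As a rough companion to the query above, the following sketch shows one way to flag work items whose median duration is approaching the 45-minute default timeout once the rows have been exported. The row layout, the function name, the 80% threshold, and the sample values are all assumptions for illustration. Note also that `statistics.median` may not match Kusto's `percentile(Duration, 50)` exactly.

```python
from collections import defaultdict
from statistics import median

TIMEOUT_MINUTES = 45  # default timeout mentioned earlier in the thread


def near_timeout(rows, threshold=0.8):
    """Return (key, median_minutes) pairs at or above threshold * timeout.

    rows: iterable of (friendly_name, queue_name, duration_minutes) tuples,
    a hypothetical export format for passing work items.
    """
    grouped = defaultdict(list)
    for friendly_name, queue_name, minutes in rows:
        grouped[(friendly_name, queue_name)].append(minutes)
    medians = {key: median(values) for key, values in grouped.items()}
    flagged = [
        (key, value)
        for key, value in medians.items()
        if value >= threshold * TIMEOUT_MINUTES
    ]
    # Longest median first, mirroring "order by MedianDuration desc".
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up sample rows; "queue-a" is a placeholder queue name.
sample = [
    ("Microsoft.NET.Publish.Tests.dll.5", "queue-a", 40),
    ("Microsoft.NET.Publish.Tests.dll.5", "queue-a", 42),
    ("Microsoft.NET.Publish.Tests.dll.5", "queue-a", 38),
    ("Microsoft.NET.Publish.Tests.dll.1", "queue-a", 12),
    ("Microsoft.NET.Publish.Tests.dll.1", "queue-a", 14),
]
print(near_timeout(sample))
```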

Image
