Microsoft.NET.Publish.Tests.dll.5 is timing out a lot, imbalance with other partitions #44895
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.
Thanks for calling this out. We split tests by test class. A recent run had the .5 partition executing these test classes:
The first has 3 tests and took 22 seconds when run locally. The second has 230 tests. It's still running, but the first 60 tests took 9 minutes.
My recommendation is to split the test class up, as that's the fastest solution here: it should be a fairly simple change to make them into separate classes.
Another option is to modify our test infrastructure to split by test method instead. We'd have to figure out how that code works and then make sure we don't run into any CLI length limitations. I'll poke around a little and see if I can find an easy way to do this.
@agocke any chance you could have someone quickly look into splitting the GivenThatWeWantToRunILLink class up, as that's probably the fastest solution here?
@akoeplinger were there any others from your Kusto data that are near the limit? I can try doing a similar analysis there.
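To illustrate why splitting by test class can leave one partition far slower than the rest, here is a minimal sketch. It assumes a greedy longest-first assignment, which is not necessarily how the SDK test infrastructure assigns classes to partitions, and the class names and durations are hypothetical rather than measured.

```python
import heapq

def partition(durations, n):
    """Assign work units to n partitions, longest first, always picking
    the partition with the smallest running total (greedy heuristic)."""
    heap = [(0.0, i) for i in range(n)]  # (total minutes, partition index)
    heapq.heapify(heap)
    buckets = [[] for _ in range(n)]
    for name, minutes in sorted(durations.items(), key=lambda kv: -kv[1]):
        total, idx = heapq.heappop(heap)
        buckets[idx].append(name)
        heapq.heappush(heap, (total + minutes, idx))
    totals = [0.0] * n
    for total, idx in heap:
        totals[idx] = total
    return buckets, totals

# Hypothetical durations in minutes: one very large class and four small ones.
by_class = {"LargeClass": 35.0, "A": 5.0, "B": 5.0, "C": 5.0, "D": 5.0}

# The same work with the large class split into five smaller classes.
split = {f"LargeClass_{i}": 7.0 for i in range(1, 6)}
split.update({"A": 5.0, "B": 5.0, "C": 5.0, "D": 5.0})

_, by_class_totals = partition(by_class, 3)
_, split_totals = partition(split, 3)

print("unsplit:", sorted(by_class_totals, reverse=True))
print("split:  ", sorted(split_totals, reverse=True))
```

However the units are assigned, the slowest partition can never be faster than the largest single unit, so a class that dominates the run time caps the benefit of adding partitions until it is split.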
FWIW, we could also delete all of the net 5/6/7 targeting versions of the linker tests and make them all 8+.
I ran some more queries and it looks like only the ILLink tests are approaching problematic durations. Here's the query I used. Note that I opened #44973 to split up GivenThatWeWantToRunILLink.
Jobs
| where Repository == "dotnet/sdk" and Queued > ago(14d)
| project JobId
| join kind=inner WorkItems on JobId
| where Status == "Pass"
| extend Duration=Finished-Started
| summarize MedianDuration=percentile(Duration, 50) by FriendlyName,QueueName
| order by MedianDuration desc
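For anyone who wants to run the same aggregation on exported data, here is a rough Python sketch of what the query computes: the median duration of passing work items per FriendlyName and QueueName, sorted longest first. The record layout and the sample rows are assumptions for illustration, not real Helix data, and `statistics.median` averages the two middle values for even-sized groups, so results may differ slightly from Kusto's `percentile(Duration, 50)`.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median

def median_durations(work_items):
    """Group passing work items by (FriendlyName, QueueName) and return
    rows of (name, queue, median duration), longest median first."""
    groups = defaultdict(list)
    for item in work_items:
        if item["Status"] != "Pass":
            continue
        seconds = (item["Finished"] - item["Started"]).total_seconds()
        groups[(item["FriendlyName"], item["QueueName"])].append(seconds)
    rows = [
        (name, queue, timedelta(seconds=median(secs)))
        for (name, queue), secs in groups.items()
    ]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows

def make_item(name, minutes, status="Pass", queue="example.queue"):
    """Build a hypothetical work item record lasting the given minutes."""
    start = datetime(2024, 1, 1)
    return {
        "FriendlyName": name,
        "QueueName": queue,
        "Status": status,
        "Started": start,
        "Finished": start + timedelta(minutes=minutes),
    }

# Hypothetical sample rows; the names and timings are made up.
sample = [
    make_item("Example.Tests.dll.5", 40),
    make_item("Example.Tests.dll.5", 44),
    make_item("Example.Tests.dll.5", 42),
    make_item("Example.Tests.dll.5", 50, status="Fail"),
    make_item("Example.Tests.dll.1", 10),
    make_item("Example.Tests.dll.1", 14),
]

rows = median_durations(sample)
for name, queue, duration in rows:
    print(name, queue, duration)
```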
…n run them separately Fixes #44895
This workitem is one of the most frequent to time out: #40074
A quick Kusto query shows that we're already pretty close to the default 45mins timeout for a lot of passing runs:
These are Windows runs from the last 14 days, with the x axis showing the execution time. Note that the bucket on the left contains runs that had test failures, which still count as Status==Pass in Helix terms.
The other partitions of Microsoft.NET.Publish.Tests.dll are all in the 10-15mins range so we probably got unlucky and the long running tests are all in the .5 partition.
Bumping the timeout or increasing the number of partitions would be options. Are there other things we could do?
/cc @marcpopMSFT @baronfel