-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize Scannable#name()
and related logic
#3901
base: main
Are you sure you want to change the base?
Optimize Scannable#name()
and related logic
#3901
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments with context. This PR relates to #3900.
public Boolean measureThroughput() { | ||
return Flux.range(0, rangeSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and in MonoCallableBenchmark
: as stated, these are unrelated improvements. Can be pulled into a separate PR if desired.
@BenchmarkMode({Mode.AverageTime}) | ||
@Warmup(iterations = 5, time = 5, timeUnit = TimeUnit.SECONDS) | ||
@Measurement(iterations = 5, time = 5, timeUnit = TimeUnit.SECONDS) | ||
@Fork(value = 1) | ||
@OutputTimeUnit(TimeUnit.NANOSECONDS) | ||
@State(Scope.Benchmark) | ||
public class TracesBenchmark { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The configuration and setup for this benchmark match those of the existing benchmarks, including explicit specification of default parameters. Can be cleaned up a bit more if desired.
As for the impact of this PR, the following shows the before- and after output of ./gradlew jmh --include="TracesBenchmark" --profilers="gc"
:
Before:
Benchmark (reactorLeadingLines) (trailingLines) Mode Cnt Score Error Units
TracesBenchmark.measureThroughput 0 0 avgt 5 93.082 ± 5.671 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 0 avgt 5 8607.367 ± 515.275 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 0 avgt 5 840.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 0 avgt 5 117.000 counts
TracesBenchmark.measureThroughput:gc.time 0 0 avgt 5 153.000 ms
TracesBenchmark.measureThroughput 0 10 avgt 5 398.660 ± 16.663 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 10 avgt 5 7081.158 ± 294.949 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 10 avgt 5 2960.001 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 10 avgt 5 96.000 counts
TracesBenchmark.measureThroughput:gc.time 0 10 avgt 5 124.000 ms
TracesBenchmark.measureThroughput 0 100 avgt 5 2999.075 ± 89.655 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 100 avgt 5 7364.662 ± 220.010 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 100 avgt 5 23160.019 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 100 avgt 5 100.000 counts
TracesBenchmark.measureThroughput:gc.time 0 100 avgt 5 131.000 ms
TracesBenchmark.measureThroughput 0 1000 avgt 5 27916.269 ± 1601.038 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 1000 avgt 5 7751.034 ± 442.501 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 1000 avgt 5 226865.228 ± 0.070 B/op
TracesBenchmark.measureThroughput:gc.count 0 1000 avgt 5 105.000 counts
TracesBenchmark.measureThroughput:gc.time 0 1000 avgt 5 133.000 ms
TracesBenchmark.measureThroughput 10 0 avgt 5 402.076 ± 11.693 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 10 0 avgt 5 7476.144 ± 216.947 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 10 0 avgt 5 3152.001 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 10 0 avgt 5 102.000 counts
TracesBenchmark.measureThroughput:gc.time 10 0 avgt 5 131.000 ms
TracesBenchmark.measureThroughput 10 10 avgt 5 626.244 ± 31.574 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 10 10 avgt 5 8053.582 ± 403.074 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 10 10 avgt 5 5288.002 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 10 10 avgt 5 109.000 counts
TracesBenchmark.measureThroughput:gc.time 10 10 avgt 5 140.000 ms
TracesBenchmark.measureThroughput 10 100 avgt 5 3331.927 ± 253.038 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 10 100 avgt 5 7608.535 ± 564.012 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 10 100 avgt 5 26576.019 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 10 100 avgt 5 104.000 counts
TracesBenchmark.measureThroughput:gc.time 10 100 avgt 5 137.000 ms
TracesBenchmark.measureThroughput 10 1000 avgt 5 27321.302 ± 1180.729 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 10 1000 avgt 5 7992.724 ± 345.188 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 10 1000 avgt 5 228969.037 ± 0.044 B/op
TracesBenchmark.measureThroughput:gc.count 10 1000 avgt 5 109.000 counts
TracesBenchmark.measureThroughput:gc.time 10 1000 avgt 5 140.000 ms
TracesBenchmark.measureThroughput 100 0 avgt 5 3119.034 ± 70.765 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 100 0 avgt 5 7555.751 ± 170.859 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 100 0 avgt 5 24712.018 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 100 0 avgt 5 102.000 counts
TracesBenchmark.measureThroughput:gc.time 100 0 avgt 5 131.000 ms
TracesBenchmark.measureThroughput 100 10 avgt 5 3384.598 ± 134.383 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 100 10 avgt 5 7894.374 ± 310.075 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 100 10 avgt 5 28016.020 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 100 10 avgt 5 107.000 counts
TracesBenchmark.measureThroughput:gc.time 100 10 avgt 5 141.000 ms
TracesBenchmark.measureThroughput 100 100 avgt 5 5920.906 ± 192.896 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 100 100 avgt 5 7678.427 ± 250.022 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 100 100 avgt 5 47672.060 ± 0.002 B/op
TracesBenchmark.measureThroughput:gc.count 100 100 avgt 5 104.000 counts
TracesBenchmark.measureThroughput:gc.time 100 100 avgt 5 141.000 ms
TracesBenchmark.measureThroughput 100 1000 avgt 5 28805.156 ± 1329.846 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 100 1000 avgt 5 8211.734 ± 378.784 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 100 1000 avgt 5 248017.124 ± 0.052 B/op
TracesBenchmark.measureThroughput:gc.count 100 1000 avgt 5 111.000 counts
TracesBenchmark.measureThroughput:gc.time 100 1000 avgt 5 147.000 ms
TracesBenchmark.measureThroughput 1000 0 avgt 5 27765.904 ± 1098.007 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1000 0 avgt 5 8341.364 ± 333.465 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1000 0 avgt 5 242849.078 ± 0.042 B/op
TracesBenchmark.measureThroughput:gc.count 1000 0 avgt 5 113.000 counts
TracesBenchmark.measureThroughput:gc.time 1000 0 avgt 5 145.000 ms
TracesBenchmark.measureThroughput 1000 10 avgt 5 28772.385 ± 1965.006 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1000 10 avgt 5 8115.825 ± 550.707 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1000 10 avgt 5 244809.121 ± 0.076 B/op
TracesBenchmark.measureThroughput:gc.count 1000 10 avgt 5 111.000 counts
TracesBenchmark.measureThroughput:gc.time 1000 10 avgt 5 145.000 ms
TracesBenchmark.measureThroughput 1000 100 avgt 5 31094.765 ± 1220.979 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1000 100 avgt 5 8049.444 ± 315.635 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1000 100 avgt 5 262450.032 ± 0.080 B/op
TracesBenchmark.measureThroughput:gc.count 1000 100 avgt 5 109.000 counts
TracesBenchmark.measureThroughput:gc.time 1000 100 avgt 5 142.000 ms
TracesBenchmark.measureThroughput 1000 1000 avgt 5 55008.078 ± 3723.012 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1000 1000 avgt 5 8252.861 ± 557.526 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1000 1000 avgt 5 475940.138 ± 0.280 B/op
TracesBenchmark.measureThroughput:gc.count 1000 1000 avgt 5 112.000 counts
TracesBenchmark.measureThroughput:gc.time 1000 1000 avgt 5 148.000 ms
After
Benchmark (reactorLeadingLines) (trailingLines) Mode Cnt Score Error Units
TracesBenchmark.measureThroughput 0 0 avgt 5 28.283 ± 2.319 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 0 avgt 5 12142.383 ± 996.110 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 0 avgt 5 360.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 0 avgt 5 165.000 counts
TracesBenchmark.measureThroughput:gc.time 0 0 avgt 5 218.000 ms
TracesBenchmark.measureThroughput 0 10 avgt 5 48.606 ± 4.176 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 10 avgt 5 11305.217 ± 943.994 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 10 avgt 5 576.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 10 avgt 5 154.000 counts
TracesBenchmark.measureThroughput:gc.time 0 10 avgt 5 211.000 ms
TracesBenchmark.measureThroughput 0 100 avgt 5 47.248 ± 2.156 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 100 avgt 5 11626.888 ± 524.566 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 100 avgt 5 576.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 100 avgt 5 158.000 counts
TracesBenchmark.measureThroughput:gc.time 0 100 avgt 5 204.000 ms
TracesBenchmark.measureThroughput 0 1000 avgt 5 47.641 ± 4.503 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 0 1000 avgt 5 11535.133 ± 1068.897 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 0 1000 avgt 5 576.002 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 0 1000 avgt 5 157.000 counts
TracesBenchmark.measureThroughput:gc.time 0 1000 avgt 5 199.000 ms
TracesBenchmark.measureThroughput 10 0 avgt 5 47.941 ± 1.134 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 10 0 avgt 5 10821.406 ± 255.277 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 10 0 avgt 5 544.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 10 0 avgt 5 147.000 counts
TracesBenchmark.measureThroughput:gc.time 10 0 avgt 5 182.000 ms
TracesBenchmark.measureThroughput 10 10 avgt 5 48.681 ± 3.065 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 10 10 avgt 5 10658.850 ± 672.502 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 10 10 avgt 5 544.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 10 10 avgt 5 145.000 counts
TracesBenchmark.measureThroughput:gc.time 10 10 avgt 5 189.000 ms
TracesBenchmark.measureThroughput 10 100 avgt 5 48.309 ± 3.576 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 10 100 avgt 5 10741.652 ± 778.802 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 10 100 avgt 5 544.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 10 100 avgt 5 147.000 counts
TracesBenchmark.measureThroughput:gc.time 10 100 avgt 5 202.000 ms
TracesBenchmark.measureThroughput 10 1000 avgt 5 46.582 ± 2.587 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 10 1000 avgt 5 11138.595 ± 611.256 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 10 1000 avgt 5 544.002 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 10 1000 avgt 5 152.000 counts
TracesBenchmark.measureThroughput:gc.time 10 1000 avgt 5 202.000 ms
TracesBenchmark.measureThroughput 100 0 avgt 5 46.127 ± 2.642 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 100 0 avgt 5 11248.700 ± 638.495 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 100 0 avgt 5 544.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 100 0 avgt 5 153.000 counts
TracesBenchmark.measureThroughput:gc.time 100 0 avgt 5 201.000 ms
TracesBenchmark.measureThroughput 100 10 avgt 5 46.186 ± 2.518 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 100 10 avgt 5 11234.035 ± 610.123 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 100 10 avgt 5 544.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 100 10 avgt 5 153.000 counts
TracesBenchmark.measureThroughput:gc.time 100 10 avgt 5 199.000 ms
TracesBenchmark.measureThroughput 100 100 avgt 5 48.736 ± 1.891 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 100 100 avgt 5 10645.285 ± 416.594 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 100 100 avgt 5 544.000 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 100 100 avgt 5 145.000 counts
TracesBenchmark.measureThroughput:gc.time 100 100 avgt 5 185.000 ms
TracesBenchmark.measureThroughput 100 1000 avgt 5 46.312 ± 2.365 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 100 1000 avgt 5 11203.368 ± 573.569 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 100 1000 avgt 5 544.002 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 100 1000 avgt 5 152.000 counts
TracesBenchmark.measureThroughput:gc.time 100 1000 avgt 5 198.000 ms
TracesBenchmark.measureThroughput 1000 0 avgt 5 46.600 ± 2.220 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1000 0 avgt 5 11133.818 ± 527.892 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1000 0 avgt 5 544.002 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1000 0 avgt 5 152.000 counts
TracesBenchmark.measureThroughput:gc.time 1000 0 avgt 5 201.000 ms
TracesBenchmark.measureThroughput 1000 10 avgt 5 45.805 ± 1.806 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1000 10 avgt 5 11326.617 ± 446.037 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1000 10 avgt 5 544.002 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1000 10 avgt 5 154.000 counts
TracesBenchmark.measureThroughput:gc.time 1000 10 avgt 5 206.000 ms
TracesBenchmark.measureThroughput 1000 100 avgt 5 48.450 ± 2.778 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1000 100 avgt 5 10709.195 ± 616.791 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1000 100 avgt 5 544.003 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1000 100 avgt 5 145.000 counts
TracesBenchmark.measureThroughput:gc.time 1000 100 avgt 5 191.000 ms
TracesBenchmark.measureThroughput 1000 1000 avgt 5 51.759 ± 2.299 ns/op
TracesBenchmark.measureThroughput:gc.alloc.rate 1000 1000 avgt 5 10023.884 ± 443.024 MB/sec
TracesBenchmark.measureThroughput:gc.alloc.rate.norm 1000 1000 avgt 5 544.004 ± 0.001 B/op
TracesBenchmark.measureThroughput:gc.count 1000 1000 avgt 5 136.000 counts
TracesBenchmark.measureThroughput:gc.time 1000 1000 avgt 5 174.000 ms
Note how performance of the new implementation is almost independent of the input, while the old implementation scales fairly poorly.
|| stackTraceRow.startsWith("reactor.core.publisher.Mono.onAssembly") | ||
|| stackTraceRow.equals("reactor.core.publisher.Mono.onAssembly") | ||
|| stackTraceRow.equals("reactor.core.publisher.Flux.onAssembly") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line removal is unrelated to the rest of the PR. The dropped line is redundant, as it is preceded by a predicate that matches in strictly more contexts. Question: should the Flux.onAssembly
line (and some others below) be updated to also use startsWith
?
@@ -103,7 +102,7 @@ static String extractOperatorAssemblyInformation(String source) { | |||
} | |||
|
|||
static boolean isUserCode(String line) { | |||
return !line.startsWith("reactor.core.publisher") || line.contains("Test"); | |||
return !line.startsWith(PUBLISHER_PACKAGE_PREFIX) || line.contains("Test"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The constant has a trailing dot. Afaik that doesn't matter, as each line should include a class name, too.
static String[] extractOperatorAssemblyInformationParts(String source) { | ||
String[] uncleanTraces = source.split("\n"); | ||
final List<String> traces = Stream.of(uncleanTraces) | ||
.map(String::trim) | ||
.filter(s -> !s.isEmpty()) | ||
.collect(Collectors.toList()); | ||
Iterator<String> traces = trimmedNonemptyLines(source); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The largest improvements in this PR come from these changes.
Key contributors to performance of the old implementation:
String#split
accepts a regular expression. We're no longer performing the comparatively expensive operation of compiling regular expressions.String#split
allocates an array and substrings proportional to the provided input, covering a potentially large part of the input that does not at all influence the result of this method.- The
Stream
operation likewise processes irrelevant lines, and allocates a potentially large list.
The new implementation instead lazily iterates over the input, processing only relevant lines, and tracking only the two most-recently-seen lines.
if (isUserCode(currentLine)) { | ||
// No line is a Reactor API line. | ||
return new String[]{currentLine}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some logic was moved around, but existing comments were relocated with it. This should aid review. It's nice that there was already good test coverage.
else if (i == traces.size()) { | ||
//we skipped ALL lines, meaning they're all reactor API lines. We'll fully display the last one | ||
apiLine = ""; | ||
userCodeLine = traces.get(i-1).replaceFirst("reactor.core.publisher.", ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and below: replaceFirst
also accepts a regex. The new code instead performs a cheaper startsWith
check.
/** | ||
* Returns an iterator over all trimmed non-empty lines in the given source string. | ||
* | ||
* @implNote This implementation attempts to minimize allocations. | ||
*/ | ||
private static Iterator<String> trimmedNonemptyLines(String source) { | ||
return new Iterator<String>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This manually-crafted iterator feels a bit like "coding like it's 1999", but I didn't find a less-verbose alternative that doesn't impact over-all code readability. (Had Guava been on the classpath, then I'd have opted to extend AbstractIterator
.) Open to suggestions!
Thanks for the PR. I will have a closer look. However, bear in mind this change won't make it into 3.7.0-RC1 but would target directly 3.7.0. I think that's ok as it's not an API change. Together with #3900 it is a behaviour change, but unless there were side effects in code they would also not be observed aside from the performance gains. I think it's worth noting some sort of warning for the |
Traces#extractOperatorAssemblyInformationParts
Scannable#name()
and related logic
@chemicL I just realized that the impact of this PR is likely over-stated, as in practice the input stacktrace appears to always be generated by a So perhaps we can optimize the code further by skipping the intermediate single-string representation. I might have time for a closer look into that this weekend. |
There's one other code path:
I now have a POC for this locally. Against a (ReactorDebugAgent-using) benchmark of local code it isn't yet faster that the current PR (surprisingly; requires more investigation), but for the cleanest and likely fastest code, it'd be better if we can have |
Hey, @Stephan202. I've been a bit busy lately but would like to revisit your PRs. Can you give an update on the above considerations? How impactful is this change or an alternative change from your PoC? I assume we would be able to alter the |
Thanks for the ping on this PR @chemicL; I meant to report back here. I did try the alternative approach mentioned (including the I can look into polishing that code and pushing it to an alternative branch for a second opinion; will try to find some time. That said, based on the above, my tentative suggestion would be to proceed with this PR as-is. (Except perhaps for trimming the JMH benchmark inputs, because as mentioned, in practice the code will generally parse at most two stack frames, rather than 1000.) If the alternative approach can be made more performant after all, that can be tackled in a follow-up PR. |
f7ebe1c
to
f470638
Compare
Rebased branch; applied 100% cleanly. |
The experiments I tried are on this messy branch. If desired I can clean it up, though it's a bit TBD when I'll have time to dive back into this topic. |
@Stephan202 I am just trying to understand whether the change is actually needed. As I understand you discovered that this will only be triggered for processing two stack frames (can you point to where it's limited to only 2?), therefore the benchmark is not relevant to the expected usage of this API, yes? And also, there is the risk of touching and changing a stable code base for not much benefit and potential regressions. Is there a possibility that these optimizations will have an actual effect on real world applications? |
@chemicL fair question! Based on earlier testing the answer is "yes, this is an improvement", but let me get back to you in the coming days with some more hard data. (Exact timing TBD.) |
This logic may be executed many times, e.g. if a hot code path uses `{Mono,Flux}#log` or Micrometer instrumentation. The added benchmark shows that for large stack traces the new implementation is several orders of magnitude more efficient in terms of compute and memory resource utilization. While there, improve two existing benchmarks by utilizing the black hole to which benchmark method return values are implicitly sent.
(cherry picked from commit 009ec89)
f470638
to
c3e519a
Compare
@chemicL alright, I rebased the branch on For the Benchmark before the changes
Benchmark after the changes
In short, this means a >3x speedup and more than halving of allocated memory for the most common 2-line case. I also had another look at a "more representative" Picnic-internal benchmark, where some longer reactive chains are subscribed to. I can't easily share this code, but with the Reactor Debug Agent enabled (as we do in production) there's an 18-22% speedup and ~40% reduction in allocated memory: Internal benchmark before the changes
Internal benchmark after the changes
Our main store application is a modular monolith that makes very heavy use of Reactor, with some key request flows creating very long reactive chains. I'm reasonably confident we'll see a noticeable latency improvement with this change. |
Performance is improved in two ways:
Scannable#stepName()
only whenAttr.NAME
is notexplicitly set.
Traces#extractOperatorAssemblyInformationParts
, towhich several
Scannable#stepName()
implementations delegate.The
Scannable#name()
logic may be executed many times, e.g. if a hotcode path uses
{Mono,Flux}#log
or Micrometer instrumentation. Theadded benchmark shows that for large stack traces, the new
Traces
implementation is several orders of magnitude more efficient in terms of
compute and memory resource utilization.
Deferral of invocation of
Scannable#stepName()
assumes that saidmethod does not have side-effects. This is true for all built-in
implementations.
While there, improve two existing benchmarks by utilizing the black hole
to which benchmark method return values are implicitly sent.