-
Notifications
You must be signed in to change notification settings - Fork 23
Measure impact of allocation free async #32
Comments
I'll be back to doing perf work in a few days and will definitely take a look at this for Npgsql. Any idea whether this works only for .NET Core 2.1 or also for older targets? |
Not sure if it would work with previous versions (we can ask Stephen). Perhaps it would be ok to add public surface that exposes a On the other hand we have the restriction that DbDataReader and friends are part of .NET Standard 2.0, so we need to come up with some way to add APIs that doesn't change that. I am again considering the idea of creating a new package that defines optional interfaces. |
Reading @stephentoub's comment it would seem this has also been added to the System.Threading.Tasks.Extensions nuget, so if I'm understanding correctly this is actually backwards compatible. Even if we don't add/change any surface API, there's potential for reducing allocations internally in the driver. However, it's true that in the end we have |
Sounds good!
I assume this is for prepared statements and those don't do any database I/O until the first read? Then yes, that sounds like a good usage of Anyway, I tend to focus more on the more fine-grained async APIs like ReadAsync and possibly GetFieldValueAsync, because I believe those can pay off more in terms of reducing allocations. |
Well, regardless of whether it's prepared or not, DbCommand.ExecuteReader() has to both write to the server and read the first response. If we can reduce allocations in this process that would be great.
Unlike DbCommand.ExecuteReader(), which always does I/O (always yields), APIs like ReadAsync() and GetFieldValueAsync() don't. In fact, GetFieldValueAsync() absolutely never yields unless CommandBehavior.Sequential is specified - we know the entire row is buffered in memory. So the implementation of ReadAsync() is totally non-async. The same is mostly true of ReadAsync() - in the 99% case, the next row is already buffered in memory and no I/O needs to occur; I/O only needs to occur if reading a really big resultset (> 8K) and we're now at the end of the buffer - that case doesn't seem worth optimizing... So to summarize, I think the finer-grained async APIs are better optimized via a fast-path for when they don't yield, thus avoiding async altogether (I did this when I was visiting). The problem is with the methods which always yield, i.e. ExecuteReader(), and for those I definitely think there's something to do... |
Ah, I misread what you said before. I thought you were saying it can always return immediately, but you said the opposite. If the awaitables can be safely recycled, then it could be a good fit for ValueTask, otherwise it probably isn't.
Yes, I know 😄 but maybe we care enough for the cases in which it yields, otherwise we can deprecate the API 😄
I want to understand this one better. Are you saying that in Npgsql you read the first buffer ahead of the first call to ReadAsync in order to avoid ever having to yield? Otherwise I would expect to still see on average more than one calls to ReadAsync that yield for every call to ExecuteReaderAsync. And of course if the resultset is bigger than 8K (or whatever the size of the buffer is for a given provider), then the ratio will be 2:1 or greater. BTW, I wouldn't say that a resultset that is bigger than 8K is really big
To clarify, the work on allocation free-async being discussed at https://github.com/dotnet/corefx/issues/27445 is actually supplemental to any fast path optimizations. In other words, it assumes that you are already avoiding allocations for the cases in which you don't need to yield, and it provides the cure to the problem of having to allocate a new Task for each case in which you do need to yield. The trick is to return an awaitable that unlike Task, can be reused, e.g. the next time you need to yield. For this reason we are discussing that in the |
Yeah, we discussed this with @anpete at some point with regards to a slightly different question - is it safe to have a single instance of NpgsqlDataReader on a physical connection, and just return that whenever So that reduces one allocation per However, I'm not too enthusiastic about this idea because the optimized pattern in https://github.com/dotnet/corefx/issues/27445 is a bit tricky for consumers of the ValueTask - by its very nature it's an error to await the ValueTask twice, etc. so I'm thinking this would be the source of many user errors - and weighing that against the benefit of a single saved Task allocation, I'm not sure it's worth it. It's also a addition of a new public API, which isn't free either. So I really think this is great for any async happening inside Npgsql - remember that there are still 2-3 async methods being called internally, and probably allocating - but I'm not too sure about exposing this to the user.
It's a good question... So first, although Npgsql buffers entire rows unless in sequential mode, I'm not sure all other providers do... In other words, in theory a provider might still decide to go to the server even without Second, I'm assuming that cases where sequential is specified and where So to summarize, I definitely think
Sure, let me provide more detail... Whenever Npgsql needs more data, at attempts to read as much as possible, up to the buffer size (8K) - so yes, it's a sort of read-ahead buffer. So let's say the user called Now, the important thing is that unless I'm msitaken, PostgreSQL actually doesn't send anything back as a response until it actually has at least one row to send. So when you execute Everything I wrote above does need to be verified - I'll try to take a look and confirm in the coming days. It's true that this would be PostgreSQL-specific behavior, and with other databases the
That's true :) But the point is that you'd have just one yielding invocation every 8K. I think there's a lot of value in eliminating allocations that occur on every invocation, but much less value in eliminating those that happen only when the buffer is exhausted... The gains are likely to be really dominated by processing of the 8K of data at the user, etc.
I agree. The question is more whether invocations yield often enough to make https://github.com/dotnet/corefx/issues/27445 worth it (especially considering that it isn't "free", because of the gotchas in the API)... In the case of In any case it definitely seems like although they're orthogonal, there's much higher value in optimizing the non-yielding invocation (i.e. fast-path) than the yielding invocation, so provider should probably focus on the former first. |
Wow, sorry for the super-long response :) |
I was wondering about this myself, and I have to say: I have never seen any normal code await anything more than once. I am actually optimistic that most developers will never see the difference. Perhaps the risk is being overestimated, but I guess we can wait until others try this and see how confusing it is. For the case of
Cool. I agree the benefit here can be huge.
SqlClient does the same thing. This was originally a requirement from EF, because it allows us to keep our materializer code synchronous and still avoid blocking.
I think it is desirable that this is the case for all providers. However, to the degree that we still care about this API (I am willing to agree that this is debatable, but hey, I think I may be biased towards O/RM scenarios 😄), it should ideally return ValueTask and not Task. This is not just because of the new allocation-free awaitables work. It was already true before, for the simple reason that this is an API that most of the time doesn't yield, but the result can vary a lot and therefore it is not practical to cache the returned
Cool, that is great! I am also not sure about what other providers/databases do.
I agree that the fast path optimization for the cases that don't need to yield can have more value. It is just unfortunate that we are not doing this everywhere already! FWIW, my intention when I filed this issue was to push ourselves to go to the next level. BTW, do you remember when @anpete you and I discussed the |
You can guard against that in the implementation of the IValueTaskSource |
Oh cool, I wasn't aware of this issue. I definitely think that By the same logic, it might make sense to have a
I admit I don't remember any more... I definitely need to redo the memory profiling soon and will report on the results. If we see a Task coming out of ReadAsync(), that would mean it's yielding contrary to what I wrote above, and then a ValueTask-returning overload would mitigate that. I'll try to investigate soon.
OK thanks :) I've yet to dive into the new ValueTask capabilities, will take a look. |
Likewise, in MySqlConnector,
Similarly, |
This is the same situation as in PostgreSQL. Note that the use of ValueTask to reduce memory overhead in this case isn't necessary... async methods automatically return a cached copy of Task if they complete synchronously: there's no advantage in returning ValueTask for that. Npgsql has a fast-path for ReadAsync for those cases where the next message (MySQL packet in your case) is already buffered in memory. Note that this fast-path isn't even an async method, to avoid the CPU overhead of the state machine - it's just a simple To summarize, before https://github.com/dotnet/corefx/issues/27445 having a ReadAsync overload that returns a ValueTask is useless (unless I'm missing something), but it may now be interesting to see whether allocations could be reduced for yielding (asynchronous) invocations.
Yeah, I understand that. Npgsql does impoement In any case, as long as you don't actually implement sequential row access, it makes total sense for |
Note: filed https://github.com/dotnet/corefx/issues/27682 to track adding ValueTask-returning overloads to ADO.NET. |
@roji and @bgrainger I'm super interested in pulling apart the networking layer of both clients to see what it looks like with all of the new primitives we are building (pipelines, span etc) |
@davidfowl and I'd be super super happy to do it with you... I've already thought of some places where spans could be great - although nothing earth-shattering at the moment (I'd be happy to be shown I'm wrong though). We also briefly discussed pipelines together a few weeks back and didn't necessary a huge benefit, although I admit I'm still unclear on what the tech does. How would you to proceed? If you want to dive into the code please take a look at the perf branch which hasn't yet been merged but has lots of important stuff. If you want to do a call or a chat about it just say when. |
@davidfowl What's the status of TLS for pipelines? (I've seen some tweets and merged PRs but I'm not sure if there's, say, a beta-quality NuGet package providing TLS.) MySqlConnector will need at least support for client SSL certificates to move 100% to pipelines (instead of having one path that uses
Ditto. |
Great! Next week is the MVP summit and we're still plowing through a bunch of API changes so I don't recommend you try to jump in using them right away. By preview2, the API churn should be over. I also don't think replacing the code you have with pipelines, span and memory will necessarily make things faster if you have already optimized code paths.
Pipelines themselves don't do much. Kestrel's transport layer is something that would be potentially useful (even though it's tied to the server, it has all of the pieces required to make a client API work). Buffer management is one of the things that pipelines takes care of. Still, if we did anything, it would abstracting the networking so that pipelines could be dropped in. This is what we did with Kestrel and now we have libuv, sockets and in memory transports (great for testing 😄). The other thing we did was do enough work to replace all of the things you have to do to manage buffers efficiently, parse various types of data from buffers directly which hopefully means the buffer types end up potentially being very small.
Lets try to do an initial investigation in ~2-3 weeks.
There is nothing that Microsoft is shipping in this space. SSLStream is still our official story. .NET Core 2.1 has some good performance improvements but there's still more work to be done. @Drawaes is working on a native pipelines implementation of TLS (also a fully managed one) that might be interesting to look at. https://github.com/Drawaes/Leto
Kestrel adapts SSLStream into a pipe. That's trivial to do. |
Sounds great, ping me when it calms down and we'll take a look together.
I suspect that there are some difference between a database client model and a webserver model when it comes to buffering, here are a few notes about how it works in Npgsql:
That's about it... It's a very simple design. The buffers are currently I'm not sure to what extent the above is the same with MySQL but I'm guessing it's not going to be super different. |
MySqlConnector has a very similar model (with a few small differences, e.g., a per-connection read/write buffer, currently no user-exposed way to tune the buffer size, etc.). There's currently a lot of inefficiency in reading data that I'm planning to fix with |
.NET Core 2.1 is going to bring new changes to
ValueTask<T>
that are going to enable it to wrap simple awaitable objects that can be reused across multiple async operations:https://github.com/dotnet/corefx/issues/27445
This is a good fit for things like reading rows asynchronously from a DbDataReader (and all the layers above, including anything that implements
IAsyncEnumerable
).This issue is about trying to do some experiments to measure the potential impact of this. E.g. we could modify Peregrine to leverage this in a completely allocation free async implementation and compare.
Having an idea of the impact can help us prioritize the changes and also decide how soon we should do it: we need to decide on public surface changes. E.g. do we just add a new alternative to ReadAsync that return ValueTask (and similar GetFieldValueAsync) or do we wait until the new
IAsyncEnumerable<out T>
is baked and create something based on it?The text was updated successfully, but these errors were encountered: