Issue with smallest-mailbox in cluster #5980
Unanswered
StephanBis asked this question in Q&A
Replies: 0 comments
We have the following use case: a run consists of x generations, each of which has y calculations.
There are multiple run actors. Each receives a run and then processes its generations sequentially: the next generation can only be calculated once the previous one has been calculated. The run actor waits for all calculations to return before processing the next generation. A run is finished when all generations have been calculated or when an exception has occurred. We use Tell() because of the performance benefits.
Technically this means we have an Akka cluster with x run actors running on one VM and y calculation actors running on another VM. The run actors get their runs via a round-robin-pool router. The calculation actors get their calculations via a smallest-mailbox-pool cluster router.
Both actors process their CPU-heavy tasks on a separate thread using the .ContinueWith() and .PipeTo() methods, combined with behaviour switching to stash all incoming messages while the task is running, as described on many forums and in the official example for long-running tasks.
We had to implement it like this because we were getting a lot of heartbeat warnings and even dead letters. It seems that we were blocking the main thread, so the actor could not process cluster gossip or its mailbox.
The issue we are facing is the following: when we use the smallest-mailbox-pool, only the first calculation actor receives ALL messages. We understand this may be caused by the stashing mechanism: a busy actor still dequeues every incoming message and moves it to its stash, so its mailbox always looks empty. But isn't this weird? This would mean you can never use the smallest-mailbox router the way it is supposed to be used?
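To make the suspected mechanism concrete, here is a minimal toy model in Python. It is not Akka.NET code and does not reproduce the real router implementation. It assumes a router that picks the routee with the fewest queued messages and takes the first routee on a tie, and it assumes a stashing actor pulls each message out of its mailbox immediately. The names Routee, pick_smallest_mailbox, and simulate are hypothetical.

```python
from collections import deque


class Routee:
    """Toy stand-in for a calculation actor (hypothetical, not Akka.NET)."""

    def __init__(self, name, stashes_while_busy):
        self.name = name
        self.stashes_while_busy = stashes_while_busy
        self.mailbox = deque()  # the only queue the router can inspect
        self.stash = deque()    # invisible to the router
        self.received = 0

    def deliver(self, message):
        self.received += 1
        self.mailbox.append(message)
        if self.stashes_while_busy:
            # The work runs elsewhere, so the actor is free to dequeue the
            # message right away and park it in its stash.
            self.stash.append(self.mailbox.popleft())


def pick_smallest_mailbox(routees):
    # Assumed rule: fewest queued messages; min() keeps the first on ties.
    return min(routees, key=lambda r: len(r.mailbox))


def simulate(stashes_while_busy, messages=100, routee_count=4):
    """Route messages while no routee finishes any work; return counts."""
    routees = [Routee(f"calc-{i}", stashes_while_busy)
               for i in range(routee_count)]
    for n in range(messages):
        pick_smallest_mailbox(routees).deliver(n)
    return [r.received for r in routees]


if __name__ == "__main__":
    print("with stashing:   ", simulate(True))   # [100, 0, 0, 0]
    print("without stashing:", simulate(False))  # [25, 25, 25, 25]
```

In this model the stash hides the backlog from the router: every mailbox reports zero, so the tie-break always lands on the first routee. Whether the real router behaves the same way depends on its actual selection and tie-breaking rules, which this sketch only assumes.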
We have now switched to a round-robin-pool, but this feels wrong because some calculations may run considerably longer than others. This would mean multiple run actors are left waiting for their calculations.
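The concern about uneven calculation times can be illustrated with a small scheduling model in Python. This is a sketch under simplifying assumptions, not a measurement of our system: all calculations are available up front, each worker processes one at a time, and the durations are invented example values. The function names are hypothetical.

```python
import heapq


def makespan_round_robin(durations, workers):
    """Total time when task i always goes to worker i % workers."""
    busy_until = [0] * workers
    for i, duration in enumerate(durations):
        busy_until[i % workers] += duration
    return max(busy_until)


def makespan_earliest_free(durations, workers):
    """Total time when each task goes to the worker that frees up first."""
    heap = [(0, w) for w in range(workers)]  # (time worker is free, id)
    heapq.heapify(heap)
    for duration in durations:
        free_at, worker = heapq.heappop(heap)
        heapq.heappush(heap, (free_at + duration, worker))
    return max(free_at for free_at, _ in heap)


if __name__ == "__main__":
    # Invented durations: two slow calculations among several fast ones.
    durations = [8, 1, 1, 1, 8, 1, 1, 1]
    print(makespan_round_robin(durations, workers=2))    # 18
    print(makespan_earliest_free(durations, workers=2))  # 11
```

With these example values, round-robin happens to send both slow calculations to the same worker, so the generation takes 18 time units instead of 11. A load-aware router is meant to avoid exactly this kind of imbalance.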
Another question we have is why the mailbox and cluster gossip seem to run on the same thread as the Receive() method. Even if a long-running task is running on this thread, the mailbox and gossip should not be affected by it, in our opinion.
We are beginning to question ourselves; it seems that our architecture or implementation is wrong.
This matter is quite hard to explain. Thank you in advance.