-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RSDK-8904: Only update weak dependencies once per reconfigure. #4584
base: main
Are you sure you want to change the base?
Conversation
// Dan: Using the clock in this way will early return the "loser" goroutine. The "loser" | ||
// goroutine will proceed despite weak dependents not yet having their `Reconfigure` | ||
// called. I expect this is actually safe despite being a surprising lack of guarantee. I'm | ||
// concerned about blocking here because I also* expect this logical clock atomic integer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be incorrect in the sense that if two components depends on the motion service and are reconfigured at the same time, one might get an updated motion service while the other will early return and thus get a non-updated version.
I don't know if we were excessively concerned about blocking here versus trying to avoid using more locks when introducing the logical clock atomic integer so I would be ok with blocking here. This is basically an implicit dependency that could not be properly expressed. @benjirewis may have more context around the original introduction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one might get an updated motion service while the other will early return and thus get a non-updated version.
I also think this is true unless the first goroutine hols the lastWeakDependentsMu
until the Reconfigure
has actually been called on the motion service. But, is that not just an extension of what @dgottlieb is already saying with
The "loser" goroutine will proceed despite weak dependents not yet having their
Reconfigure
called.
I think if we cared about avoiding that situation we would have to block even more and not r.lastWeakDependsMu.Unlock()
until the end of this function.
I don't know if we were excessively concerned about blocking here versus trying to avoid using more locks when introducing the logical clock atomic integer so I would be ok with blocking here.
I don't have a ton of context around the original introduction of this field, but I agree with, you, @cheukt. Originally, there was likely very little concurrency leveraging this code, so it was probably assumed that calls to updateWeakDependents
and getDependencies
would be sequential. As such, it was probably "good enough" to store a clock value here and assume that the next (sequential) getDependencies
call would not call updateWeakDependents
due to the updated lastWeakDependentsRound
value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough, unless we hold the lock for the entire function and somehow lock for the Reconfiguration of the first component, the second component will never get the "most" up-to-date version of the motion service.
Ok with just leaving the comment there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I buy that this is better than what we have now and I'm also not too worried about the extra blocking.
// Dan: Using the clock in this way will early return the "loser" goroutine. The "loser" | ||
// goroutine will proceed despite weak dependents not yet having their `Reconfigure` | ||
// called. I expect this is actually safe despite being a surprising lack of guarantee. I'm | ||
// concerned about blocking here because I also* expect this logical clock atomic integer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one might get an updated motion service while the other will early return and thus get a non-updated version.
I also think this is true unless the first goroutine hols the lastWeakDependentsMu
until the Reconfigure
has actually been called on the motion service. But, is that not just an extension of what @dgottlieb is already saying with
The "loser" goroutine will proceed despite weak dependents not yet having their
Reconfigure
called.
I think if we cared about avoiding that situation we would have to block even more and not r.lastWeakDependsMu.Unlock()
until the end of this function.
I don't know if we were excessively concerned about blocking here versus trying to avoid using more locks when introducing the logical clock atomic integer so I would be ok with blocking here.
I don't have a ton of context around the original introduction of this field, but I agree with, you, @cheukt. Originally, there was likely very little concurrency leveraging this code, so it was probably assumed that calls to updateWeakDependents
and getDependencies
would be sequential. As such, it was probably "good enough" to store a clock value here and assume that the next (sequential) getDependencies
call would not call updateWeakDependents
due to the updated lastWeakDependentsRound
value.
robot/impl/local_robot.go
Outdated
deps, err := r.getDependencies(ctx, resName, resNode) | ||
if r.Logger().Level() == zapcore.DebugLevel { | ||
var depNames []string | ||
for name, _ := range deps { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for name, _ := range deps { | |
for _, name := range deps { |
Probably obvious, but don't do for name := range deps
like lint is suggesting. Does this log line definitely work/look ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention was to get the resource name. deps
is of type Dependencies
:
type Dependencies map[Name]Resource
So getting the key part was intentional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2024-11-26T20:43:59.119-0500 DEBUG TestCheckMaxInstanceValid impl/local_robot.go:883 handling weak update for resource {"resource":"rdk:service:motion/fake1","deps":["rdk-internal:service:frame_system/builtin","rdk:component:arm/fake2"],"err":null}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Bad news is this seems to be failing implicit/weak dependencies tests. Two I've seen:
Perhaps expected given those tests aren't exercising the concurrency bit, changing the early return to blocking does not seem to help us here. |
No description provided.