Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RSDK-8904: Only update weak dependencies once per reconfigure. #4584

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

dgottlieb
Copy link
Member

No description provided.

@viambot viambot added the safe to test This pull request is marked safe to test from a trusted zone label Nov 25, 2024
// Dan: Using the clock in this way will early return the "loser" goroutine. The "loser"
// goroutine will proceed despite weak dependents not yet having their `Reconfigure`
// called. I expect this is actually safe despite being a surprising lack of guarantee. I'm
// concerned about blocking here because I also* expect this logical clock atomic integer
Copy link
Member

@cheukt cheukt Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be incorrect in the sense that if two components depends on the motion service and are reconfigured at the same time, one might get an updated motion service while the other will early return and thus get a non-updated version.

I don't know if we were excessively concerned about blocking here versus trying to avoid using more locks when introducing the logical clock atomic integer so I would be ok with blocking here. This is basically an implicit dependency that could not be properly expressed. @benjirewis may have more context around the original introduction

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one might get an updated motion service while the other will early return and thus get a non-updated version.

I also think this is true unless the first goroutine hols the lastWeakDependentsMu until the Reconfigure has actually been called on the motion service. But, is that not just an extension of what @dgottlieb is already saying with

The "loser" goroutine will proceed despite weak dependents not yet having their Reconfigure called.

I think if we cared about avoiding that situation we would have to block even more and not r.lastWeakDependsMu.Unlock() until the end of this function.

I don't know if we were excessively concerned about blocking here versus trying to avoid using more locks when introducing the logical clock atomic integer so I would be ok with blocking here.

I don't have a ton of context around the original introduction of this field, but I agree with, you, @cheukt. Originally, there was likely very little concurrency leveraging this code, so it was probably assumed that calls to updateWeakDependents and getDependencies would be sequential. As such, it was probably "good enough" to store a clock value here and assume that the next (sequential) getDependencies call would not call updateWeakDependents due to the updated lastWeakDependentsRound value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough, unless we hold the lock for the entire function and somehow lock for the Reconfiguration of the first component, the second component will never get the "most" up-to-date version of the motion service.

Ok with just leaving the comment there

Copy link
Member

@benjirewis benjirewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I buy that this is better than what we have now and I'm also not too worried about the extra blocking.

// Dan: Using the clock in this way will early return the "loser" goroutine. The "loser"
// goroutine will proceed despite weak dependents not yet having their `Reconfigure`
// called. I expect this is actually safe despite being a surprising lack of guarantee. I'm
// concerned about blocking here because I also* expect this logical clock atomic integer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one might get an updated motion service while the other will early return and thus get a non-updated version.

I also think this is true unless the first goroutine hols the lastWeakDependentsMu until the Reconfigure has actually been called on the motion service. But, is that not just an extension of what @dgottlieb is already saying with

The "loser" goroutine will proceed despite weak dependents not yet having their Reconfigure called.

I think if we cared about avoiding that situation we would have to block even more and not r.lastWeakDependsMu.Unlock() until the end of this function.

I don't know if we were excessively concerned about blocking here versus trying to avoid using more locks when introducing the logical clock atomic integer so I would be ok with blocking here.

I don't have a ton of context around the original introduction of this field, but I agree with, you, @cheukt. Originally, there was likely very little concurrency leveraging this code, so it was probably assumed that calls to updateWeakDependents and getDependencies would be sequential. As such, it was probably "good enough" to store a clock value here and assume that the next (sequential) getDependencies call would not call updateWeakDependents due to the updated lastWeakDependentsRound value.

deps, err := r.getDependencies(ctx, resName, resNode)
if r.Logger().Level() == zapcore.DebugLevel {
var depNames []string
for name, _ := range deps {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for name, _ := range deps {
for _, name := range deps {

Probably obvious, but don't do for name := range deps like lint is suggesting. Does this log line definitely work/look ok?

Copy link
Member Author

@dgottlieb dgottlieb Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention was to get the resource name. deps is of type Dependencies:

type Dependencies map[Name]Resource

So getting the key part was intentional.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2024-11-26T20:43:59.119-0500	DEBUG	TestCheckMaxInstanceValid	impl/local_robot.go:883	handling weak update for resource	{"resource":"rdk:service:motion/fake1","deps":["rdk-internal:service:frame_system/builtin","rdk:component:arm/fake2"],"err":null}

Copy link
Member

@cheukt cheukt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dgottlieb
Copy link
Member Author

dgottlieb commented Nov 27, 2024

Bad news is this seems to be failing implicit/weak dependencies tests. Two I've seen:

TestImplicitDepsAcrossModules
TestUpdateWeakDependents

Perhaps expected given those tests aren't exercising the concurrency bit, changing the early return to blocking does not seem to help us here.

@viambot viambot added safe to test This pull request is marked safe to test from a trusted zone and removed safe to test This pull request is marked safe to test from a trusted zone labels Nov 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
safe to test This pull request is marked safe to test from a trusted zone
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants