RSDK-8904: Only update weak dependencies once per reconfigure. #4584
base: main
```diff
@@ -15,6 +15,7 @@ import (
 	"github.com/Masterminds/semver/v3"
 	"github.com/pkg/errors"
 	"go.uber.org/multierr"
+	"go.uber.org/zap/zapcore"
 	packagespb "go.viam.com/api/app/packages/v1"
 	modulepb "go.viam.com/api/module/v1"
 	goutils "go.viam.com/utils"
```
```diff
@@ -74,6 +75,7 @@ type localRobot struct {

 	// lastWeakDependentsRound stores the value of the resource graph's
 	// logical clock when updateWeakDependents was called.
+	lastWeakDependentsMu    sync.Mutex
 	lastWeakDependentsRound atomic.Int64

 	// configRevision stores the revision of the latest config ingested during
```
```diff
@@ -754,7 +756,23 @@ func (r *localRobot) updateWeakDependents(ctx context.Context) {
 	// Track the current value of the resource graph's logical clock. This will
 	// later be used to determine if updateWeakDependents should be called during
 	// getDependencies.
-	r.lastWeakDependentsRound.Store(r.manager.resources.CurrLogicalClockValue())
+	r.lastWeakDependentsMu.Lock()
+	logicalNow := r.manager.resources.CurrLogicalClockValue()
+	if r.lastWeakDependentsRound.Load() >= logicalNow {
+		// RSDK-8904: This method can be called concurrently such that resources are `Reconfigure`d
+		// in parallel. This logical clock existed for the purpose of avoiding double-reconfigure
+		// inside `updateWeakDependents` for the same robot `Reconfigure` call.
+		//
+		// Dan: Using the clock in this way will early return the "loser" goroutine. The "loser"
+		// goroutine will proceed despite weak dependents not yet having their `Reconfigure`
+		// called. I expect this is actually safe despite being a surprising lack of guarantee. I'm
+		// concerned about blocking here because I also expect this logical clock atomic integer
+		// was introduced to avoid blocking.
+		r.lastWeakDependentsMu.Unlock()
+		return
+	}
+	r.lastWeakDependentsRound.Store(logicalNow)
+	r.lastWeakDependentsMu.Unlock()

 	allResources := map[resource.Name]resource.Resource{}
 	internalResources := map[resource.Name]resource.Resource{}
```
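Outside the rdk context, the guard this hunk introduces can be sketched as a mutex-protected comparison against a logical clock: when several goroutines race into the same round, only one performs the update and the "losers" return early, which is the behavior Dan's inline comment describes. Everything below (`weakUpdater`, its fields, the simulated clock) is a hypothetical stand-in for illustration, not rdk code:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// weakUpdater is a hypothetical stand-in for localRobot, reduced to the three
// pieces the guard needs: a mutex, the last round handled, and a logical clock.
type weakUpdater struct {
	mu        sync.Mutex
	lastRound atomic.Int64
	clock     atomic.Int64
	updates   atomic.Int64 // counts how many times the "expensive" update ran
}

func (u *weakUpdater) updateWeakDependents() {
	u.mu.Lock()
	now := u.clock.Load()
	if u.lastRound.Load() >= now {
		// Another goroutine already claimed this clock round; skip the work.
		u.mu.Unlock()
		return
	}
	u.lastRound.Store(now)
	u.mu.Unlock()

	// The real method reconfigures every weak dependent here, outside the lock.
	u.updates.Add(1)
}

func main() {
	u := &weakUpdater{}
	u.clock.Store(1) // one reconfigure "round" on the logical clock

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			u.updateWeakDependents()
		}()
	}
	wg.Wait()
	fmt.Println("updates performed:", u.updates.Load()) // always 1
}
```

Because the work itself happens after the unlock, a losing goroutine can return before the winner has finished updating weak dependents, which is the gap the review thread below discusses.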
```diff
@@ -856,8 +874,14 @@ func (r *localRobot) updateWeakDependents(ctx context.Context) {
 		if len(r.getWeakDependencyMatchers(conf.API, conf.Model)) == 0 {
 			return
 		}
-		r.Logger().CDebugw(ctx, "handling weak update for resource", "resource", resName)
 		deps, err := r.getDependencies(ctx, resName, resNode)
+		if r.Logger().Level() == zapcore.DebugLevel {
+			var depNames []string
+			for name, _ := range deps {
```

Suggested change

Probably obvious, but don't do

The intention was to get the resource name. So getting the key part was intentional.

```diff
+				depNames = append(depNames, name.String())
+			}
+			r.Logger().CDebugw(ctx, "handling weak update for resource", "resource", resName, "deps", depNames, "err", err)
+		}
 		if err != nil {
 			r.Logger().CErrorw(ctx, "failed to get dependencies during weak dependencies update; skipping", "resource", resName, "error", err)
 			return
```
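The debug-gated block above only builds the `depNames` slice when debug logging is actually enabled. Below is a hedged, standalone illustration of the same level-gating idea using plain zap; the rdk `logging` wrapper's `Level()` and `CDebugw` helpers are assumed rather than reproduced, and the `deps` map is a simplified stand-in for the real dependency map keyed by `resource.Name`:

```go
package main

import (
	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
)

func main() {
	logger, _ := zap.NewDevelopment() // debug-level logger for the example
	defer logger.Sync()
	sugar := logger.Sugar()

	deps := map[string]struct{}{"motion": {}, "arm": {}}

	// Core().Enabled is used here instead of an exact level comparison so the
	// block also runs at more verbose levels; either way, the depNames slice
	// is only constructed when a debug entry would actually be emitted.
	if logger.Core().Enabled(zapcore.DebugLevel) {
		depNames := make([]string, 0, len(deps))
		for name := range deps {
			depNames = append(depNames, name)
		}
		sugar.Debugw("handling weak update for resource", "deps", depNames)
	}
}
```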
```diff
@@ -893,6 +917,7 @@ func (r *localRobot) updateWeakDependents(ctx context.Context) {
 			resChan <- struct{}{}
 			r.reconfigureWorkers.Done()
 		}()
+
 		updateResourceWeakDependents(ctxWithTimeout, conf)
 	})
 	select {
```
I think this would be incorrect in the sense that if two components depend on the motion service and are reconfigured at the same time, one might get an updated motion service while the other will early return and thus get a non-updated version.

I don't know if we were excessively concerned about blocking here versus trying to avoid using more locks when introducing the logical clock atomic integer, so I would be OK with blocking here. This is basically an implicit dependency that could not be properly expressed. @benjirewis may have more context around the original introduction.

I also think this is true unless the first goroutine holds the `lastWeakDependentsMu` until the `Reconfigure` has actually been called on the motion service. But is that not just an extension of what @dgottlieb is already saying with

> I think if we cared about avoiding that situation we would have to block even more and not `r.lastWeakDependsMu.Unlock()` until the end of this function.

I don't have a ton of context around the original introduction of this field, but I agree with you, @cheukt. Originally, there was likely very little concurrency leveraging this code, so it was probably assumed that calls to `updateWeakDependents` and `getDependencies` would be sequential. As such, it was probably "good enough" to store a clock value here and assume that the next (sequential) `getDependencies` call would not call `updateWeakDependents` due to the updated `lastWeakDependentsRound` value.

Fair enough; unless we hold the lock for the entire function and somehow lock for the reconfiguration of the first component, the second component will never get the "most" up-to-date version of the motion service. OK with just leaving the comment there.
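For contrast, the blocking alternative discussed in this thread would keep the mutex held through the update itself, so a caller that loses the race waits and then observes fully updated weak dependents instead of returning early. Here is a sketch of that variant, reusing the hypothetical `weakUpdater` from the earlier example (again not code from this PR), with the obvious cost that all callers serialize behind the update:

```go
// Blocking variant of the earlier weakUpdater sketch: the mutex stays held
// until the update completes, so no caller can proceed while weak dependents
// are only partially reconfigured.
func (u *weakUpdater) updateWeakDependentsBlocking() {
	u.mu.Lock()
	defer u.mu.Unlock()

	now := u.clock.Load()
	if u.lastRound.Load() >= now {
		// This round was already finished while we waited on the lock.
		return
	}
	u.updates.Add(1) // stands in for calling Reconfigure on each weak dependent
	u.lastRound.Store(now)
}
```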