How to prevent a future sub-graph from running #6221
Comments
At Cylc 7 we only had explicit task "subtraction" [1], i.e. if you wanted to skip a chain of tasks or a sub-system within a workflow, you had to group them explicitly, e.g. by using a family. These Cylc 7 use cases can all be handled by skip mode, which can match the same behaviour.

This issue suggests a new mechanism for implicit task "subtraction", where we determine the tasks to "subtract" by traversing downstream of the selected task(s) [2]. This avoids the need to pre-group tasks for intervention purposes. Presumably for your use cases it is not possible to pre-empt the chains you want to "subtract", ruling out explicit "subtraction"?

In a strange way, this is actually a similar problem to "reflow", in that both are, in effect, performing implicit task selection by traversing downstream from selected task(s). Reflow implicitly determines the tasks to run, whereas implicit "subtraction" determines the tasks not to run.

Implicit "subtraction" shares a similar difficulty with implicit reflow regarding downstream consequences. E.g. if we explicitly "subtract" the task
However, if we add this inter-cycle dependency:
Or this intra-cycle dependency:
Then the workflow will subsequently stall as a result of the "subtraction". Cylc doesn't presently have a graph-traversal utility to inform the user what is downstream of a selected task, and most real-world graphs are difficult to inspect graphically, so at present it is very hard for the user to tell whether a "subtraction" operation will cause a subsequent stall or not.

So how do we avoid the potential for an unintended future stall? We have talked about termination mechanisms to restrict the scope of reflow, and these could also apply here. However, these methods rely on grouping, which defeats the object of implicit task selection.
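To make the traversal idea concrete, here is a minimal sketch of the kind of downstream lookup discussed above. It is not a Cylc API: the graph is a plain dictionary of hypothetical `cycle/task` IDs mapped to their children, and the function names `downstream_closure` and `blocked_dependents` are invented for illustration. It assumes the graph is small enough to hold in memory and ignores conditional (`|`) triggers and optional outputs.

```python
from collections import deque

# Hypothetical workflow graph: parent task ID -> list of child task IDs.
# "1/c" => "2/c" stands in for an inter-cycle dependency.
GRAPH = {
    "1/a": ["1/b", "1/x"],
    "1/b": ["1/c"],
    "1/c": ["2/c"],
    "1/x": [],
    "2/a": ["2/b"],
    "2/b": ["2/c"],
    "2/c": [],
}


def downstream_closure(graph, selected):
    """Return the selected tasks plus everything reachable downstream.

    This is the set an implicit "subtraction" would remove.
    """
    seen = set(selected)
    queue = deque(selected)
    while queue:
        task = queue.popleft()
        for child in graph.get(task, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def blocked_dependents(graph, removed):
    """Return tasks outside `removed` that have a parent inside it.

    These tasks can never be satisfied, so they are where a future
    stall would show up after an explicit "subtraction".
    """
    blocked = set()
    for parent in removed:
        for child in graph.get(parent, ()):
            if child not in removed:
                blocked.add(child)
    return blocked


if __name__ == "__main__":
    explicit = {"1/b"}
    implicit = downstream_closure(GRAPH, explicit)
    print("explicit removal blocks:", sorted(blocked_dependents(GRAPH, explicit)))
    print("implicit removal set:", sorted(implicit))
    print("implicit removal blocks:", sorted(blocked_dependents(GRAPH, implicit)))
```

In this toy graph, removing only `1/b` leaves `1/c` waiting on a parent that will never run, whereas removing the full downstream closure leaves nothing blocked but also reaches into the next cycle via the inter-cycle edge, which is the "downstream consequences" difficulty in miniature.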
Yes, but I don't entirely agree with the characterization of this as "implicit", although I understand what you mean.
If I explicitly expire That this should cause a stall is predicated on the following:
But I'm countering this with the following:
In fact (1) and (2) could be equally bad - they could both radically delay throughput at a time when I'm not around to fix the problem - by (1) restarting and triggering after a premature shutdown; or (2) removing the expired task causing the unwanted stall. So I'm just saying our response to an intervention - especially when the consequences may occur later in time - should not be entirely based on assuming the user might have been wrong to do it in the first place.
I presume by pre-empt you mean pre-emptively configuring a family that covers the whole sub-graph? Well I suppose it's always possible, but that isn't really good enough. The point is, I might not have done so, for whatever reason. Task families are primarily for inheritance of runtime settings, so it's perfectly reasonable not to think ahead and create additional families just in case they may be needed for particular interventions at run time. During development in particular, it's useful to be able to run or block completely arbitrary sub-graphs.
Unfortunately we can't know if the user "understands" what they are doing when they perform a potentially dangerous intervention. But despite that, such interventions are sometimes necessary. So I think the best we can do is to warn the user and require explicit opt-in after the warning is issued. (And by the way, even premature shutdown is not that hard to recover from in Cylc 8 - it's certainly not on the same scale as mistaken use of
You raise a good point there ... posting a new issue: #6237
I'm closing this, not because it's invalid, but because the way I put the problem unfortunately created the impression of a competition between "branch cutting" vs "skipping" as ways to prevent a future sub-graph from running. The important thing is really that we need the ability to head off future stalls (due to future final-incomplete tasks) without having to wait unnecessarily for the inevitable stall to actually happen. That ability enables branch cutting as an option where appropriate, but without circumventing our output completion safeguards. I'll put up a new issue for that.
Superseded by #6383
We recently kind-of agreed that:
However the cylc remove extension proposal does not achieve that except for current active tasks:
- (`n<0`) it erases flow history, primarily to allow easy re-run in the same flow
- (`n=0`) it does chop off the downstream graph
- (`n>0`) it does nothing, and punts the problem to skip mode

[Aside: it's a pity we didn't call it `cylc erase`]

Skip mode is natural for "skipping over" a bunch of tasks that can easily be identified as a group (e.g. a whole cycle, or a family). But it is not so good for the fairly common use case of preventing an arbitrary future side-graph from running downstream of a particular task. By "arbitrary" I mean, in particular, that I may not have foreseen the need for this and so the entire sub-graph was not configured in advance to be in a family expressly to make use of skip mode easy.
Example: in a multi-model workflow, external circumstances dictate that I no longer need to run `model-x` in the next cycle, and by implication its entire post-processing and product-generation side-graph. The natural way to do this is to simply force-expire `next/model-x` (expire means: for external reasons we no longer need to run this task).

However, this will cause a future stall if we did not have the foresight to set `model-x:expire?` as optional.

Here I am deliberately and knowingly chopping off future graph for a good reason, so a future stall is extremely unhelpful. I can't prevent the unwanted stall; I have to wait for it to happen before I can deal with it.
[Note this is not a contrived use case - it is exactly like clock-expire in every respect except that the external reason for expiration is not linked to the clock time - and hence potentially not easily identified as an expire use case before starting the workflow.]
Proposal
A two-step intervention that makes the potential danger clear to the user:
1. `cylc set --out=expire next/model-x`
2. `cylc remove --expire next/model-x`
   - prompt: warning: this will cut the graph off at `next/model-x`, do you really want to do this?
   - has the same effect as removing the task once the future stall has occurred, so the DB must record the removal rather than simply erase the history (which would cause it to run again when the flow reaches it)
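The difference between recording a removal and merely erasing history can be illustrated with a toy model. This is only a sketch under strong assumptions: `ToyFlow` and its attributes are hypothetical, it is not how the Cylc scheduler or its database works, and it treats a task as runnable as soon as any one parent has run.

```python
from collections import deque


class ToyFlow:
    """Toy flow: tracks which tasks have run and which were removed."""

    def __init__(self, graph):
        self.graph = graph      # parent -> list of children
        self.history = set()    # tasks that have already run
        self.removed = set()    # removals recorded for later reference

    def remove(self, task, record):
        """Erase any history for `task`; optionally record the removal."""
        self.history.discard(task)
        if record:
            self.removed.add(task)

    def run_from(self, start):
        """Advance the flow from `start`, returning tasks run in order."""
        ran = []
        queue = deque([start])
        while queue:
            task = queue.popleft()
            if task in self.removed or task in self.history:
                # A recorded removal cuts the graph off at this task.
                continue
            self.history.add(task)
            ran.append(task)
            queue.extend(self.graph.get(task, ()))
        return ran


# Hypothetical multi-model graph for the next cycle.
GRAPH = {
    "prep": ["model-x", "model-y"],
    "model-x": ["post-x"],
    "model-y": [],
    "post-x": [],
}

recorded = ToyFlow(GRAPH)
recorded.remove("model-x", record=True)
print(recorded.run_from("prep"))

erased = ToyFlow(GRAPH)
erased.remove("model-x", record=False)
print(erased.run_from("prep"))
```

With the removal recorded, the flow reaches `model-x` and stops there, so `post-x` never runs; with history merely erased, nothing tells the flow to skip `model-x`, and the whole side-graph runs when the flow arrives.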