Resource template logs for spark application don't get archived in artifact repo #9900

Open
2 of 3 tasks
Freia3 opened this issue Oct 24, 2022 · 8 comments · May be fixed by #13933

@Freia3

Freia3 commented Oct 24, 2022

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

I have an Argo Workflow running a Spark application (using the spark-operator). I want to archive the logs of this workflow in an artifact repository, but this does not work.

When running the hello-world workflow, the logs do get archived.
YAML files to reproduce: https://github.com/Freia3/argo-spark-example
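For comparison, here is a minimal sketch of the case that does work. It assumes a default artifact repository is already configured and that log archiving is switched on per template with `archiveLocation.archiveLogs` (it can also be enabled workflow-wide or in the controller's artifact repository configuration; the reporter's actual setting is not shown in the issue). For a container template, the wait container uploads the main container's logs to the artifact repository. Resource templates currently have no equivalent step, which is what the rest of this thread discusses.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-
spec:
  entrypoint: whalesay
  templates:
  - name: whalesay
    # Ask the wait container to upload this step's main-container logs
    # to the (assumed, already configured) default artifact repository.
    archiveLocation:
      archiveLogs: true
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["hello world"]
```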

Version

v3.4.2

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: spark-kubernetes-dag
  namespace: freia
spec:
  entrypoint: sparkling-operator
  serviceAccountName: argo-spark
  templates:
  - name: sparkpi
    resource: 
      action: create 
      successCondition: status.applicationState.state in (COMPLETED)
      failureCondition: 'status.applicationState.state in (FAILED, SUBMISSION_FAILED, UNKNOWN)'
      manifest: | 
        apiVersion: "sparkoperator.k8s.io/v1beta2"
        kind: SparkApplication
        metadata:
          generateName: spark-pi
          namespace: freia
        spec:
          type: Scala
          mode: cluster
          image: "gcr.io/spark-operator/spark:v3.0.0"
          imagePullPolicy: Always
          mainClass: org.apache.spark.examples.SparkPi
          mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar"
          sparkVersion: "3.0.0"
          restartPolicy:
            type: Never
          driver:
            memory: "512m"
            labels:
              version: 3.0.0
            serviceAccount: my-release-spark
          executor:
            instances: 1
            memory: "512m"
            labels:
              version: 3.0.0
  - name: sparkling-operator
    dag:
      tasks:
      - name: SparkPi1
        template: sparkpi

Logs from the workflow controller

time="2022-10-24T14:23:06.533Z" level=info msg="Processing workflow" namespace=freia workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:23:06.540Z" level=info msg="Updated phase  -> Running" namespace=freia workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:23:06.540Z" level=info msg="DAG node spark-kubernetes-dagbkjgn initialized Running" namespace=freia workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:23:06.540Z" level=info msg="All of node spark-kubernetes-dagbkjgn.SparkPi1 dependencies [] completed" namespace=freia workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:23:06.540Z" level=info msg="Pod node spark-kubernetes-dagbkjgn-3694106157 initialized Pending" namespace=freia workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:23:06.601Z" level=info msg="Created pod: spark-kubernetes-dagbkjgn.SparkPi1 (spark-kubernetes-dagbkjgn-sparkpi-3694106157)" namespace=freia workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:23:06.602Z" level=info msg="TaskSet Reconciliation" namespace=freia workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:23:06.602Z" level=info msg=reconcileAgentPod namespace=freia workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:23:06.616Z" level=info msg="Workflow update successful" namespace=freia phase=Running resourceVersion=27138244 workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:23:16.603Z" level=info msg="Processing workflow" namespace=freia workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:23:16.603Z" level=info msg="Task-result reconciliation" namespace=freia numObjs=0 workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:23:16.603Z" level=info msg="node changed" namespace=freia new.message= new.phase=Running new.progress=0/1 nodeID=spark-kubernetes-dagbkjgn-3694106157 old.message= old.phase=Pending old.progress=0/1 workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:23:16.604Z" level=info msg="TaskSet Reconciliation" namespace=freia workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:23:16.604Z" level=info msg=reconcileAgentPod namespace=freia workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:23:16.634Z" level=info msg="Workflow update successful" namespace=freia phase=Running resourceVersion=27138341 workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:23:26.635Z" level=info msg="Processing workflow" namespace=freia workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:23:26.635Z" level=info msg="Task-result reconciliation" namespace=freia numObjs=0 workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:23:26.635Z" level=info msg="node unchanged" namespace=freia nodeID=spark-kubernetes-dagbkjgn-3694106157 workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:23:26.635Z" level=info msg="TaskSet Reconciliation" namespace=freia workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:23:26.635Z" level=info msg=reconcileAgentPod namespace=freia workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:27:02.049Z" level=info msg="Processing workflow" namespace=freia workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:27:02.049Z" level=info msg="Task-result reconciliation" namespace=freia numObjs=0 workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:27:02.049Z" level=info msg="node changed" namespace=freia new.message= new.phase=Succeeded new.progress=0/1 nodeID=spark-kubernetes-dagbkjgn-3694106157 old.message= old.phase=Running old.progress=0/1 workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:27:02.049Z" level=info msg="Outbound nodes of spark-kubernetes-dagbkjgn set to [spark-kubernetes-dagbkjgn-3694106157]" namespace=freia workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:27:02.049Z" level=info msg="node spark-kubernetes-dagbkjgn phase Running -> Succeeded" namespace=freia workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:27:02.049Z" level=info msg="node spark-kubernetes-dagbkjgn finished: 2022-10-24 14:27:02.049826737 +0000 UTC" namespace=freia workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:27:02.049Z" level=info msg="Checking daemoned children of spark-kubernetes-dagbkjgn" namespace=freia workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:27:02.049Z" level=info msg="TaskSet Reconciliation" namespace=freia workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:27:02.049Z" level=info msg=reconcileAgentPod namespace=freia workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:27:02.049Z" level=info msg="Updated phase Running -> Succeeded" namespace=freia workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:27:02.049Z" level=info msg="Marking workflow completed" namespace=freia workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:27:02.049Z" level=info msg="Checking daemoned children of " namespace=freia workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:27:02.055Z" level=info msg="cleaning up pod" action=deletePod key=freia/spark-kubernetes-dagbkjgn-1340600742-agent/deletePod
time="2022-10-24T14:27:02.067Z" level=info msg="Workflow update successful" namespace=freia phase=Succeeded resourceVersion=27139900 workflow=spark-kubernetes-dagbkjgn
time="2022-10-24T14:27:02.078Z" level=info msg="cleaning up pod" action=labelPodCompleted key=freia/spark-kubernetes-dagbkjgn-sparkpi-3694106157/labelPodCompleted

Logs from your workflow's wait container

No resources found in argo namespace.

@ajkaanbal

Your resource needs some metadata; take a look at the example:

https://github.com/argoproj/argo-workflows/blob/master/examples/k8s-resource-log-selector.yaml

@Freia3
Author

Freia3 commented Oct 24, 2022

@ajkaanbal This is for pulling the logs from the pods created by the Spark CRD (spark-driver, spark-executor).
In the Argo UI I see these logs:
[screenshot of these logs in the Argo UI]
I want to be able to archive those logs.

@sarabala1979
Member

@Freia3 The current resource template does not support archiving logs. Would you like to work on this enhancement?

@Freia3
Author

Freia3 commented Oct 31, 2022

@sarabala1979 OK, thanks for the information; I couldn't find this in the docs. No, I can't work on this enhancement.
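Until resource templates can archive logs, one possible workaround (a sketch, not something proposed in this thread) is a follow-up container step that prints the Spark driver's logs with `kubectl logs`. Because that step is an ordinary container template, its own output can be archived with `archiveLocation.archiveLogs: true`. The fragment below assumes the spark-operator names the driver pod `<SparkApplication name>-driver`, that the `argo-spark` service account is allowed to `get` `pods/log` in the namespace, and that `bitnami/kubectl` is an acceptable image. The `app-name` parameter, the `fetch-driver-logs` template, and the `SparkPi1-logs` task are hypothetical names.

```yaml
  # Fragment of spec.templates: hypothetical additions to the workflow in this issue.
  # Existing "sparkpi" resource template, extended to capture the generated name.
  - name: sparkpi
    resource:
      # ... action, conditions and manifest unchanged from the issue ...
    outputs:
      parameters:
      - name: app-name
        valueFrom:
          jsonPath: '{.metadata.name}'
  # New container template: prints the driver logs so the wait container archives them.
  - name: fetch-driver-logs
    inputs:
      parameters:
      - name: app-name
    archiveLocation:
      archiveLogs: true
    container:
      image: bitnami/kubectl:latest
      command: [kubectl]
      # Assumes the driver pod is named "<app-name>-driver".
      args: ["logs", "-n", "{{workflow.namespace}}", "{{inputs.parameters.app-name}}-driver"]
  # Updated DAG: run the log step after the Spark step.
  - name: sparkling-operator
    dag:
      tasks:
      - name: SparkPi1
        template: sparkpi
      - name: SparkPi1-logs
        template: fetch-driver-logs
        depends: SparkPi1
        arguments:
          parameters:
          - name: app-name
            value: "{{tasks.SparkPi1.outputs.parameters.app-name}}"
```

As written, `depends: SparkPi1` only runs the log step when the Spark step succeeds. Executor pods are typically removed once the application finishes, so this approach likely captures driver logs only.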

@arnoin

arnoin commented Nov 15, 2023

@sarabala1979 Hello, I'd like to contribute to this one. It is relevant for my team, and I believe the fix is straightforward.

From what I've seen, there are two ways to solve it:

  1. Make argoexec resource store the logs as an artifact using executor.WorkflowExecutor.SaveLogs. I still need to check whether it can store its own container's logs while it is still running.
  2. Start a wait container for resource pods, which by design stores the main container's logs here. I worry about redundant error reporting, but I think it is safe.

tbh, I think option 2 is better. WDYT?

@Joibel
Member

Joibel commented Nov 22, 2023

I probably agree that option 2 is the right way to do it. @sarabala1979, can you pitch in?

@arnoin

arnoin commented Dec 31, 2023

Hey @sarabala1979, do you want me to create a pull request?

@agilgur5 agilgur5 changed the title Logs argo workflow spark application don't get archived in artifact repo Resource template logs for spark application don't get archived in artifact repo Apr 29, 2024
@agilgur5 agilgur5 added the area/archive-logs Archive Logs feature label Apr 29, 2024
@roofurmston
Contributor

Hi

We would be interested in this functionality too. Are there any updates on it?

Also, to clarify: we would be interested in getting the logs for containers other than main, right? The logs for main are not particularly useful for ResourceTemplate resources, so we have a sidecar to surface the logs from the external resource. We would want these sidecar logs archived too.

shuangkun added a commit to shuangkun/argo-workflows that referenced this issue Nov 24, 2024

7 participants