Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Certain nullable type resolved to not null coder after #32705 #33132

Open
1 of 17 tasks
Abacn opened this issue Nov 15, 2024 · 6 comments
Open
1 of 17 tasks

[Bug]: Certain nullable type resolved to not null coder after #32705 #33132

Abacn opened this issue Nov 15, 2024 · 6 comments
Assignees

Comments

@Abacn
Copy link
Contributor

Abacn commented Nov 15, 2024

What happened?

Caught by DataflowTemplates validation for Beam 2.61.0rc1: https://github.com/GoogleCloudPlatform/DataflowTemplates/pull/2014/checks?check_run_id=33057418339

Was able to bisect:

good: 20d0f6e

bad: c243491

To reproduce,

  1. checkout Validate Beam 2.61.0rc1 GoogleCloudPlatform/DataflowTemplates#2014

  2. on beam repo, after making any change, run

On Beam repo, after making any change (e.g. add log)

git cherry-pick c53a2017ba6e22816c8346fc8ce50142d30e3f81

./gradlew :sdks:java:core:publishToMavenLocal -Ppublishing

./gradlew :runners:direct-java:publishToMavenLocal -Ppublishing
  1. On DataflowTemplate repo, run

mvn test -pl v2/dataplex -Dtest=DataplexBigQueryToGcsTest#testE2E_mainPathWithAllStepsEnabled

Issue Priority

Priority: 1 (data loss / total loss of function)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@Abacn
Copy link
Contributor Author

Abacn commented Nov 15, 2024

A simple test: succeeded prior to the change, failing after the change:

package org.apache.beam.runners.direct;

import com.google.auto.value.AutoValue;
import java.io.Serializable;
import org.apache.beam.sdk.coders.DefaultCoder;
import org.apache.beam.sdk.schemas.AutoValueSchema;
import org.apache.beam.sdk.schemas.SchemaCoder;
import org.apache.beam.sdk.schemas.annotations.DefaultSchema;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.TypeDescriptor;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.checkerframework.checker.nullness.qual.Nullable;
import org.junit.Rule;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.JUnit4;

@RunWith(JUnit4.class)
public class NullableEvaluationTest {
  @Rule
  public TestPipeline p = TestPipeline.create();

  @Test
  public void testConstruction() {
    // Either annotated with @Nullable or not does not matter
    p.apply(Create.of(KV.<String, @Nullable BigQueryTablePartition>of("123", null)))
        .apply("MapFileNames", MapElements.into(TypeDescriptors.kvs(
                TypeDescriptor.of(String.class),
                TypeDescriptor.of(BigQueryTablePartition.class)))
            .via(kv -> kv));
    p.run();
  }

  /** BigQuery table partition metadata. */
  @AutoValue
  @DefaultCoder(SchemaCoder.class)
  @DefaultSchema(AutoValueSchema.class)
  public abstract static class BigQueryTablePartition implements Serializable {

    public abstract String getPartitionName();

    public static Builder builder() {
      return new AutoValue_NullableEvaluationTest_BigQueryTablePartition.Builder();
    }

    /** Builder for {@link BigQueryTablePartition}. */
    @AutoValue.Builder
    public abstract static class Builder {
      public abstract Builder setPartitionName(String value);

      public abstract BigQueryTablePartition build();
    }
  }
}

@damccorm
Copy link
Contributor

This has been fixed via #33133 - leaving open since we had to revert the PR from @reuvenlax and we probably want the right fix going forward here

@damccorm damccorm removed this from the 2.61.0 Release milestone Nov 18, 2024
@damccorm damccorm added P2 and removed P1 labels Nov 19, 2024
@reuvenlax
Copy link
Contributor

Can you explain what the bug was (other than some tests started failing)? It's not clear to me from this description.

@Abacn
Copy link
Contributor Author

Abacn commented Nov 20, 2024

The bug was that it used to be able to encode elements of KV<KeyType, @Nullable ValueType> where ValueType uses a SchemaCoder, but now it cannot.

@reuvenlax
Copy link
Contributor

reuvenlax commented Nov 21, 2024 via email

@Abacn
Copy link
Contributor Author

Abacn commented Nov 25, 2024

I see. So the expected result would be that ValueType should be NullableCoder?

On Wed, Nov 20, 2024 at 7:46 AM Yi Hu @.***> wrote: The bug was that it used to be able to encode elements of KV<KeyType, @nullable ValueType> where ValueType uses a SchemaCoder, but now it cannot.

Yeah this would be a solution. I was wondering how it worked before the change. Did it use SerializableCoder?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants