

[Bug] GoogleCloudProfile dataset argument is replaced by neither the model nor the yml config, but is instead used as a prefix #1334

Open
vanAkim opened this issue Nov 20, 2024 · 2 comments
Labels
  • area:config: Related to configuration, like YAML files, environment variables, or executor configuration
  • bug: Something isn't working
  • execution:docker: Related to Docker execution environment
  • execution:virtualenv: Related to Virtualenv execution environment
  • profile:bigquery: Related to BigQuery ProfileConfig
  • triage-needed: Items need to be reviewed / assigned to milestone

Comments


vanAkim commented Nov 20, 2024

Astronomer Cosmos Version

1.6.0

dbt-core version

1.8.8

Versions of dbt adapters

dbt-adapters 1.7.0
dbt-bigquery 1.8.3

LoadMode

AUTOMATIC

ExecutionMode

VIRTUALENV

InvocationMode

None

airflow version

2.5.3

Operating System

Ubuntu and Docker under WSL2 on Windows 11

If you think it's a UI issue, which browsers are you seeing the problem on?

No response

Deployment

Docker-Compose

Deployment details

No response

What happened?

I'm using GoogleCloudOauthProfileMapping and passing the required profile_args, which set a project (my_gcp_project) and a dataset (my_gcp_dataset).

When I run this setup to build the simple my_first_dbt_model.sql model of the jaffle_shop project, the table is created in the expected location, my_gcp_project.my_gcp_dataset.my_first_dbt_model.
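For reference, the dbt target that the profile mapping builds from these profile_args can be pictured as the dictionary below. This is a minimal sketch, not Cosmos's code: the helper build_bigquery_oauth_target is hypothetical, and the keys assume dbt-bigquery's OAuth profile fields (type, method, project, dataset, threads), with threads defaulting to 1 purely for illustration.

```python
from typing import Any


def build_bigquery_oauth_target(profile_args: dict[str, Any], threads: int = 1) -> dict[str, Any]:
    """Hypothetical helper: shape of a dbt-bigquery OAuth target built from profile_args."""
    # project and dataset are the two values this issue is about; treat them as required.
    missing = [key for key in ("project", "dataset") if key not in profile_args]
    if missing:
        raise ValueError(f"missing required profile_args: {missing}")
    return {
        "type": "bigquery",
        "method": "oauth",
        "project": profile_args["project"],
        "dataset": profile_args["dataset"],
        "threads": threads,  # assumed default, for illustration only
    }


target = build_bigquery_oauth_target({"project": "my_gcp_project", "dataset": "my_gcp_dataset"})
# With no model-level overrides, the model lands in <project>.<dataset>.<model_name>.
print(f'{target["project"]}.{target["dataset"]}.my_first_dbt_model')
```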

Now, if I try to override these parameters for this specific model,

  • in the model's {{ config(database='my_other_gcp_project', schema='my_other_gcp_dataset') }} block, the new location becomes my_other_gcp_project.my_gcp_dataset_my_other_gcp_dataset.my_first_dbt_model
  • in the model's yml config, as below, the result is the same location, my_other_gcp_project.my_gcp_dataset_my_other_gcp_dataset.my_first_dbt_model
    models:
      - name: my_first_dbt_model
        config:
          database: "my_other_gcp_project"
          schema: "my_other_gcp_dataset"

The key pairs database/project and schema/dataset are interchangeable and behave identically.

So the database/project is fully overwritten according to the documented config hierarchy (https://docs.getdbt.com/reference/model-configs#configuring-models), but the schema/dataset is concatenated.
However, the concatenation happens only between the Cosmos value and the highest-priority dbt config. For example, if I set both options described above, only the config() value is concatenated with the Cosmos value.
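The naming reported above can be reproduced with the small model below. It is a sketch that assumes the resolution mirrors dbt's default generate_database_name and generate_schema_name macros: a custom database replaces the target project outright, while a custom schema is appended to the target dataset as <target_dataset>_<custom_schema>. The function resolve_relation is hypothetical and is not part of Cosmos or dbt.

```python
from typing import Optional


def resolve_relation(
    target_project: str,
    target_dataset: str,
    model_name: str,
    custom_database: Optional[str] = None,
    custom_schema: Optional[str] = None,
) -> str:
    """Hypothetical model of how the final BigQuery relation name is assembled."""
    # A custom database fully replaces the project from the profile target.
    database = custom_database.strip() if custom_database is not None else target_project
    # A custom schema is not a replacement: it is suffixed to the target dataset.
    if custom_schema is None:
        schema = target_dataset
    else:
        schema = f"{target_dataset}_{custom_schema.strip()}"
    return f"{database}.{schema}.{model_name}"


print(resolve_relation(
    "my_gcp_project", "my_gcp_dataset", "my_first_dbt_model",
    custom_database="my_other_gcp_project",
    custom_schema="my_other_gcp_dataset",
))
# my_other_gcp_project.my_gcp_dataset_my_other_gcp_dataset.my_first_dbt_model
```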

Relevant log output

How to reproduce

  1. Use a GCP profile mapping function, in my case GoogleCloudOauthProfileMapping
  2. Set a dataset value in profile_args
  3. Set a schema/dataset value for a specific model, in either config() or the yml config.

Anything else :)?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Contact Details

[email protected]

vanAkim added the bug and triage-needed labels Nov 20, 2024

dosubot bot commented Nov 20, 2024

Hey there, @vanAkim! I'm here to assist you with any bugs, questions, or contributions you might have. Let's work together to tackle this issue.

The issue you're experiencing seems to be related to how Cosmos handles schema/dataset configurations when using GoogleCloudOauthProfileMapping. Specifically, it appears that the schema/dataset values are being concatenated rather than replaced when you attempt to override them in a model-specific configuration.

In Cosmos, the GoogleCloudOauthProfileMapping extracts the project and dataset from the Airflow BigQuery connection's extra field, and these values are used in the dbt profile configuration [1]. When you override them in a model-specific configuration, the schema/dataset value is concatenated with the profile value instead of replacing it.

This behavior might be due to the way Cosmos merges configurations from different sources. The _config_selector_ooo method in the DbtModelConfig class is responsible for merging configurations, ensuring that SQL file configurations override those from properties.yml files [2]. However, this logic may not be handling the replacement of schema/dataset values correctly in your case.
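The precedence described here (SQL config() values override properties.yml values) can be sketched as a plain dictionary merge. The helper merge_model_config below is hypothetical and is not the actual _config_selector_ooo implementation; the values yml_dataset and sql_dataset are illustrative placeholders. It only shows why, when both sources set schema, just one value (the config() one) reaches the naming step, which matches the observation that only the config() value is concatenated.

```python
from typing import Any


def merge_model_config(yml_config: dict[str, Any], sql_config: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical merge: keys from the SQL config() call win over properties.yml keys."""
    merged = dict(yml_config)  # start from the lower-priority source (properties.yml)
    merged.update(sql_config)  # higher-priority config() values overwrite shared keys
    return merged


merged = merge_model_config(
    {"database": "my_other_gcp_project", "schema": "yml_dataset"},
    {"schema": "sql_dataset"},
)
print(merged)  # {'database': 'my_other_gcp_project', 'schema': 'sql_dataset'}
```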

To address this, you might need to look into how the _config_selector_ooo method is applied in your setup or consider submitting a pull request to adjust the logic for handling schema/dataset overrides.


dosubot bot added the area:config, execution:docker, execution:virtualenv, and profile:bigquery labels Nov 20, 2024
@pankajastro
Contributor

Hey @vanAkim, thank you for finding this issue and providing a detailed analysis. Could you please add a test for this case here: https://github.com/astronomer/astronomer-cosmos/blob/main/tests/profiles/bigquery/test_bq_oauth.py and propose a solution? We are planning to do the next release in mid-December.
