
Since switching to Airflow 2.0, I've been trying for some time to get `BigQueryInsertJobOperator` working.

The error I'm seeing suggests something is missing from our connection. Oddly enough, the same connection works in another DAG where we use Google's API to access Google Sheets:

    export AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT=
    "google-cloud-platform://?extra__google_cloud_platform__project=\analytics&extra__google_cloud_platform__keyfile_dict=
    {\"type\": \"service_account\", \"project_id\": \"analytics\", 
    \"private_key_id\": \"${GCLOUD_PRIVATE_KEY_ID}\", \"private_key\": \"${GCLOUD_PRIVATE_KEY}\", 
    \"client_email\": \"d@lytics.iam.gserviceaccount.com\", \"client_id\": \"12345667\",
    \"auth_uri\": \"https://accounts.google.com/o/oauth2/auth\", 
    \"token_uri\": \"https://accounts.google.com/o/oauth2/token\",
    \"auth_provider_x509_cert_url\": \"https://www.googleapis.com/oauth2/v1/certs\",
    \"client_x509_cert_url\": \"https://www.googleapis.com/robot/v1/metadata/x509/d@lytics.iam.gserviceaccount.com\"}"
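One thing worth checking in the connection above: Airflow's documentation asks for every component of a connection URI to be URL-encoded, and raw JSON (quotes, braces, spaces) in the query string is easy to get wrong by hand. Also, inside double quotes bash keeps the backslash in `\a`, so `project=\analytics` sets the project to the literal string `\analytics`. Below is a minimal sketch, under those assumptions, that builds the URI with Python's standard library. `build_gcp_conn_uri` is a hypothetical helper, the extra names are the Airflow 2.0 ones already used in the question, and the key values are placeholders.

```python
import json
import shlex
from urllib.parse import quote, urlencode


def build_gcp_conn_uri(project: str, keyfile: dict) -> str:
    """Hypothetical helper: build a google-cloud-platform:// URI with URL-encoded extras."""
    extras = {
        "extra__google_cloud_platform__project": project,
        # Serialize the key to a JSON string; urlencode then percent-encodes it.
        "extra__google_cloud_platform__keyfile_dict": json.dumps(keyfile),
    }
    # quote_via=quote encodes spaces as %20 (instead of "+") and escapes "/" too.
    return "google-cloud-platform://?" + urlencode(extras, quote_via=quote)


if __name__ == "__main__":
    # Placeholder key material: substitute the real values from your key file.
    keyfile = {
        "type": "service_account",
        "project_id": "analytics",
        "private_key_id": "PLACEHOLDER_KEY_ID",
        "private_key": "-----BEGIN PRIVATE KEY-----\nPLACEHOLDER\n-----END PRIVATE KEY-----\n",
        "client_email": "d@lytics.iam.gserviceaccount.com",
        "token_uri": "https://accounts.google.com/o/oauth2/token",
    }
    uri = build_gcp_conn_uri("analytics", keyfile)
    # shlex.quote makes the line safe to paste into a POSIX shell.
    print("export AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT=" + shlex.quote(uri))
```

Quoting the whole value with `shlex.quote` keeps characters such as `&` and `?` in the URI from being interpreted by the shell.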

This is the error I'm seeing:

    {
      "error": {
        "code": 401,
        "message": "Request is missing required authentication credential. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.",
        "errors": [
          {
            "message": "Login Required.",
            "domain": "global",
            "reason": "required",
            "location": "Authorization",
            "locationType": "header"
          }
        ],
        "status": "UNAUTHENTICATED"
      }
    }

Is there a way to look up what else might be required (formatting, etc.)? Ideally, is there a really good example of how to set up the connection correctly for this operator?

In my logs I'm also seeing the following error, which makes me think it might not be a credential issue after all:

      File "/usr/local/lib/python3.8/site-packages/google/cloud/_http.py", line 438, in api_request
        raise exceptions.from_http_response(response)
    google.api_core.exceptions.BadRequest: 400 POST https://bigquery.googleapis.com/bigquery/v2/projects/vice-analytics/jobs?prettyPrint=false: Required parameter is missing
  • `BigQueryInsertJobOperator` uses the same auth mechanism as other Google operators, that is `GoogleBaseHook._get_credentials`. Have you tried running this outside Airflow to check whether the issue is with the credentials or with BQ itself? – Tomasz Urbaszek May 30 '21 at 10:29
  • @TomaszUrbaszek thanks for responding! I'm actually starting to think it's not the authorization. The actual message I'm seeing in my logs just says "Required parameter is missing", but it doesn't specify which parameter. – KristiLuna Jun 02 '21 at 15:20
  • Please take a look at this thread: https://stackoverflow.com/questions/46545966/request-is-missing-required-authentication-credential-expected-oauth-2-access-t/46566357 – Tomasz Urbaszek Jun 03 '21 at 09:39

1 Answer

Create a service account JSON key, which contains all of the required credential information:

https://cloud.google.com/iam/docs/creating-managing-service-account-keys
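Before using the key, a quick local sanity check can rule out a malformed file, for example a `private_key` whose `\n` escapes became raw line breaks when it was substituted from an environment variable, which makes the JSON invalid. A minimal sketch: `check_service_account_key` is a hypothetical helper, and the required-field list is an assumption based on the key layout shown in the question, not an official schema.

```python
import json

# Assumed minimum set of fields, taken from the key layout shown in the question.
REQUIRED_FIELDS = ("type", "project_id", "private_key_id", "private_key",
                   "client_email", "token_uri")


def check_service_account_key(raw: str) -> list:
    """Hypothetical helper: return problems found in a key JSON string (empty list if none)."""
    try:
        info = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Raw (unescaped) newlines inside the private_key string end up here.
        return [f"not valid JSON: {exc.msg} (line {exc.lineno}, column {exc.colno})"]
    if not isinstance(info, dict):
        return ["top-level JSON value is not an object"]
    problems = [f"missing or empty field: {name}"
                for name in REQUIRED_FIELDS if not info.get(name)]
    if info.get("type") and info["type"] != "service_account":
        problems.append(f"type is {info['type']!r}, expected 'service_account'")
    key = info.get("private_key") or ""
    if key and not key.startswith("-----BEGIN PRIVATE KEY-----"):
        problems.append("private_key does not look like a PEM block")
    return problems
```

Usage would be along the lines of `print(check_service_account_key(open("key.json").read()))`, where the file name is a placeholder for wherever you saved the downloaded key.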

Then you can paste the JSON key into the Airflow UI under Admin -> Connections, in the connection's Keyfile JSON field, and reference that connection in your DAG with `gcp_conn_id="name of connection you created"`.
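On the DAG side, `BigQueryInsertJobOperator` takes the job definition as a `configuration` dict in the shape of the BigQuery REST API's job configuration. Since the log shows `400 ... Required parameter is missing`, it may also be worth ruling out a `configuration` that lacks the nested `query` object. A sketch, assuming a plain query job: `query_job_configuration` is a hypothetical helper, and the operator call appears only as comments so the block runs without Airflow installed.

```python
def query_job_configuration(sql: str, use_legacy_sql: bool = False) -> dict:
    """Hypothetical helper: build a query-job configuration in the BigQuery REST layout."""
    if not sql.strip():
        raise ValueError("sql must not be empty")
    return {
        "query": {
            "query": sql,
            # The REST API defaults useLegacySql to true, so set it explicitly
            # when the SQL is written in standard SQL.
            "useLegacySql": use_legacy_sql,
        }
    }


# Illustrative DAG usage (requires apache-airflow-providers-google):
# from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
#
# run_query = BigQueryInsertJobOperator(
#     task_id="run_query",
#     configuration=query_job_configuration("SELECT 1"),
#     gcp_conn_id="name of connection you created",
# )
```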

Or point an environment variable at the JSON key file (on macOS):

    export GOOGLE_APPLICATION_CREDENTIALS="path to your JSON key file"
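If you go the environment-variable route, the variable has to hold a filesystem path to the key file, and it has to be set in the environment of the Airflow processes themselves (scheduler and workers), not only in your interactive shell. A small stdlib-only check, using the hypothetical helper name `credentials_file_from_env`, can be run from that same environment:

```python
import json
import os
from pathlib import Path


def credentials_file_from_env(var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> Path:
    """Hypothetical helper: return the key path named by `var`, or raise a descriptive error."""
    value = os.environ.get(var)
    if not value:
        raise RuntimeError(f"{var} is not set in this process's environment")
    path = Path(value).expanduser()
    if not path.is_file():
        raise RuntimeError(f"{var} points to {path}, which is not an existing file")
    with path.open(encoding="utf-8") as fh:
        info = json.load(fh)
    # A service account key file declares its kind in the "type" field.
    kind = info.get("type") if isinstance(info, dict) else None
    if kind != "service_account":
        raise RuntimeError(f"{path} has type {kind!r}, expected 'service_account'")
    return path
```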