Multiple Left Joins in BigQuery
Multiple Left Joins in BigQuery
I'm trying to make a currently working SQL query that I have in BigQuery more streamlines and am running into the following issue:
Error: ON clause must be AND of = comparisons of one field name from each table, with all field names prefixed with table name. Consider using Standard SQL .google.com/bigquery/docs/reference/standard-sql/), which allows non-equality JOINs and comparisons involving expressions and residual predicates.
Below is the query that is giving the error above. The first LEFT JOIN works. When I added the second one, right below, I started getting the error. What I'm trying to do is get the human readable own.o.firstname and own.o.lastname values rather than the owner_id value of the deal record (o.properties.hubspot_owner_id.value), but in order to do so I need to join some tables.
I had to use CAST on the ON clause of the second JOIN because the fields are of different types in each table's respective schema. If I don't do that, I get the following error: Error: Join keys o.properties.hubspot_owner_id.value (string) and o.ownerid (int64) have types that cannot be automatically coerced.
The WHERE clause is just a suppression list to not return entries that have been deleted from the database.
SELECT o.*
FROM (
SELECT
o.dealid,
o.properties.dealname.value,
stages.Label,
o.properties.closedate.value,
o.properties.hubspot_owner_id.value,
own.o.firstname,
own.o.lastname,
o.properties.amount.value,
o.properties.createdate.value,
o.properties.pipeline.value,
o.associations.associatedcompanyids,
ROW_NUMBER() OVER (PARTITION BY o.dealid ORDER BY o._sdc_batched_at DESC) as seqnum
FROM [sample-table:hubspot.deals] o
LEFT JOIN [sample-table:hubspot.sales_stages_lookup] stages ON o.properties.dealstage.value = stages.Internal_Value
LEFT JOIN [sample-table:hubspot.owners_reporting] own ON CAST(o.properties.hubspot_owner_id.value AS INTEGER) = CAST(own.o.ownerid AS INTEGER)) o
WHERE o.dealid NOT IN (SELECT objectid FROM [sample-table:hubspot_suppression_list.data] WHERE subscriptiontype = 'deal.deletion') AND seqnum = 1
1 Answer
1
Use standard SQL in BigQuery instead, which supports expressions as part of the ON
clause:
ON
#standardSQL
SELECT o.*
FROM (
SELECT
o.dealid,
o.properties.dealname.value AS dealname_value,
stages.Label,
o.properties.closedate.value AS closedate_value,
o.properties.hubspot_owner_id.value AS hubspot_owner_id_value,
own.o.firstname,
own.o.lastname,
o.properties.amount.value AS amount_value,
o.properties.createdate.value AS createdate_value,
o.properties.pipeline.value AS pipeline_value,
o.associations.associatedcompanyids,
ROW_NUMBER() OVER (PARTITION BY o.dealid ORDER BY o._sdc_batched_at DESC) as seqnum
FROM `sample-table.hubspot.deals` o
LEFT JOIN `sample-table.hubspot.sales_stages_lookup` stages ON o.properties.dealstage.value = stages.Internal_Value
LEFT JOIN `sample-table.hubspot.owners_reporting` own ON CAST(o.properties.hubspot_owner_id.value AS INT64) = CAST(own.o.ownerid AS INT64)) o
WHERE o.dealid NOT IN (SELECT objectid FROM `sample-table.hubspot_suppression_list.data` WHERE subscriptiontype = 'deal.deletion') AND seqnum = 1
For more on the differences between legacy and standard SQL in BigQuery, see the migration guide.
Yes. Using standard SQL is recommended instead since it doesn't have this or other limitations.
– Elliott Brossard
2 days ago
I just ran the query using standard SQL and got the following error...
Duplicate column names in the result are not supported. Found duplicate(s): value
. I've tried a bunch of modifications to the query, but I still can't figure it out.– Felipe Da Cruz
2 days ago
Duplicate column names in the result are not supported. Found duplicate(s): value
See if this edit helps. The problem is that the query produces multiple columns in the result named
value
, but you can give them different names to resolve the error.– Elliott Brossard
2 days ago
value
That did it! I also realized that the inner query was returning a lot of columns named
value
and ended up using aliases for each of them via AS
. It fixed it and ended up making the column names more meaningful anyway. Thank you for all of the help - it is sincerely appreciated!– Felipe Da Cruz
2 days ago
value
AS
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
thank you for the answer. Is the expression what the problem is in the query?
– Felipe Da Cruz
2 days ago