I have two DB2 tables, expenses and environments, which I need to join.
expenses
ID    Expense  Job
1     10       AAAAAA
2     5        BBBBBB 
3     3        AAAAAAC
4     7        AAAAAA01
environments
Job        Environment
AAAAAA01   PROD
BBBBBB     INT
AAAAAAC    PROD
I now want to join the environments table to the expenses table so that I know in which environments the expenses occurred. The problem with the tables is that the Job column for both tables is slightly different. Hence, I can't just join ON ENVIRONMENTS.JOB = EXPENSES.JOB The values are between 6 and 8 characters long, and the values in the expenses table are often shorter than in the environments table. 
The logic for comparing the values of the Job columns would be something like this:
- Check for an exact match of the Jobif it is 8 characters long.
- If no match is found, remove one character and try again.
- Repeat until the Jobis six characters long.
In each step, if multiple values are found, select the first match.
I tried the following query
SELECT E.expense, ENV.environment
FROM EXPENSES E
LEFT OUTER JOIN ENVIRONMENTS ENV
ON LEFT(ENV.JOB, 6) = LEFT(E.JOB, 6)
GROUP BY E.expense, ENV.environment
The problem is that I receive duplicate values in the result. Hence, I get more expenses when grouping them by environment than I originally had. 
Actual output:
ID    Expense  Job        Job         Environment
1     10       AAAAAA     AAAAAA01    PROD
1     10       AAAAAA     AAAAAAC     PROD
2     5        BBBBBB     BBBBBB      INT
3     3        AAAAAAC    AAAAAA01    PROD
3     3        AAAAAAC    AAAAAAC     PROD
4     7        AAAAAA01   AAAAAA01    PROD
4     7        AAAAAA01   AAAAAAC     PROD
Desired output:
ID    Expense  Job        Job         Environment
1     10       AAAAAA     AAAAAA01    PROD
2     5        BBBBBB     BBBBBB      INT
3     3        AAAAAAC    AAAAAA01    PROD
4     7        AAAAAA01   AAAAAA01    PROD
The value in the second Job column could be either AAAAAA01 or AAAAAAC. This is completely irrelevant.
This is due to the `ON` condition. Let's consider the following example. The row with `ID=1` from the `expenses` table matches two rows in the `environments` table. Hence, the row with `ID=1` occurs twice in the result. Importantly, the matching rows from the `environments` table all have the same environment. Consequently, it is negligible which row I select.
How can I avoid duplicate values in the given scenario?
I have already tried the following solutions: 
 
    