Datastage incremental charging

Question

We would like to perform incremental loading in DataStage (in parallel environement). Exactly load only the delta between the previous load and the new one (for create, update, delete the records in DWH).
We would like to store the last key recovered during the previous load to be able to restart the request from the next record on a new loading.
We have already successfully used a parameter to filter the SQL load query at runtime. Unfortunately, we have not yet found the possibility to retrieve the last key (max (Key) - Aggregator?) And to store it in this parameter.
Which stage to use, to output a single value in the same parallel job, and then store to parameter ?

Any ideas ?

Thanks for your help.

score 0 · Accepted Answer · answered Aug 27 '18 at 17:58

0

Think about getting the max value from your target - it is most probably a database and a max() is easy to do.

Check out my post about getting some data from the "stream" to a parameter

answered Aug 27 '18 at 17:58

MichaelTiefenbacher

3,805
2
11
17

Thank's Michael. Your post about stream to a parameter is very usefull. To go further, how do you generalize this to several tables, knowing that these values also depend on the environment (test / production)? Do you have a parameter file per tables and per environment? Is it possible to globalize this in DB? Regarding the max, does this mean that I have to go through another job that is linked through a sequence job? Can not we do everything in the same job? Thank you again for your lighting. – F. Copin Aug 28 '18 at 09:27
I strongy recommend ParameterSets for database access and you could also include something like mandator or whatever you need to database name, user, password etc. The "delta max value" does not depend on the environment as it would be collected sepaately (if you have different databases for your environemnts as well - which again would be recommended). One general (generic) job would do in my eyes (plus the steps to get it back as a parameter. Please do not forget to mark answers or upvote comments if you are satisfied - this is how this forum works - thanks – MichaelTiefenbacher Aug 28 '18 at 14:59

score 0 · Answer 2 · answered Aug 29 '18 at 14:25

0

Thank's Michael,

I've found Head stage to get the max(LastRowId) in the same job, with 'All rows (after skip) = False), and 'Number of Rows (Per partition)=1. And I run the job in sequential mode...

That's worked fine.

answered Aug 29 '18 at 14:25

F. Copin

3
2

Datastage incremental charging

2 Answers2