Is there a reason why there are two different commands to generate a new variable?
Is there a simple way to remember when to use gen and when to use egen?
They both create a new variable, but work with different sets of functions. You will typically use gen when you have simple transformations of other variables in your dataset like
gen newvar = oldvar1^2 * oldvar2
In my workflow, egen usually appears when I need functions that work across all observations, like in
egen max_var = max(var)
or for more complex instructions such as
egen newvar = rowmax(oldvar1 oldvar2)
to calculate the maximum for each observation between oldvar1 and oldvar2. I don't think there is a clear logic for separating the two commands.
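To make the difference concrete, here is a minimal sketch using the auto dataset that ships with Stata. The new variable names (price_sq, max_price, row_hi, running, total) are made up for illustration. It also shows a common stumbling block: sum() under gen gives a running sum, while total() under egen gives the overall sum.

```stata
* Load Stata's shipped example dataset (74 cars)
sysuse auto, clear

* gen: observation-by-observation transformation
gen price_sq = price^2

* egen: one value computed across all observations, repeated in every row
egen max_price = max(price)

* egen: the larger of two variables within each observation
egen row_hi = rowmax(price weight)

* gen's sum() is a running (cumulative) sum down the rows ...
gen running = sum(price)

* ... whereas egen's total() stores the grand total in every row
egen total = total(price)

list price running total in 1/5
```

In the last observation, running and total hold the same value; everywhere else running is only the sum up to that row.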
gen
generate may be abbreviated to gen or even g and can be used with the following mathematical operators and functions:
+ addition
- subtraction
* multiplication
/ division
^ power
A large number of functions is available. Here are some examples:
abs(x) absolute value of x
exp(x) exponential (antilog) of x
int(x) or trunc(x) truncation to integer value
ln(x), log(x) natural logarithm of x
round(x) x rounded to the nearest integer
round(x,y) x rounded in units of y (e.g., round(x,.1) rounds to one decimal place)
sqrt(x) square root of x
runiform() returns uniformly distributed numbers between 0 and nearly 1
rnormal() returns numbers that follow a standard normal distribution
rnormal(x,y) returns numbers that follow a normal distribution with a mean of x and a s.d. of y
egen
A number of more complex operations have been implemented in the egen command, as in the following examples:
egen nkids = anycount(pers1 pers2 pers3 pers4 pers5), value(1)
egen v323r = rank(v323)
egen myindex = rowmean(var15 var17 var18 var20 var23)
egen nmiss = rowmiss(x1-x10 var15-var23)
egen rtotal = rowtotal(x1-x10 var15-var23)
egen incomst = std(income)
bysort v3: egen mincome = mean(income)
Detailed usage explanations can be found at this link.
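The variables in the examples above belong to a dataset that is not shown, so the lines will not run as they stand. As a rough sketch, a few of the same functions can be tried on the auto dataset that ships with Stata. All new variable names and the seed value here are arbitrary choices for illustration.

```stata
* Load Stata's shipped example dataset
sysuse auto, clear

* Fix the random-number seed so rnormal() is reproducible (value is arbitrary)
set seed 12345

* gen with built-in functions
gen log_price = ln(price)          // natural logarithm
gen price_100 = round(price, 100)  // rounded in units of 100
gen noise     = rnormal(0, 1)      // standard normal draws

* egen with functions that look across observations or variables
egen price_std  = std(price)              // standardized price
egen price_rank = rank(price)             // rank of each car's price
egen nmiss      = rowmiss(price rep78 mpg) // missing values per observation

* Group-wise statistic: mean price within domestic and foreign cars
bysort foreign: egen mean_price = mean(price)

summarize price_std mean_price nmiss
```

The bysort prefix makes egen compute the mean separately within each value of foreign, which is the same pattern as the mincome example.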