I am processing a fairly big collection of tweets and, for each tweet, I'd like to obtain the IDs of the users it mentions (other users' names, prefixed with an @), provided the mentioned user also appears in the file:
users = new Dictionary()
for each line in file:
   username = get_username(line)
   userid   = get_userid(line)
   users.add(key = username, value = userid)
for each line in file:
   mentioned_names = get_mentioned_names(line)
   mentioned_ids = mentioned_names.map(x => if x in users: users[x] else null)
   print "$line | $mentioned_ids"
I was already processing the file with GAWK, so instead of processing it again in Python or C I decided to try adding this to my AWK script. However, I can't find a way to make two passes over the same file, executing different code on each one. Most solutions involve calling AWK several times, but then I'd lose the associative array I built in the first pass.
I could do it in very hacky ways (like cat'ing the file twice, passing it through sed to add a different prefix to all the lines in each cat), but I'd like to be able to understand this code in a couple of months without hating myself.
What would be the AWK way to do this?
PS:
The least terrible way I've found:
function rewind(    i)
{
    # from https://www.gnu.org/software/gawk/manual/html_node/Rewind-Function.html
    # shift remaining arguments up
    for (i = ARGC; i > ARGIND; i--)
        ARGV[i] = ARGV[i-1]
    # make sure gawk knows to keep going
    ARGC++
    # make current file next to get done
    ARGV[ARGIND+1] = FILENAME
    # do it
    nextfile
}
BEGIN {
 count = 1;
}
count == 1 {
 # first pass, fills an associative array
}
count == 2 {
 # second pass, uses the array
}
FNR == 30 {
    # handcoded length, horrible
    # could also be automated by calling wc -l and passing the result as a parameter
    if (count == 1) {
        count = 2;
        # rewind() takes no real arguments; i is only a local variable
        rewind()
    }
}
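For comparison, here is a minimal sketch of another way to get two passes without hardcoding the file length: name the same file twice on the command line and test NR == FNR, which is true only while the first copy is being read. The input layout (tab-separated userid, username, tweet text), the sample data, and the names two_pass and ids are assumptions made for illustration, not part of the original script.

```shell
#!/bin/sh
# Hypothetical sample data: userid <TAB> username <TAB> tweet text.
tweets=$(mktemp)
printf '1\talice\thello @bob and @carol\n2\tbob\thi @alice\n' > "$tweets"

# two_pass FILE: hands FILE to awk twice, so awk reads it two times in a row.
two_pass() {
    awk -F'\t' '
        # Pass 1: NR == FNR holds only while the first copy is being read.
        NR == FNR { ids[$2] = $1; next }

        # Pass 2: replace each @mention with the matching user id, or "null".
        {
            out = ""
            n = split($3, words, " ")
            for (i = 1; i <= n; i++)
                if (words[i] ~ /^@/) {
                    name = substr(words[i], 2)
                    out = out (out == "" ? "" : ",") ((name in ids) ? ids[name] : "null")
                }
            print $0 " | " out
        }
    ' "$1" "$1"
}

result=$(two_pass "$tweets")
printf '%s\n' "$result"
rm -f "$tweets"
```

The associative array survives between the two passes because a single awk process reads both copies. The sketch relies only on standard awk features, so it should not need gawk-specific extensions such as ARGIND or nextfile.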
 
     
    