I would like to use piped input or a reference file of domains (file B) to remove each domain and its subdomains from file A.
I can't simply grep "bbc.co.uk", for example, as that would also match entries such as cbbc.co.uk.
I have tried a while read loop that iterates through file B, running grep -E "^([^.\s]+\.)*${escaped_domain}$" fileA to match each domain and its subdomains, but with this many comparisons it is very, very slow.
Is there a better way to do this? Perhaps using awk?
File B (or piped input)
~30k lines
bbc.co.uk
amazon.co.uk
doubleclick.net
File A
~150k+ lines
123123.test.bbc.co.uk
123434.rwr.amazon.co.uk
ads.bbc.co.uk
adsa.23432.doubleclick.net
amazon.co.uk
bbc.co.uk
cbbc.co.uk
damazon.co.uk
fsdfsfs.doubleclick.net
test.amazon.co.uk
test.bbc.co.uk
test.damazon.co.uk
Desired output:
cbbc.co.uk
damazon.co.uk
test.damazon.co.uk
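One incremental speedup I have considered: instead of one grep invocation per domain, write every pattern to a file and make a single grep pass with -f (a sketch, assuming file B holds bare domains as above; [^.] stands in for [^.\s] since hostnames contain no whitespace, which keeps the pattern POSIX):

```shell
# Escape the dots, wrap each domain in the subdomain-matching anchored pattern,
# then make one pass over file A: -f reads all patterns from a file,
# -v inverts the match so only non-blocked lines are printed.
sed 's/\./\\./g; s/.*/^([^.]+\\.)*&$/' fileB > patterns
grep -Evf patterns fileA
```

This avoids re-reading file A 30k times, though grep must still try every pattern against every line.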
Current method (with a different input format, using grep/regexps)
# Convert input: address=/test.com/ -> ^([^.\s]+\.)*test\.com$
regexList=$(sed 's/\./\\./g' fileB |
    awk -F '/' '{print "^([^.\\s]+\\.)*" $2 "$"}')
while read -r regex; do
    grep -E "$regex" fileA
done <<< "$regexList"
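The awk idea I had in mind would hash file B and then, for each line of file A, strip leading labels one at a time, checking each suffix against the hash (a sketch, assuming the file names above; roughly O(label count) work per line instead of one regex test per blocked domain):

```shell
# Single pass: load file B domains into an array, then for each file A line
# peel off the leftmost label repeatedly; if any suffix is a blocked domain,
# the line is dropped, otherwise it is printed.
awk '
    NR == FNR { block[$0]; next }       # first file (fileB): remember each domain
    {
        d = $0
        while (d != "") {
            if (d in block) next        # exact domain or a parent is blocked: drop
            i = index(d, ".")
            if (i == 0) break           # no labels left to strip
            d = substr(d, i + 1)        # drop the leftmost label
        }
        print                           # no blocked suffix found: keep the line
    }
' fileB fileA
```

But I'm unsure whether this is the idiomatic way, hence the question.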