
I want to use awk to match whole words in a text file, including words bounded by non-alphanumeric characters.

For example:

String to search for: ABC

Source file:

HHHABCCCCH
HHH ABC
HH(ABC)ASDAASD
HH,ABC-ASASDASD

Expected result:

HHH ABC
HH(ABC)ASDAASD
HH,ABC-ASASDASD
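To check candidate awk programs against this example, a small harness can help. This is only a sketch: the function name check_awk and the variables sample and expected are made up for illustration, and it assumes a POSIX shell plus an awk that supports POSIX character classes such as [:alnum:].

```shell
#!/bin/sh
# Hypothetical harness: run an awk program over the sample input above
# and report whether its output equals the expected result.
sample='HHHABCCCCH
HHH ABC
HH(ABC)ASDAASD
HH,ABC-ASASDASD'

expected='HHH ABC
HH(ABC)ASDAASD
HH,ABC-ASASDASD'

# check_awk PROGRAM: prints PASS if the program selects exactly the expected lines.
check_awk() {
  actual=$(printf '%s\n' "$sample" | awk "$1")
  if [ "$actual" = "$expected" ]; then
    echo PASS
  else
    echo FAIL
  fi
}

# Example: ABC with a non-alphanumeric character or line start/end on each side.
check_awk '/(^|[^[:alnum:]])ABC([^[:alnum:]]|$)/'
```

The example call prints PASS; a plain /ABC/ would print FAIL because it also selects HHHABCCCCH.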
bryan

4 Answers

If you want to pass "ABC" as a variable instead of hardcoding it, use the matching operator:

awk -v word=ABC '$0 ~ "(^|[^[:alpha:]])" word "([^[:alpha:]]|$)"'
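One caveat when building the regex from a variable: if the search word contains regex metacharacters (for example A.C), they are interpreted rather than matched literally, and -v also processes backslash escapes in the value. A sketch that sidesteps both passes the word through the environment and scans with index() instead of a regex. The function name word_grep and the variable WORD are hypothetical, and it assumes an awk that supports [:alnum:].

```shell
#!/bin/sh
# word_grep WORD [FILE]: hypothetical helper that prints lines containing WORD
# as a whole word. WORD is matched literally (no regex), and a boundary is the
# start/end of the line or any non-alphanumeric character.
word_grep() {
  WORD="$1" awk '
  function is_alnum(c) { return c ~ /[[:alnum:]]/ }
  BEGIN { word = ENVIRON["WORD"]; n = length(word); if (n == 0) exit 1 }
  {
    off = 0
    # index() does a literal substring search, so "." or "*" in WORD are plain characters
    while ((i = index(substr($0, off + 1), word)) > 0) {
      start = off + i                                   # 1-based position of this occurrence
      before = (start > 1) ? substr($0, start - 1, 1) : ""
      after = substr($0, start + n, 1)                  # empty when the word ends the line
      if (!is_alnum(before) && !is_alnum(after)) { print; next }
      off = start                                       # keep searching past this occurrence
    }
  }' "${2:--}"
}
```

For instance, printf 'x A.C y\nx ABC y\n' | word_grep 'A.C' prints only x A.C y, whereas the regex version above would print both lines.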

With gawk (support in other awks varies), you can use \< and \> to mark word boundaries, where a word is a sequence of letters, digits, and underscores, so this works for your example:

awk '/\<ABC\>/'
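Because gawk treats digits and underscores as word characters, \<ABC\> is not quite the same as "bounded by non-alphanumeric characters": an underscore next to ABC blocks the match. The sketch below assumes gawk is installed and runs both styles over a few made-up lines to show the difference.

```shell
#!/bin/sh
# Made-up test lines: underscore before ABC, hyphen before ABC, digit after ABC.
lines='HH_ABC
HH-ABC
ABC1'

# gawk word boundaries: letters, digits and underscore are word characters.
printf '%s\n' "$lines" | gawk '/\<ABC\>/'

# Explicit non-alphanumeric boundaries: underscore counts as a separator here.
printf '%s\n' "$lines" | gawk '/(^|[^[:alnum:]])ABC([^[:alnum:]]|$)/'
```

The first command prints only HH-ABC; the second prints HH_ABC and HH-ABC. Neither selects ABC1.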
glenn jackman

In gawk, use \y to match a word boundary, for example:

awk '/\yABC\y/'

See https://www.gnu.org/software/gawk/manual/html_node/GNU-Regexp-Operators.html for more details.
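\y can also be combined with a search word passed in as a variable. In a dynamic (string) regexp the backslash itself has to be escaped, so the operator is written as "\\y" inside the awk string. A sketch, assuming gawk and a search word without regex metacharacters:

```shell
#!/bin/sh
# Match the word held in the awk variable w, with \y word boundaries (gawk only).
# "\\y" in an awk string becomes the two characters \y, which gawk then
# reads as the word-boundary operator when the string is used as a regexp.
printf '%s\n' 'HHHABCCCCH' 'HHH ABC' 'HH(ABC)ASDAASD' 'HH,ABC-ASASDASD' |
  gawk -v w=ABC '$0 ~ "\\y" w "\\y"'
```

This prints the three expected lines from the question and skips HHHABCCCCH.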

Pero

Figured it out; I was having problems due to a typo:

awk '/(^|[^[:alpha:]])ABC([^[:alpha:]]|$)/'
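Requiring a non-letter on both sides without also allowing ^ and $ misses lines where ABC starts or ends the line, such as HHH ABC in the example. The sketch below compares the two forms on a few lines (the last two are made up for illustration).

```shell
#!/bin/sh
# Test lines: one from the question plus two made-up ones.
lines='HHH ABC
ABC here
x ABC y'

# Without anchors: an actual character is required on each side of ABC.
printf '%s\n' "$lines" | awk '/[^[:alpha:]]ABC[^[:alpha:]]/'

# With ^ and $ as alternatives: ABC may also start or end the line.
printf '%s\n' "$lines" | awk '/(^|[^[:alpha:]])ABC([^[:alpha:]]|$)/'
```

The first command prints only x ABC y; the second prints all three lines.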
bryan

Try this:

awk '/( |\t|^|[^a-zA-Z0-9]+)ABC( |\t|$|[^a-zA-Z0-9]+)/' filename

Here, ( |\t|^|[^a-zA-Z0-9]+) means that ABC must be preceded by a space, a tab, or a non-alphanumeric character, or must be at the beginning of the line.

Likewise, ( |\t|$|[^a-zA-Z0-9]+) means that ABC must be followed by a space, a tab, or a non-alphanumeric character, or must be at the end of the line.
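Since a space and a tab are already non-alphanumeric, the explicit space and \t alternatives are covered by [^a-zA-Z0-9], and the + is not needed because one boundary character is enough. A shorter pattern that should select the same lines is sketched below and compared with the original on a few made-up inputs, including one with a tab.

```shell
#!/bin/sh
# Made-up inputs: tab before ABC, ABC at line start, ABC glued to a letter or digit.
lines=$(printf 'a\tABC\nABC!\nxABC\nABC9\n')

# Original pattern from this answer.
printf '%s\n' "$lines" | awk '/( |\t|^|[^a-zA-Z0-9]+)ABC( |\t|$|[^a-zA-Z0-9]+)/'

# Simplified: one non-alphanumeric character, or the line start/end, on each side.
printf '%s\n' "$lines" | awk '/(^|[^a-zA-Z0-9])ABC([^a-zA-Z0-9]|$)/'
```

Both commands print the tab line and ABC! and skip xABC and ABC9.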

hafiz031