awk match whole word

Question

I want to use awk to match whole words in a text file, including words bounded by non-alphanumeric characters.

For example -

string to search for - ABC

Source file -

HHHABCCCCH
HHH ABC
HH(ABC)ASDAASD
HH,ABC-ASASDASD

Expected result -

HHH ABC
HH(ABC)ASDAASD
HH,ABC-ASASDASD

glenn jackman · Accepted Answer · 2021-04-18T14:13:08.280

7

If you want to pass "ABC" as a variable instead of hardcoding it, use the matching operator:

awk -v word=ABC '$0 ~ "(^|[^[:alpha:]])" word "([^[:alpha:]]|$)"'
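As a quick check, here is a minimal sketch that runs the command above against the four sample lines from the question. The shell variable names sample and result are only for illustration.

```shell
# Sample lines copied from the question.
sample='HHHABCCCCH
HHH ABC
HH(ABC)ASDAASD
HH,ABC-ASASDASD'

# String concatenation binds tighter than ~, so the three pieces
# are joined into one dynamic regex before matching.
result=$(printf '%s\n' "$sample" |
    awk -v word=ABC '$0 ~ "(^|[^[:alpha:]])" word "([^[:alpha:]]|$)"')

printf '%s\n' "$result"
```

With a POSIX-conforming awk this should print the last three lines and skip HHHABCCCCH. Note that word is used as a regex here, so a value containing metacharacters such as . or + would be interpreted as a pattern rather than literally.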

With gawk you can use \< and \> to denote word boundaries. These are GNU regexp operators, so other awks may not support them. A word here is a sequence of letters, digits and underscores, so this will work for your example:

awk '/\<ABC\>/'
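If gawk is not available, the same notion of a word (letters, digits, underscore) can be approximated with POSIX bracket expressions. This is a sketch rather than part of the original answer, and the sample lines ABC_1 and x ABC1 are made up to show where it differs from the [:alpha:] version.

```shell
# Hypothetical sample: two lines where ABC is a whole word,
# two where it is glued to a digit or an underscore.
sample='HHH ABC
ABC_1
x ABC1
(ABC)'

# [^[:alnum:]_] stands in for "not a word character".
result=$(printf '%s\n' "$sample" |
    awk '/(^|[^[:alnum:]_])ABC([^[:alnum:]_]|$)/')

printf '%s\n' "$result"
```

This should keep HHH ABC and (ABC) and drop the other two lines, whereas the [^[:alpha:]] pattern would also accept ABC_1 and x ABC1.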

edited Apr 18 '21 at 14:13

answered Oct 21 '11 at 16:30

glenn jackman

27,524

score 6 · Answer 2 · answered Jul 05 '18 at 08:48

Use \y for a word boundary (a gawk extension), e.g.

awk '/\yABC\y/'

See https://www.gnu.org/software/gawk/manual/html_node/GNU-Regexp-Operators.html for more details.
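The \y operator can also be combined with a variable. A minimal sketch, assuming gawk is installed under the name gawk; the guard and the fallback message are only there so the snippet does not fail on systems without it.

```shell
# Sample lines copied from the question.
sample='HHHABCCCCH
HHH ABC
HH(ABC)ASDAASD
HH,ABC-ASASDASD'

if command -v gawk >/dev/null 2>&1; then
    # In an awk string, "\\y" yields the two characters \y,
    # which the regex engine then reads as a word boundary.
    result=$(printf '%s\n' "$sample" |
        gawk -v word=ABC '$0 ~ "\\y" word "\\y"')
else
    result='gawk not found'
fi

printf '%s\n' "$result"
```

With gawk present this should print the same three lines as the expected result in the question.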

answered Jul 05 '18 at 08:48

Pero

161

score 2 · Answer 3 · answered Oct 21 '11 at 16:14

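Figured it out; I was having problems due to a typo:

awk '/[^[:alpha:]]ABC[^[:alpha:]]/'

Note that this pattern needs a character on each side of ABC, so it misses a match at the very start or end of a line (such as HHH ABC in the question).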

answered Oct 21 '11 at 16:14

bryan

8,528

hafiz031 · Answer 4 · 2022-12-13T07:30:17.383

0

Try this:

awk '/( |\t|^|[^a-zA-Z0-9]+)ABC( |\t|$|[^a-zA-Z0-9]+)/' filename

Here, ( |\t|^|[^a-zA-Z0-9]+) means that ABC must either be preceded by a space, a tab, or another non-alphanumeric character, or be at the beginning of the line.

Likewise, ( |\t|$|[^a-zA-Z0-9]+) means that ABC must either be followed by a space, a tab, or another non-alphanumeric character, or be at the end of the line.
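Since a space and a tab are themselves non-alphanumeric, the bracket expression already covers them. The sketch below compares the pattern above with a shortened form on the question's sample plus one made-up tab-separated line; the variable names are illustrative only.

```shell
# Question's sample lines plus a hypothetical tab-delimited one.
sample=$(printf 'HHHABCCCCH\nHHH ABC\nHH(ABC)ASDAASD\nHH,ABC-ASASDASD\nX\tABC\tY\n')

# Pattern as written in the answer.
original=$(printf '%s\n' "$sample" |
    awk '/( |\t|^|[^a-zA-Z0-9]+)ABC( |\t|$|[^a-zA-Z0-9]+)/')

# Shortened form: [^a-zA-Z0-9] already matches space and tab.
simplified=$(printf '%s\n' "$sample" |
    awk '/(^|[^a-zA-Z0-9])ABC([^a-zA-Z0-9]|$)/')

if [ "$original" = "$simplified" ]; then
    echo "same lines selected"
fi
printf '%s\n' "$simplified"
```

Both patterns should select every line except HHHABCCCCH.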

edited Dec 13 '22 at 07:30

answered Dec 13 '22 at 07:13

hafiz031

101
