Grepping for the form action= part of html pages

Question

I am trying to grep the parts of an html form, specifically the action part i.e. <form action = ….
I originally tried:
grep -E -e 'form\s*action\s*=.*[.]html' ./*
but it did not work (despite the fact that there are such strings).
Then I tried the more basic:
grep -E -e 'form\s*action\s*=' ./*
but this did not work either!
What am I doing wrong?

By the `./*` part, do you mean the current directory? And what is the error message? — Juto, Aug 28 '13 at 11:32
So you want to match the content of the action or you want only ` — Ibrahim Najjar, Aug 28 '13 at 11:32
@nl-x Don't escape the backslash: it is meant for grep and is already protected by the single quotes. Otherwise the pattern will match a literal backslash followed by an 's'. — Bentoy13, Aug 28 '13 at 11:45
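To illustrate the escaping point, here is a small sketch. It assumes GNU grep, which accepts `\s` as an extension in `-E` patterns, and `count_matches` is a hypothetical helper name. Inside single quotes the shell passes `\s` to grep unchanged, while `\\s` reaches grep as an escaped (literal) backslash followed by `s`.

```shell
# Hypothetical helper: print how many stdin lines match an extended regex.
# grep -c prints 0 when nothing matches but exits with status 1;
# "|| true" keeps the exit status 0 so scripts using "set -e" keep running.
count_matches() {
  grep -c -E -e "$1" || true
}

# '\s' reaches grep intact and means "one whitespace character" (GNU grep).
printf '%s\n' '<form action="a.html">' | count_matches 'form\s*action'   # prints 1

# '\\s*' reaches grep as a literal backslash followed by zero or more 's',
# so a normal tag without a backslash is not matched.
printf '%s\n' '<form action="a.html">' | count_matches 'form\\s*action'  # prints 0
```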

score 1 · Answer 1 · nl-x

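This won't get you the action value, only the part just before it. Also, \s* only allows whitespace between form and action, so a tag with another attribute in between, such as <form id="myForm" action="myFile.php">, is not matched at all. Even a looser pattern like form.*action\s*= would just get you form id="myForm" action= and nothing of the value itself.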

So try instead:

grep -E -o -i -e '<form\s+[^>]*action\s*=[^>]*>' ./*

[^>]* matches any character except >, zero or more times.
-o prints only the matching part.
-i makes the match case-insensitive.
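The command above only searches the files that ./* expands to. As the comments below work out, the asker's files were in subdirectories, which needs -r. Here is a sketch of a recursive variant, assuming GNU grep (for -r, --include and \s); the function name find_form_tags is hypothetical.

```shell
# Hypothetical helper: recursively list <form ...> tags that have an action
# attribute. -r descends into subdirectories, and --include limits the search
# to *.html files. "${1:-.}" searches the given directory, or "." by default.
find_form_tags() {
  grep -r -E -o -i --include='*.html' -e '<form\s+[^>]*action\s*=[^>]*>' "${1:-.}"
}
```

For example, find_form_tags . prints one file:tag line per match, because grep adds the file name when searching recursively.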

answered Aug 28 '13 at 11:36, edited Aug 28 '13 at 12:01

Even `grep -E -e 'form\\s*action\\s*=' ./*` gives no results, and `grep -E -e '\…' ./*` does not work either.
– Jim Aug 28 '13 at 11:38
@Jim I was wrong about double escaping. I edited my answer. I tested it, and it works – nl-x Aug 28 '13 at 11:43
`grep -E -e '…]*action\\s*=[^>]*>' ./*` does not work either.
– Jim Aug 28 '13 at 11:44
It needs `-r`! You probably tested only in the current directory! – Jim Aug 28 '13 at 11:57
I would add a `\b` after `form`, or replace the first `\s*` with `\s+`, else it can also match ``. Maybe it's unnecessary... Apart from that, it works fine for me too, +1. – Bentoy13 Aug 28 '13 at 11:57
@Jim -r means recursive, and you didn't mention that you wanted a recursive search. My test was on a single file. – nl-x Aug 28 '13 at 11:59
@Bentoy13 Tnx. It was indeed supposed to be \s+ ... I'll edit it – nl-x Aug 28 '13 at 12:00
The only thing I haven't managed yet is to return just the contents of the action attribute; I can only return the entire regex match. I tried making a group with parentheses `(...)`, but grep just ignores that... Maybe I should name the group, or add some extra switch. – nl-x Aug 28 '13 at 12:02
@nl-x I think that you can achieve returning only the action part using lookbehind and adding option `-P` for that. It's up to you! – Bentoy13 Aug 28 '13 at 12:08
According to http://stackoverflow.com/a/1891890/1209443 I can use Grep multiple times by piping it. I can see how that would work... – nl-x Aug 28 '13 at 12:19
@nl-x Yeah, with grep that's the only way to do it, because you cannot use a variable-length pattern in a lookbehind. So yes, piping seems to be the only way! – Bentoy13 Aug 28 '13 at 12:29
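The pipe approach discussed in the comments can be sketched as follows. This is an illustration, not nl-x's final answer. It assumes GNU grep (for \s) and double-quoted action values; single-quoted or unquoted values are not handled. The function name extract_actions is hypothetical.

```shell
# Hypothetical helper: print the value of every double-quoted action
# attribute found in <form ...> tags of the given files.
#   1st grep: isolate each <form ...> tag that has an action attribute
#             (-h drops the "file:" prefix added when several files are given).
#   2nd grep: keep only the action="..." part of each tag.
#   cut:      print the text between the first pair of double quotes.
extract_actions() {
  grep -h -E -o -i -e '<form\s+[^>]*action\s*=[^>]*>' "$@" |
    grep -E -o -i -e 'action\s*=\s*"[^"]*"' |
    cut -d '"' -f 2
}
```

Called as extract_actions ./*.html, it prints one action value per line. Adding -r to the first grep would extend it to subdirectories.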

score 0 · Answer 2 · answered Aug 28 '13 at 13:03

Why not use an HTML parser/XPath implementation, like my Xidel?

This returns the URL in the action attribute:

xidel ./* -e //form/@action

Or with pattern matching instead of XPath:

xidel ./* -e '<form action="{.}"/>*'

You can even do all further processing in it. For example, to get not only the action but also the URL-encoded values of all input elements, you can use:

xidel ./* -e '//form/form(.)'
