I am trying to grep parts of an HTML form, specifically the action part, i.e. <form action = ….
I originally tried:
grep -E -e 'form\s*action\s*=.*[.]html' ./*
but it did not work (despite the fact that there are such strings).
Then I tried the basic:  grep -E -e 'form\s*action\s*=' ./* but this did not work either!
What am I doing wrong?
        Jim
        
 
- 
                    For the `./*` part, you mean current directory? And what is the error message? – Juto Aug 28 '13 at 11:32
 - 
                    So you want to match the content of the action or you want only ` – Ibrahim Najjar Aug 28 '13 at 11:32
 - 
                    `xs` is a file. Like html – Jim Aug 28 '13 at 11:43
 - 
                    @nl-x Don't escape the backslash, it's only for grep and well protected with single quotes. Else it will match a backslash followed with a 's'. – Bentoy13 Aug 28 '13 at 11:45
 
2 Answers
This won't get you the action. It will only get you the part just before the action value. For example, if you have <form id="myForm" action="myFile.php">, a regexp ending in action\s*= can at best match form id="myForm" action=
So try instead:
grep -E -o -i -e '<form\s+[^>]*action\s*=[^>]*>' ./*
[^>]* means any character except >, zero or more times.
-o means print only the matching part of each line.
-i means case-insensitive matching.
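To see the command above in action, here is a minimal sketch run against a throwaway file. The file name `sample.html` and its contents are made up for illustration, and `\s` is a GNU grep extension, so this assumes GNU grep.

```shell
#!/bin/sh
# Hypothetical sample file, created only for this demonstration.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.html" <<'EOF'
<html><body>
<FORM id="myForm" action="myFile.php" method="post">
<input name="q">
</form>
<form action = "other.html">
</form>
</body></html>
EOF

# -o prints only the matched text, -i ignores case, -h drops the file name
# prefix. \s is a GNU extension, so GNU grep is assumed here.
matches=$(grep -E -h -o -i -e '<form\s+[^>]*action\s*=[^>]*>' "$tmpdir/sample.html")
printf '%s\n' "$matches"

rm -rf "$tmpdir"
```

Both opening form tags are printed in full, including the one that has an `id` attribute before `action`, which a pattern that expects `action` directly after `form` cannot match.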
        nl-x
        
 
- 
                    Even this `grep -E -e 'form\\s*action\\s*=' ./*` does not give any results or `grep -E -e '\ – Jim Aug 28 '13 at 11:38
 - 
                    @Jim I was wrong about double escaping. I edited my answer. I tested it, and it works – nl-x Aug 28 '13 at 11:43
 - 
                    I would add a `\b` after `form`, or replace the first `\s*` with `\s+`, else you can match `
`. Maybe it's unnecessary... Else, works fine for me too, +1. – Bentoy13 Aug 28 '13 at 11:57
 - 
                    @Jim -r is recursive. You didn't mention you wanted to go recursive. My test was on a single file – nl-x Aug 28 '13 at 11:59
 - 
                    All I couldn't manage yet is to only return the contents of the action attribute. I can only return the entire regexp match. I tried making a group with parenthesis `(...)` , but GREP just ignores that... Maybe I should name the group, or add some extra switch – nl-x Aug 28 '13 at 12:02
 - 
                    @nl-x I think that you can achieve returning only the action part using lookbehind and adding option `-P` for that. It's up to you! – Bentoy13 Aug 28 '13 at 12:08
 - 
                    According to http://stackoverflow.com/a/1891890/1209443 I can use Grep multiple times by piping it. I can see how that would work... – nl-x Aug 28 '13 at 12:19
 - 
                    @nl-x Yeah, with grep, only way to do that because you cannot specify variable-length pattern into lookbehind. So yes, pipe seems to be the only way! – Bentoy13 Aug 28 '13 at 12:29
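The piping idea discussed in the comments above could look roughly like the sketch below. It is only an illustration: the sample tag is the one from the answer, the variable names are made up, GNU grep is assumed for `\s`, and it assumes the attribute value is double-quoted. It would also be fooled by an attribute such as `data-action`, so treat it as a quick hack rather than a robust parser.

```shell
#!/bin/sh
# Sample input taken from the answer; in practice this would come from files.
html='<form id="myForm" action="myFile.php" method="post">'

# Step 1: isolate the opening form tag.
# Step 2: isolate the action="..." attribute inside that tag.
# Step 3: isolate the quoted value, then strip the quotes with tr.
action=$(printf '%s\n' "$html" \
  | grep -E -o -i -e '<form\s+[^>]*action\s*=[^>]*>' \
  | grep -E -o -i -e 'action\s*=\s*"[^"]*"' \
  | grep -E -o -e '"[^"]*"' \
  | tr -d '"')

printf '%s\n' "$action"
```

Each stage narrows the previous match, which sidesteps the fact that `-o` always prints the whole match and never a capture group.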
 
Why not use an HTML parser/XPath implementation, such as my Xidel?
This returns the URL in the action attribute:
xidel ./* -e //form/@action
Or with pattern matching instead of XPath:
xidel ./* -e '<form action="{.}"/>*'
You can even do all further processing in it. E.g. to get not only the action but also the values of all input elements, URL-encoded, you can use:
xidel ./* -e //form/form(.)
        BeniBela
        