
Whenever I use grep with gnuwin32's recurse option -r and include a glob pattern for files to search (e.g. *.c), no files in the subdirectories are searched. I am using the latest grep from gnuwin32.

Specifically, I was searching for the string "iflag" in all my c source files in a directory.

grep -r iflag *.c
Eric

4 Answers


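Grep's -r option (which is the same as --recursive, -d recurse and --directories=recurse; -R is similar, but in newer GNU grep it also follows symbolic links) does not take an argument of its own. It tells grep to descend into any directory named among the file arguments. Because *.c is expanded to the matching names in the current working directory before the search starts, the command you are trying to execute is interpreted as "In the current working directory, search every file whose name matches *.c for the string iflag, and recurse into every directory whose name matches *.c, searching all files inside it."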

The grep command the OP probably wanted is

grep --include \*.c -r iflag .

which is interpreted as, "Starting in the current working directory recursively search all subdirectories. In each subdirectory search all files matching the pattern *.c for the string iflag."
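
The difference between the two commands can be checked in a scratch directory. The following is only a sketch: it assumes a POSIX shell with GNU grep and mktemp, and the file names (top.c, sub/deep.c, sub/notes.txt) are made up for the demonstration.

```shell
# Build a throwaway tree: one .c file at the top, one in a subdirectory,
# plus a .txt file that also contains the search string.
demo=$(mktemp -d)
mkdir "$demo/sub"
echo 'int iflag = 0;' > "$demo/top.c"
echo 'int iflag = 1;' > "$demo/sub/deep.c"
echo 'iflag notes'    > "$demo/sub/notes.txt"
cd "$demo" || exit 1

# *.c is expanded to top.c before the search starts, so sub/ is never visited.
glob_hits=$(grep -rl iflag *.c | sort)

# Here grep walks the tree itself and filters file names with --include.
include_hits=$(grep -rl --include='*.c' iflag . | sort)

echo "with *.c:"
echo "$glob_hits"
echo "with --include:"
echo "$include_hits"
```

Only the second command should list the file in the subdirectory, and neither should report notes.txt.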

HairOfTheDog

I'm not sure why the recurse flag doesn't work, but here's a workaround that works for me. With -r, grep needs a directory to search, given as a separate argument. To search the current directory, give it a single dot (.). For example:

grep regexp-to-find -r . --include=*.c

Edit

This is actually the expected behavior of grep, and has nothing to do with running it on Windows. With -r, grep recurses only into the directories named on the command line. Check out HairOfTheDog's answer for why.


I find the answers given so far way too complicated. Just use:

grep -r --include="*.c" searchString .

(as proposed by christangrant on StackOverflow or by HairOfTheDog in the comments above.)

If you are too lazy to type that all the time, just define a function and add it to "~/.bashrc". (A normal alias is not possible since parameters are used, as explained on StackOverflow.)

rgrep() {
  grep -r --include="$2" "$1" .
}

Now you have an easy-to-use recursive grep. E.g., if you want to search for "string" in all text files, use:

rgrep string "*.txt"
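
If you want a bit more flexibility, a possible variant is sketched below. The name rgrep2 and its optional second and third parameters are my own invention, not part of any standard tool; it assumes GNU grep. It uses -e so that a search string starting with a dash is not mistaken for an option.

```shell
# Hypothetical variant of the function above: the file name pattern defaults
# to all files and the start directory defaults to the current one.
rgrep2() {
  # $1 = search string, $2 = file name glob (optional), $3 = directory (optional)
  grep -r --include="${2:-*}" -e "$1" -- "${3:-.}"
}

# Small scratch tree to try it on (made-up file names).
demo=$(mktemp -d)
mkdir "$demo/sub"
echo 'int iflag;' > "$demo/sub/a.c"
echo 'iflag'      > "$demo/sub/a.txt"

# Count matching lines: only .c files, then all files.
c_only=$(rgrep2 iflag '*.c' "$demo" | wc -l)
all=$(rgrep2 iflag '' "$demo" | wc -l)
echo "c_only=$c_only all=$all"
```

Quoting the pattern when you call the function still matters, for the same reason as with plain grep.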

Solution

For what you want to do, using --include is the right way; credit goes to other answers.


Rationale behind this answer

No answer posted so far has explained why your original code does not work like you expected. This is the gap my answer tries to fill.


Why your code does not work like you expected

Your original code is:

grep -r iflag *.c

and you expected grep to recursively (because of -r) examine the current working directory and consider only files matching the pattern *.c.

To understand what happens, it helps to first analyze how this command behaves in Linux. You mentioned GnuWin, so your OS is Windows and you probably ran grep inside cmd.exe. This brings some interesting nuances (we will get to them), but your grep is still designed to mimic GNU grep in Unix/Linux.

So let's assume for a while you run the command in Linux. In Linux it works similarly, i.e. it does not give you what you expected either.

The first important thing that happens in Linux is the shell tries to expand *.c before it starts grep. *.c is expanded to possibly multiple arguments: names of matching files in the current working directory. E.g. if there are two matching files foo.c and bar.c then your command will effectively be:

grep -r iflag bar.c foo.c

Two important facts:

  1. The above grep is not aware *.c was used. It sees -r, iflag, bar.c, foo.c as its arguments.
  2. If foo.c and bar.c are regular files then -r is irrelevant. OTOH if e.g. foo.c is a file of the type directory then -r will make grep descend into the directory; but grep will read all files in the directory (it is still not aware of *.c you typed).
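
You can see the expansion for yourself by replacing grep with something that only prints its arguments. A minimal sketch, assuming a POSIX shell; show_args is a hypothetical helper defined just for this demonstration.

```shell
# Hypothetical helper: print each argument it receives on its own line,
# the way grep would receive them in its argument array.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

demo=$(mktemp -d)
cd "$demo" || exit 1
touch foo.c bar.c

# The shell replaces *.c with the matching names before the command starts.
seen=$(show_args -r iflag *.c)
echo "$seen"
```

The output contains bar.c and foo.c as separate arguments and no trace of the asterisk.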

In some shells you can use a glob that matches files in subdirectories. In your case it would be something like **/*.c. This feature is called "globstar", and you may need to enable it explicitly in your shell. If you used such a pattern properly, the shell would pass all matching paths to grep (e.g. dir1/subdir/baz.c), grep would not see the pattern itself, and -r might be irrelevant. The job of picking all pathnames matching the pattern would be done by the shell. Your pattern is not like this though.
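
For illustration, here is how the globstar approach looks in Bash. This sketch assumes Bash 4 or later is installed; the directory and file names are made up. The -O globstar switch enables the option for that single bash invocation only.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/dir1/subdir"
echo 'int iflag;' > "$demo/top.c"
echo 'int iflag;' > "$demo/dir1/subdir/baz.c"
cd "$demo" || exit 1

# bash expands **/*.c itself, so grep receives plain pathnames
# and does not need -r at all.
matches=$(bash -O globstar -c 'grep -l iflag **/*.c')
echo "$matches"
```

Both top.c and dir1/subdir/baz.c should be listed, because the shell, not grep, did the recursion.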

There are two scenarios where grep actually sees this *.c:

  1. If there is no match in the current working directory then (in some shells) *.c will not be expanded and it will be passed to grep literally as *.c.
  2. If you escape or quote the asterisk in *.c (examples: \*.c, "*.c", '*'.c) then it will not be expanded and the word will be passed to grep literally as *.c.

This cannot help you because grep itself is not designed to expand patterns in an argument it interprets as pathname. If it sees *.c where it expects a pathname then it will try to read a file named *.c. There are tools that actually interpret such patterns in some circumstances (in fact grep --include is an example).
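
A quick way to confirm this, assuming a POSIX shell and GNU grep (which exits with status 2 when it cannot open a file); foo.c is a made-up file name:

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1
echo 'int iflag;' > foo.c

# Quoted, the pattern reaches grep as the literal name "*.c";
# grep does not glob it and fails to open a file with that name.
quoted_status=0
grep iflag '*.c' 2>/dev/null || quoted_status=$?

# Unquoted, the shell expands it to foo.c first and grep finds the match.
unquoted_status=0
grep -q iflag *.c || unquoted_status=$?

echo "quoted: $quoted_status, unquoted: $unquoted_status"
```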

Even if grep was designed to interpret this *.c in your code as a pattern in the way you expected, in many cases your code wouldn't work. You would need to quote or escape the pattern to protect it from the shell (see this question).

In short, your code doesn't work because *.c is expanded first (before grep runs) and the recursion starts later (when grep runs). You want it the other way around. One way to get that is --include=*.c; the other is globstar, where the shell handles the recursion, but that can fail with an "argument list too long" error, so it's better to let grep do it.

Note --include=*.c is still a pattern your shell may expand; if you want your code to be robust then you should escape or quote the asterisk. It's good to know what parts of your command (in general: any command) are (or may be) expanded by the (or a) shell.
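
The risk is small but real. The contrived sketch below assumes a POSIX shell; the file name --include=evil.c is deliberately odd and exists only to trigger the problem.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1

# A contrived file whose name happens to match the glob --include=*.c
touch ./--include=evil.c

# Unquoted: the shell rewrites the option before grep would ever see it.
unquoted=$(printf '%s\n' --include=*.c)
# Quoted: the pattern is passed through untouched.
quoted=$(printf '%s\n' '--include=*.c')

echo "unquoted -> $unquoted"
echo "quoted   -> $quoted"
```

With the unquoted form, grep would be told to include only files named evil.c, which is not what you typed.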


Nuances of Windows

And now, finally, the nuances of Windows. Your grep in Windows is from the GnuWin project, it is designed to reasonably mimic the GNU grep you can find in Linux. We saw that in Linux your command results in *.c being expanded first, recursiveness starting later. In Unix/Linux grep cannot do anything about it because the tool starts after *.c is expanded by the shell. In Windows it's different.

If grep (grep.exe) is actually an executable in Windows and you run it in cmd.exe as

grep -r iflag *.c

(like you probably did) then the executable will (be able to) see *.c before it gets expanded. In Unix/Linux grep (any executable in general) gets an array of arguments after a shell "digests" the command presented as a string. In Windows grep.exe (any executable in general) gets a string (a somewhat "digested" string, but still a string), not an array; it creates the array for itself. In your case the string contains *.c and the expansion is done after grep starts, by grep; not before, and not by cmd.exe.

Like many tools in Windows, grep probably doesn't bother to provide a custom function to get from a string to an array of arguments. Some standard function provided by Windows is used, so it looks as if cmd.exe does the job. Even the creators of your grep.exe pretend *.c in Windows is expanded by the shell; here they say "filename wildcards are interpreted by the command shell, not by the program". This is not entirely true. See the second part of this answer of mine for some more details.

All this means that if the creators of grep.exe wanted your exact command (with an unquoted and unescaped asterisk) to behave like you expected (recursion first, pattern matching later), they could do this in Windows. In Linux they could not, at least not reliably: your shell would interfere unless you protected the asterisk from being expanded. Note that an unquoted, unprotected *.c not being expanded by the shell is non-standard behavior, even in Windows (where we should rather say "*.c not being expanded as if by the shell"). So even if everyone wanted your exact command to behave exactly like you expected (without interference from a shell, ever), it wouldn't be implemented in GNU grep or GnuWin grep.exe anyway, because it would be impossible in Unix/Linux and against the established standard in both Unix/Linux and Windows.

It doesn't matter though, because the current behavior of grep (I mean interpreting filenames not as patterns, even if they look like patterns, even if -r is used) is fine and there is no need to change it.

Also note that the rules of quoting in Windows are different than in shells in Unix/Linux. grep.exe, when it gets a string, before the string is converted to an array of arguments, can see quotes used. By providing a custom function to convert from a string to an array, grep.exe could impose different rules of quoting, at least to some degree, possibly more Unix-like rules. The fact it doesn't impose such rules is yet another clue a standard function is used. It looks to me your grep.exe in Windows does in the most Windows-like way what a shell in Linux is responsible for (even though in Windows it has means to change some functionalities to Linux-like); and it does in the most grep-like way what GNU grep in Linux is responsible for.