13

In a Linux shell, I want to make sure that a certain set of files all begin with <?, having that exact string and no other characters at the beginning. How can I use grep or some other tool to express "file begins with"?


Edit: I'm wildcarding this, and head doesn't give a filename on the same line, so when I grep it, I don't see the filename. Also, "^<?" doesn't seem to give the right results; basically I'm getting this:

$> head -1 * | grep "^<?"
<?
<?
<?
<?
<?
...

All of the files are actually good.

user13743
  • 1,751

11 Answers

13

In Bash:

for file in *; do [[ "$(head -1 "$file")" =~ ^\<\? ]] || echo "$file"; done

Make sure they are files:

for file in *; do [ -f "$file" ] || continue; [[ "$(head -1 "$file")" =~ ^\<\? ]] || echo "$file"; done
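
A hedged variant of the same loop, as a sketch rather than a drop-in replacement: it compares only the first two bytes, so empty files and files with leading whitespace are reported too. The function name list_not_starting_with_tag is made up for illustration, and it assumes a head that supports -c (GNU and BSD do; POSIX does not require it).

```shell
# Hypothetical helper: print each regular file whose first two bytes are not "<?".
# Assumes head supports -c (GNU and BSD head do).
list_not_starting_with_tag() {
    local file
    for file in "$@"; do
        [ -f "$file" ] || continue          # skip directories and other non-files
        [ "$(head -c 2 -- "$file")" = '<?' ] || printf '%s\n' "$file"
    done
}

# Usage: list_not_starting_with_tag *
```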
Giacomo1968
  • 58,727
janmoesen
  • 538
5

Do the grep:

$ head -n 1 * | grep -B1 "^<?"
==> foo <==
<?
--
==> bar <==
<?
--
==> baz <==
<?

Parse out the filenames:

$ head -n 1 * | grep -B1 "^<?" | sed -n 's/^==> \(.*\) <==$/\1/p'
foo
bar
baz
3

You can use awk for this:

$ cat test1
<?xxx>
111
222
333
$ cat test2
qqq
aaa
zzz
$ awk '/^<\?/{print "Starting with \"<?\":\t" ARGV[ARGIND]; nextfile} {print "Not starting with \"<?\":\t" ARGV[ARGIND]; nextfile}' *
Starting with "<?":     test1
Not starting with "<?": test2
$
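
ARGIND is a gawk extension, so here is a sketch of the same idea using only the standard FNR and FILENAME variables. The wrapper name label_first_lines is hypothetical. Note that empty files have no first line, so they produce no output at all.

```shell
# Hypothetical helper: label each non-empty file by whether line 1 starts with "<?".
# FNR (per-file line number) and FILENAME are standard awk variables.
label_first_lines() {
    awk 'FNR == 1 {
        if ($0 ~ /^<\?/) print "Starting with \"<?\":\t" FILENAME
        else             print "Not starting with \"<?\":\t" FILENAME
    }' "$@"
}

# Usage: label_first_lines test1 test2
```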
hlovdal
  • 3,168
3

Except for empty files, this Perl script seems to work:

perl -e 'while (<>) { print "$ARGV\n" unless m/^<\?/; close ARGV; }' *

I'm not immediately sure how to handle empty files; I'd be tempted to treat them as a separate special case:

find . -type f -size +0 -print0 |
    xargs -0 perl -e 'while (<>) { print "$ARGV\n" unless m/^<\?/; close ARGV; }'
2

Try this

for i in `find * | grep "php$"`; do echo -n $i " -> "; head -1 $i; done

This gets a list of every file whose name ends in php, then loops through it, echoing the file name and then printing the first line of the file. It will give you output like:

calendar.php  -> <?php
error.php  -> <?php
events.php  -> <?php
gallery.php  ->
index.php  -> <?php
splash.php  -> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
information.php  -> <?php
location.php  -> <?php
menu.php  -> <?php
res.php  -> <?php
blah.php  -> <?php

Then you can stick a grep -v at the end to filter out the lines you expect to see and leave just the exceptions:

for i in `find * | grep "php$"`; do echo -n $i " -> "; head -1 $i; done | grep -v "<?php"

output:

gallery.php  ->
splash.php  -> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
Roy Rico
  • 6,108
1

Bash 4.0

#!/bin/bash
shopt -s globstar
for php in /path/**/*.php
do
   exec 4<"$php"; IFS= read -r line <&4; exec 4<&-
   case "$line" in
     "<?"*) echo "found: $php"
   esac

done
user31894
  • 2,937
1

As you are trying to only find files that do not start with the search term, I recommend using grep with the -v option.

In this example I've used grep recursively and let it print line numbers, pipe the output to a second grep instance, which matches on line number 1 followed by your search string:

╰─$ grep -nr "." | grep -v "1:<?" 
foo/16:1:a<?
asdf/16:1:a<?
11:1:<a?
16:1:a<?

As you can see the output will consist of filepath:line:match. We can use some sed magic to just output the path to the files:

╰─$ grep -nr "." | grep -v "1:<?" | sed -E 's/(.+?):1:.*/.\/\1/'
./foo/16
./asdf/16
./11
./16
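
The pipeline above also passes through every line that is not line 1, so it only behaves as shown on single-line files. A sketch of a variant that looks at the first line only, under a few assumptions: grep supports -r and -m (GNU and BSD do), the files are text, and neither paths nor first lines contain ":1:". The function name list_bad_first_lines is made up.

```shell
# Hypothetical helper: list files under a directory whose first line
# does not start with "<?". Empty files are not reported.
list_bad_first_lines() {
    grep -rnm1 -e '' -- "${1:-.}" |   # first line of every non-empty file, as path:1:text
        grep -v ':1:<?' |             # drop files whose first line starts with "<?"
        sed 's/:1:.*//'               # keep only the path
}

# Usage: list_bad_first_lines .
```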
mashuptwice
  • 3,395
0
cat file.txt | head -1 | grep "^<?"

should do what you're asking for.

Phoshi
  • 23,483
0

this:

  % for i in *; do head -1 "$i" | grep -q "^<?"; echo "$i: $?"; done

gives you something like this:

  foo.xml: 0
  bla.txt: 1

Every file whose first line does not match the pattern is "marked" with "1" (grep's exit status). You can play with that until it fits your needs.
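
One way to play with it, as a sketch: act on grep's exit status instead of printing it, so only the failing files are listed. The function name list_failing is hypothetical.

```shell
# Hypothetical helper: print only the files whose first line does not start with "<?".
# grep -q prints nothing and exits 0 on a match, 1 on no match.
list_failing() {
    local i
    for i in "$@"; do
        [ -f "$i" ] || continue
        head -n 1 -- "$i" | grep -q '^<?' || printf '%s\n' "$i"
    done
}

# Usage: list_failing *
```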

akira
  • 63,447
0

Let me have a go at this

find -type f | awk '
{
 if(getline ret < $0){
  if(ret~"^<\\?$"){
   print "Good["$0"]["ret"]";
  }else{
   print "Fail["$0"]";
  };
 }else{
  print "empty["$0"]";
 };
 close($0);
}'

Nobody said awk was not available :-)

user42723
  • 161
0

EDIT: As @AdamMierzwiak pointed out, this solution doesn't answer OP's question. (It matches every file that has ANY line starting with <?.)

grep -lG "^<?" *

explanation:

-l, --files-with-matches
    Suppress normal output; instead print the name of each input 
    file from which output would normally have been printed. The
    scanning will stop on the first match. (-l is specified by POSIX.)

See this link for an explanation of the command: https://explainshell.com/explain?cmd=grep+-lG+%22%5E%3C%3F%22+*

julesl
  • 101