I've done some experimentation and found that --exclude and --exclude-dir take extended regex syntax.
So the characters ‘?’, ‘+’, ‘{’, ‘|’, ‘(’, and ‘)’ have their special meaning and must be escaped with a '\' to be taken literally.
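To make the escaping rule concrete, here is a minimal sketch of a helper that backslash-escapes regex metacharacters in a literal directory name before it goes into --exclude or --exclude-dir. The name escape_ere is my own invention, not part of ClamAV, and it assumes a sed that supports -E.

```shell
# escape_ere is a hypothetical helper (not part of ClamAV): it puts a
# backslash in front of every extended-regex metacharacter in its argument.
escape_ere () {
    printf '%s\n' "$1" | sed -E 's/[][\.^$*+?(){}|]/\\&/g'
}

escape_ere 'lost+found'   # prints lost\+found
escape_ere '.git'         # prints \.git
```

The escaped strings can then be dropped into a pattern such as /(lost\+found|\.git).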
When --exclude is used, any file whose path matches the regex (anywhere in the path) will not be scanned. So if the regex matches a directory the file is in, the file will not be scanned, but each file is still checked individually to see if it matches. When --exclude-dir is used and a directory matches, none of its contents will be scanned or even examined for matches.
This can be seen in the log (if one is specified with --log): --exclude-dir produces a single entry saying the directory is excluded, whereas --exclude produces an entry for every excluded file in that directory.
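Before starting a long scan, it can help to dry-run a pattern against a few sample paths. The sketch below uses grep -E as a stand-in for clamscan's matcher. That is an assumption on my part, since the two regex engines may differ in details, and the would_exclude helper and the sample paths are made up for illustration.

```shell
# would_exclude is a hypothetical helper: succeeds if path $2 matches ERE $1.
# grep -E only approximates how clamscan matches (assumption).
would_exclude () {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

re='^/(proc|sys|dev)'
for path in /proc/cpuinfo /home/me/proc/notes.txt /system/file; do
    if would_exclude "$re" "$path"; then
        echo "excluded: $path"
    else
        echo "scanned:  $path"
    fi
done
```

Under extended regex semantics this pattern also matches /system/file, because nothing anchors the end of the directory name. If that matters on your system, ending the pattern with (/|$) should tighten it.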
Here is a bash script I created for myself that builds a couple of --exclude-dir options to exclude root directories and subdirectories. I'm sure it can be improved, but I hope it proves a useful example of the regex and the various options I thought I'd want for running a scan.
#! /usr/bin/env bash
# bool function to test if the user is root or not
is_user_root () { [ "${EUID:-$(id -u)}" -eq 0 ]; }
is_user_root || {
echo 'You are just an ordinary user. Run as root.' >&2
exit 1
}
LOG_FILE=/var/log/clamscan.log
EXCLUDE_ROOT_DIRS=(proc sys dev media mnt data/Downloads)
EXCLUDE_SUBDIRS=('lost\+found' .git)
declare -a EXCLUDE_DIRS
if [[ ${#EXCLUDE_ROOT_DIRS[@]} -ne 0 ]]; then
ED_RE="^/("; for xrd in "${EXCLUDE_ROOT_DIRS[@]}"; do ED_RE+="$xrd|"; done; ED_RE="${ED_RE%|})"
EXCLUDE_DIRS+=("--exclude-dir=$ED_RE")
#EXCLUDE_DIRS+="--exclude-dir=^/("; for xrd in ${EXCLUDE_ROOT_DIRS[@]}; do EXCLUDE_DIRS+="$xrd|"; done; EXCLUDE_DIRS="${EXCLUDE_DIRS%|})"; echo $EXCLUDE_DIRS
fi
if [[ ${#EXCLUDE_SUBDIRS[@]} -ne 0 ]]; then
ED_RE="/("; for xsd in "${EXCLUDE_SUBDIRS[@]}"; do ED_RE+="$xsd|"; done; ED_RE="${ED_RE%|})"
EXCLUDE_DIRS+=("--exclude-dir=$ED_RE")
#EXCLUDE_DIRS+=" --exclude-dir=/("; for xsd in ${EXCLUDE_SUBDIRS[@]}; do EXCLUDE_DIRS+="$xsd|"; done; EXCLUDE_DIRS="${EXCLUDE_DIRS%|})"; echo $EXCLUDE_DIRS
fi
# Adding --verbose will write Scanning messages to stdout e.g.
# Scanning /data/Games/henry/Steam/ubuntu12_32/steam-runtime/usr/share/doc/libglib2.0-0/README.gz
echo clamscan --suppress-ok-results --log="$LOG_FILE" --max-filesize=100M --recursive "${EXCLUDE_DIRS[@]}" /
clamscan --suppress-ok-results --log="$LOG_FILE" --max-filesize=100M --recursive "${EXCLUDE_DIRS[@]}" /
# NOTE: may want to edit the log file and remove the reports on Symbolic links and Empty files
sed -i -E -e '/: (Symbolic link|Empty file)$/d' "$LOG_FILE"
echo 'List of any FOUND infected files'
grep 'FOUND$' "$LOG_FILE"
The command line the above script executes is:
clamscan --suppress-ok-results --log=/var/log/clamscan.log --max-filesize=100M --recursive --exclude-dir=^/(proc|sys|dev|media|mnt|data/Downloads) --exclude-dir=/(lost\+found|.git) /
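The loops that join the directory names with | can also be written with IFS. As a minimal sketch (join_alternatives is a name I made up, not something from clamscan or bash), joining the arguments looks like this:

```shell
# join_alternatives is a hypothetical helper: joins its arguments with '|'.
# "$*" expands to all arguments separated by the first character of IFS.
join_alternatives () {
    local IFS='|'
    printf '%s\n' "$*"
}

ED_RE="^/($(join_alternatives proc sys dev media mnt data/Downloads))"
echo "--exclude-dir=$ED_RE"
# prints --exclude-dir=^/(proc|sys|dev|media|mnt|data/Downloads)
```

In the script above, the literal names would be replaced by "${EXCLUDE_ROOT_DIRS[@]}" or "${EXCLUDE_SUBDIRS[@]}", so there is no trailing | to strip afterwards.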