I have a regex pattern made up of words such as welcome1|welcome2|changeme..., which I need to search for in thousands of files (anywhere from 100 to 8000 of them), each between 1 KB and 24 MB in size.
I would like to know whether there is a faster way to do this pattern matching than what I have been trying.
Environment:
- JDK 1.8
- Windows 10
- Unix4j library
 
Here's what I have tried so far:
try (Stream<Path> stream = Files.walk(Paths.get(FILES_DIRECTORY))
                                .filter(FilePredicates.isFileAndNotDirectory())) {
    // Wrap every word in ".*" because Unix4j apparently needs this
    List<String> obviousStringsList = Strings_PASSWORDS.stream()
            .map(s -> ".*" + s + ".*")
            .collect(Collectors.toList());
    // Join all words into a single alternation: .*welcome1.*|.*welcome2.*|...
    Pattern pattern = Pattern.compile(String.join("|", obviousStringsList));
    GrepOptions options = new GrepOptions.Default(GrepOption.count,
                                                  GrepOption.ignoreCase,
                                                  GrepOption.lineNumber,
                                                  GrepOption.matchingFiles);
    Instant startTime = Instant.now();
    // Keep only the files for which grep produces any output
    final List<Path> filesWithObviousStrings = stream
            .filter(path -> !Unix4j.grep(options, pattern, path.toFile()).toStringResult().isEmpty())
            .collect(Collectors.toList());
    System.out.println("Time taken = " + Duration.between(startTime, Instant.now()).getSeconds() + " seconds");
}
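For reference, the same list of words could be turned into a single pattern for plain java.util.regex without the ".*" padding. This is only a minimal sketch: the class name ObviousStringPattern and the method build are hypothetical, and the use of Pattern.quote assumes the words are meant as literal text rather than as regex fragments.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ObviousStringPattern {

    // Builds one case-insensitive alternation from a list of words.
    // Assumption: the words are literals, so each one is quoted with Pattern.quote.
    public static Pattern build(List<String> words) {
        String alternation = words.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile(alternation, Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        Pattern pattern = build(Arrays.asList("welcome1", "welcome2", "changeme"));
        // Matcher.find() searches anywhere in the input, so no ".*" wrapping is needed.
        System.out.println(pattern.matcher("password=ChangeMe").find());
        System.out.println(pattern.matcher("nothing to see here").find());
    }
}
```

Pattern.CASE_INSENSITIVE plays the role that GrepOption.ignoreCase plays in the Unix4j version.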
I get Time taken = 60 seconds, which makes me think I'm doing something really wrong.
I've tried different ways of using the stream, and on average every approach takes about a minute to process my current folder of 6660 files.
grep on MSYS2/MinGW64 takes about 15 seconds, and exec('grep...') in Node.js takes about 12 seconds consistently.
I chose Unix4j because it provides a Java-native grep and keeps the code clean.
Is there a way to get better results in Java that I'm missing?
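One baseline that could be timed against the Unix4j version is a plain-JDK scan that stops reading a file at the first matching line. The following is a minimal sketch and has not been benchmarked here. The names PlainJdkGrep, containsMatch and filesWithMatches are hypothetical, and two choices are assumptions rather than requirements: decoding with ISO-8859-1 (so that arbitrary bytes never cause a decoding error) and switching the file stream to parallel.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PlainJdkGrep {

    // Returns true as soon as one line of the file contains a match.
    public static boolean containsMatch(Path file, Pattern pattern) {
        // Assumption: ISO-8859-1 maps every byte to a char, so odd encodings do not throw.
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (pattern.matcher(line).find()) {
                    return true;
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Walks the directory tree and collects every regular file that contains a match.
    public static List<Path> filesWithMatches(Path root, Pattern pattern) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .parallel() // Pattern is thread-safe; each line gets its own Matcher
                    .filter(file -> containsMatch(file, pattern))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PlainJdkGrep <directory>
        Pattern pattern = Pattern.compile("welcome1|welcome2|changeme", Pattern.CASE_INSENSITIVE);
        filesWithMatches(Paths.get(args[0]), pattern).forEach(System.out::println);
    }
}
```

Unlike the Unix4j call with GrepOption.count and GrepOption.lineNumber, this sketch only answers whether a file matches at all, which is all the filter in the original code uses.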