I have tried to compare some of the methods presented here.
First I created a Perl script to generate the input files file1.txt and file2.txt. In order to compare some of the solutions, I made sure that the words from file1.txt could only appear in the second field of file2.txt. Also, to be able to use the join solution presented by @GeorgeVasiliou, I sorted file1.txt and file2.txt. Currently the input files are generated from only 75 random words (taken from https://www.randomlists.com/random-words). Only 5 of these 75 words were used in file1.txt; the remaining 70 words were used to fill up the fields of file2.txt. It might be necessary to increase the number of words substantially to get realistic results (according to the OP, the original file1.txt contained 14,000 words). In the tests below I used a file2.txt with 1,000,000 (1 million) lines. The script also generates the file regexp1.txt required by the grep solution of @BOC.
gen_input_files.pl:
#! /usr/bin/env perl
use feature qw(say);
use strict;
use warnings;
use Data::Printer;
use Getopt::Long;
GetOptions ("num_lines=i" => \my $nlines )
  or die("Error in command line arguments\n");
# Generated random words from site: https://www.randomlists.com/random-words
my $word_filename        = 'words.txt'; # 75 random words
my $num_match_words      = 5;
my $num_file2_lines      = $nlines || 1_000_000;
my $file2_words_per_line = 3;
my $file2_match_field_no = 2;
my $file1_filename       = 'file1.txt';
my $file2_filename       = 'file2.txt';
my $file1_regex_fn       = 'regexp1.txt';
say "generating $num_file2_lines lines..";
my ( $words1, $words2 ) = get_words( $word_filename, $num_match_words );
write_file1( $file1_filename, $words2 );
write_file2(
    $file2_filename, $words1, $words2, $num_file2_lines,
    $file2_words_per_line, $file2_match_field_no
);
write_BOC_regexp_file( $file1_regex_fn, $words2 );
sub write_BOC_regexp_file {
    my ( $fn, $words ) = @_;
    open( my $fh, '>', $fn ) or die "Could not open file '$fn': $!";
    print $fh '\\|' . (join "|", @$words) . '\\|';
    close $fh;
}
sub write_file2 {
    my ( $fn, $words1, $words2, $nlines, $words_per_line, $field_no ) = @_;
    my $nwords1 = scalar @$words1;
    my $nwords2 = scalar @$words2;
    my @lines;
    for (1..$nlines) {
        my @words_line;
        my $key;
        for my $i (1..$words_per_line) {
            my $word;
            if ( $i != $field_no ) {
                # A filler field: pick one of the words not used in file1.txt
                my $index = int rand $nwords1;
                $word = $words1->[$index];
            }
            else {
                # The match field: pick from all the words, so only some
                # of the lines will get a word from file1.txt in this field
                my $index = int rand( $nwords1 + $nwords2 );
                if ( $index < $nwords2 ) {
                    $word = $words2->[$index];
                }
                else {
                    $word = $words1->[$index - $nwords2];
                }
                $key = $word;
            }
            push @words_line, $word;
        }
        push @lines, [$key, (join "|", @words_line)];
    }
    # Sort the lines on the match field so that file2.txt is sorted
    # (the join solution requires sorted input files)
    @lines = map { $_->[1] } sort { $a->[0] cmp $b->[0] } @lines;
    open( my $fh, '>', $fn ) or die "Could not open file '$fn': $!";
    print $fh (join "\n", @lines);
    close $fh;
}
sub write_file1 {
    my ( $fn, $words ) = @_;
    open( my $fh, '>', $fn ) or die "Could not open file '$fn': $!";
    print $fh (join "\n", sort @$words);
    close $fh;
}
sub get_words {
    my ( $fn, $N ) = @_;
    open( my $fh, '<', $fn ) or die "Could not open file '$fn': $!";
    my @words = map {chomp $_; $_} <$fh>;
    close $fh;
    my @words1 = @words[$N..$#words];
    my @words2 = @words[0..($N - 1)];
    return ( \@words1, \@words2 );
}
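A typical invocation might look like this (assuming the script is executable and that words.txt is present in the current directory):
$ ./gen_input_files.pl --num_lines=1000000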
Next, I created a subfolder solutions containing all the test cases:
$ tree solutions/
solutions/
├── BOC1
│   ├── out.txt
│   └── run.sh
├── BOC2
│   ├── out.txt
│   └── run.sh
├── codeforester
│   ├── out.txt
│   ├── run.pl
│   └── run.sh
[...]
Here the file out.txt is the output from the grep commands for each solution, and the script run.sh runs the solution for the given test case.
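For example, the run.sh for the BOC1 case could be a simple wrapper along these lines (a sketch; the relative paths to the generated input files are assumptions):
#! /usr/bin/env bash
# Run the solution and save the matching lines to out.txt so the output
# can be checked afterwards.
grep -E -f ../../regexp1.txt ../../file2.txt > out.txt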
Notes on the different solutions
- BOC1: First solution presented by @BOC
 - grep -E -f regexp1.txt file2.txt
 
- BOC2: Second solution suggested by @BOC:
 - LC_ALL=C grep -E -f regexp1.txt file2.txt
 
- codeforester: Accepted Perl solution by @codeforester (see source)
 
- codeforester_orig: Original solution presented by @codeforester:
 - fgrep -f file1.txt file2.txt
 
- dawg: Python solution using a dictionary and line splitting, proposed by @dawg (see source)
 
- gregory1: solution using GNU Parallel suggested by @gregory
 - parallel -k --pipepart -a file2.txt --block "$block_size" fgrep -F -f file1.txt
 - See the note below regarding how to choose $block_size.
 
- hakon1: Perl solution provided by @HåkonHægland (see source). This solution requires compilation of the C extension the first time the code is run. It does not require recompilation when file1.txt or file2.txt changes. Note: The time used to compile the C extension at the initial run is not included in the run times presented below.
 
- ikegami: Solution using an assembled regexp and grep -P, as given by @ikegami. Note: The assembled regexp was written to a separate file regexp_ikegami.txt, so the runtime of generating the regexp is not included in the comparison below. This is the code used:
 - regexp=$(< "regexp_ikegami.txt")
grep -P "$regexp" file2.txt
 
- inian1: First solution by @Inian using match()
 - awk 'FNR==NR{
    hash[$1]; next
}
{
   for (i in hash) if (match($0,i)) {print; break}
}' file1.txt FS='|' file2.txt
 
- inian2: Second solution by @Inian using index()
 - awk 'FNR==NR{
    hash[$1]; next
}
{
   for (i in hash) if (index($0,i)) {print; break}
}' file1.txt FS='|' file2.txt
 
- inian3: Third solution by @Inian checking only the $2 field:
 - awk 'FNR==NR{
    hash[$1]; next
}
$2 in hash' file1.txt FS='|' file2.txt
 
- inian4: 4th solution by @Inian (basically the same as codeforester_orig but with LC_ALL):
 - LC_ALL=C fgrep -f file1.txt file2.txt
 
- inian5: 5th solution by @Inian (same as inian1 but with LC_ALL):
 - LC_ALL=C awk 'FNR==NR{
    hash[$1]; next
}
{
   for (i in hash) if (match($0,i)) {print; break}
}' file1.txt FS='|' file2.txt
 
- inian6: Same as inian3 but with LC_ALL=C. Thanks to @GeorgeVasiliou for the suggestion.
 
- jjoao: Compiled flex-generated C code as proposed by @JJoao (see source). Note: Recompilation of the executable must be done each time file1.txt changes. The time used to compile the executable is not included in the run times presented below.
 
- oliv: Python script provided by @oliv (see source)
 
- Vasiliou: Using join as suggested by @GeorgeVasiliou:
 - join --nocheck-order -11 -22 -t'|' -o 2.1 2.2 2.3 file1.txt file2.txt
 
- Vasiliou2: Same as Vasiliou but with LC_ALL=C.
 
- zdim: Using the Perl script provided by @zdim (see source). Note: This uses the regexp search version (instead of the split-line solution).
 
- zdim2: Same as zdim except that it uses the split function instead of regexp search for the field in file2.txt.
 
Notes
- I experimented a little bit with GNU Parallel (see the gregory1 solution above) to determine the optimal block size for my CPU. I have 4 cores, and currently it seems that the optimal choice is to divide the file (file2.txt) into 4 equally sized chunks and run a single job on each of the 4 processors; see the sketch after this list. More testing might be needed here. So for the first test case, where file2.txt is 20M, I set $block_size to 5M (see the gregory1 solution above), whereas for the more realistic case presented below, where file2.txt is 268M, a $block_size of 67M was used.
 
- The solutions BOC1, BOC2, codeforester_orig, inian1, inian4, inian5, and gregory1 all used loose matching, meaning that the words from file1.txt did not have to match exactly in field #2 of file2.txt; a match anywhere on the line was accepted. Since this behavior made it more difficult to compare them with the other methods, some modified methods were also introduced. The first two, called BOC1B and BOC2B, used a modified regexp1.txt file. The lines in the original regexp1.txt were of the form \|foo1|foo2|...|fooN\|, which would match the words at any field boundary. The modified file, regexp1b.txt, anchored the match to field #2 exclusively, using the form ^[^|]*\|foo1|foo2|...|fooN\| instead.
 - The rest of the modified methods, codeforester_origB, inian1B, inian4B, inian5B, and gregory1B, used a modified file1.txt. Instead of a literal word per line, the modified file file1b.txt used one regex per line, of the form:
 -  ^[^|]*\|word1\|
  ^[^|]*\|word2\|
  ^[^|]*\|word3\|
  [...]
 - and in addition, fgrep -f was replaced by grep -E -f for these methods. A sketch of how these modified files could be generated follows after this list.
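Here is a minimal sketch of how the block size for gregory1 could be computed (assuming the GNU coreutils commands nproc and stat are available):

num_cores=$(nproc)
# Divide file2.txt into one equally sized chunk per core:
block_size=$(( $(stat -c %s file2.txt) / num_cores ))
parallel -k --pipepart -a file2.txt --block "$block_size" fgrep -F -f file1.txt

And here is a sketch of how the modified files regexp1b.txt and file1b.txt could be generated from file1.txt (the exact commands may differ from the ones I used):

# regexp1b.txt: a single alternation anchored to field #2
printf '^[^|]*\\|%s\\|\n' "$(paste -sd'|' file1.txt)" > regexp1b.txt
# file1b.txt: one anchored regex per word
sed 's/.*/^[^|]*\\|&\\|/' file1.txt > file1b.txt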
 
Running the tests
Here is the script used for running all the tests. It uses the Bash time command to record the time spent for each script. Note that the time command returns three different times, called real, user, and sys. First I used user + sys, but I realized that this was incorrect when using the GNU Parallel command, so the time reported below is now the real part returned by time. See this question for more information about the different times returned by time.
The first test is run with file1.txt containing 5 lines and file2.txt containing 1,000,000 lines. Here are the first 52 lines of the run_all.pl script; the rest of the script is available here.
run_all.pl
#! /usr/bin/env perl
use feature qw(say);
use strict;
use warnings;
use Cwd;
use Getopt::Long;
use Data::Printer;
use FGB::Common;
use List::Util qw(max shuffle);
use Number::Bytes::Human qw(format_bytes);
use Sys::Info;
GetOptions (
    "verbose"       => \my $verbose,
    "check"         => \my $check,
    "single-case=s" => \my $case,
    "expected=i"    => \my $expected_no_lines,
) or die("Error in command line arguments\n");
my $test_dir    = 'solutions';
my $output_file = 'out.txt';
my $wc_expected = $expected_no_lines; # expected number of output lines
my $tests       = get_test_names( $test_dir, $case );
my $file2_size  = get_file2_size();
my $num_cpus    = Sys::Info->new()->device( CPU => () )->count;
chdir $test_dir;
my $cmd = 'run.sh';
my @times;
for my $case (@$tests) {
    my $savedir = getcwd();
    chdir $case;
    say "Running '$case'..";
    my $arg = get_cmd_args( $case, $file2_size, $num_cpus );
    my $output = `bash -c "{ time -p $cmd $arg; } 2>&1"`;
    my ($user, $sys, $real ) = get_run_times( $output );
    print_timings( $user, $sys, $real ) if $verbose;
    check_output_is_ok( $output_file, $wc_expected, $verbose, $check );
    print "\n" if $verbose;
    push @times, $real;
    #push @times, $user + $sys; # this is wrong when using Gnu parallel
    chdir $savedir;
}
say "Done.\n";
print_summary( $tests, \@times );
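The script can also verify that each solution produces the expected number of output lines, for example (a sketch using the command line options defined above):
$ run_all.pl --verbose --check --expected=66711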
Results
Here is the output from running the tests:
$ run_all.pl --verbose
Running 'inian3'..
..finished in 0.45 seconds ( user: 0.44, sys: 0.00 )
..no of output lines: 66711
Running 'inian2'..
..finished in 0.73 seconds ( user: 0.73, sys: 0.00 )
..no of output lines: 66711
Running 'Vasiliou'..
..finished in 0.09 seconds ( user: 0.08, sys: 0.00 )
..no of output lines: 66711
Running 'codeforester_orig'..
..finished in 0.05 seconds ( user: 0.05, sys: 0.00 )
..no of output lines: 66711
Running 'codeforester'..
..finished in 0.45 seconds ( user: 0.44, sys: 0.01 )
..no of output lines: 66711
[...]
Summary
[Results obtained by @Vasiliou are shown in the middle column.]
                               |Vasiliou
My Benchmark                   |Results  |   Details
-------------------------------|---------|----------------------
inian4             : 0.04s     |0.22s    | LC_ALL fgrep -f [loose] 
codeforester_orig  : 0.05s     |         | fgrep -f [loose]
Vasiliou2          : 0.06s     |0.16s    | [LC_ALL join [requires sorted files]]
BOC1               : 0.06s     |         | grep -E [loose] 
BOC2               : 0.07s     |15s      | LC_ALL grep -E [loose] 
BOC2B              : 0.07s     |         | LC_ALL grep -E [strict] 
inian4B            : 0.08s     |         | LC_ALL grep -E -f [strict] 
Vasiliou           : 0.08s     |0.23s    | [join [requires sorted files]] 
gregory1B          : 0.08s     |         | [parallel + grep -E -f [strict]] 
ikegami            : 0.1s      |         | grep -P 
gregory1           : 0.11s     |0.5s     | [parallel + fgrep -f [loose]] 
hakon1             : 0.14s     |         | [perl + c]
BOC1B              : 0.14s     |         | grep -E [strict] 
jjoao              : 0.21s     |         | [compiled flex generated c code] 
inian6             : 0.26s     |0.7s     | [LC_ALL awk + split+dict] 
codeforester_origB : 0.28s     |         | grep -E -f [strict] 
dawg               : 0.35s     |         | [python + split+dict] 
inian3             : 0.44s     |1.1s     | [awk + split+dict] 
zdim2              : 0.4s      |         | [perl + split+dict] 
codeforester       : 0.45s     |         | [perl + split+dict] 
oliv               : 0.5s      |         | [python + compiled regex + re.search()] 
zdim               : 0.61s     |         | [perl + regexp+dict] 
inian2             : 0.73s     |1.7s     | [awk + index($0,i)] 
inian5             : 18.12s    |         | [LC_ALL awk + match($0,i) [loose]] 
inian1             : 19.46s    |         | [awk + match($0,i) [loose]] 
inian5B            : 42.27s    |         | [LC_ALL awk + match($0,i) [strict]] 
inian1B            : 85.67s    |         | [awk + match($0,i) [strict]] 
Vasiliou results: 2 x CPU Intel Core 2 Duo T6570 @ 2.10 GHz, 2 GB RAM, Debian Testing 64-bit, kernel 4.9.0.1, no CPU frequency scaling.
A more realistic test case
I then created a more realistic case with file1.txt having 100 words and file2.txt having 10 million lines (268 MB file size). I extracted 1000 random words from the dictionary at /usr/share/dict/american-english using shuf -n1000 /usr/share/dict/american-english > words.txt, then extracted 100 of these words into file1.txt, and then constructed file2.txt the same way as described above for the first test case. Note that the dictionary file was UTF-8 encoded, and I stripped away all non-ASCII characters from words.txt.
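The stripping could for example be done like this (a sketch; this keeps only printable ASCII bytes plus tab, newline, and carriage return):
LC_ALL=C tr -cd '\11\12\15\40-\176' < words.txt > words_ascii.txt && mv words_ascii.txt words.txt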
Then I ran the tests without the three slowest methods from the previous case, i.e. inian1, inian2, and inian5 were left out. Here are the new results:
gregory1           : 0.86s     | [parallel + fgrep -f [loose]]
Vasiliou2          : 0.94s     | [LC_ALL join [requires sorted files]]
inian4B            : 1.12s     | LC_ALL grep -E -f [strict] 
BOC2B              : 1.13s     | LC_ALL grep -E [strict] 
BOC2               : 1.15s     | LC_ALL grep -E [loose] 
BOC1               : 1.18s     | grep -E [loose] 
ikegami            : 1.33s     | grep -P 
Vasiliou           : 1.37s     | [join [requires sorted files]]
hakon1             : 1.44s     | [perl + c]
inian4             : 2.18s     | LC_ALL fgrep -f [loose] 
codeforester_orig  : 2.2s      | fgrep -f [loose] 
inian6             : 2.82s     | [LC_ALL awk + split+dict] 
jjoao              : 3.09s     | [compiled flex generated c code] 
dawg               : 3.54s     | [python + split+dict] 
zdim2              : 4.21s     | [perl + split+dict]
codeforester       : 4.67s     | [perl + split+dict] 
inian3             : 5.52s     | [awk + split+dict] 
zdim               : 6.55s     | [perl + regexp+dict] 
gregory1B          : 45.36s    | [parallel + grep -E -f [strict]] 
oliv               : 60.35s    | [python + compiled regex + re.search()] 
BOC1B              : 74.71s    | grep -E [strict] 
codeforester_origB : 75.52s    | grep -E -f [strict] 
Note
The grep-based solutions were looking for a match anywhere on the whole line, so in this case they produced some false matches: the methods codeforester_orig, BOC1, BOC2, gregory1, inian4, and oliv extracted 1,087,609 lines out of 10,000,000 lines, whereas the other methods extracted the correct 997,993 lines from file2.txt.
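As a quick sanity check, the number of output lines for each method can be compared with:
$ wc -l solutions/*/out.txt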