I'd say that doing this in assembly is a bad idea.
You should be using high-level language constructs instead. That keeps the code portable, and for a peephole optimization like this one the compiler will beat "most" humans anyway.
So I checked what assembly g++ and clang generate for a few ways of writing the comparison.
main.cpp
#include <array>
#include <iostream>
bool testX(int a, int b);
bool testY(std::array<char, 4> const& a, std::array<char, 4> const& b);
bool testZ(char const(&a)[4], char const(&b)[4]);
int main()
{
    {
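        // 'ATCG' is a multi-character literal: its int value is
        // implementation-defined (gcc warns about it by default), but two
        // identical literals still compare equal.
        int a = 'ATCG';
        int b = 'ATCG';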
        if (testX(a, b)) {
            std::cout << "Equal\n";
        }
    }
    {
        std::array<char, 4> a {'A', 'T', 'C', 'G'};
        std::array<char, 4> b {'A', 'T', 'C', 'G'};
        if (testY(a, b)) {
            std::cout << "Equal\n";
        }
    }
    {
        char a[] = {'A', 'T', 'C', 'G'};
        char b[] = {'A', 'T', 'C', 'G'};
        if (testZ(a, b)) {
            std::cout << "Equal\n";
        }
    }
}
With optimization enabled, clang produces nice asm, and recent gcc usually does too (easy to check on the Godbolt compiler explorer). The main above would optimize the compares away entirely if the functions could be inlined, because the inputs are compile-time constants; that is why each function is defined in its own file here.
X.cpp
bool testX(int a, int b)
{
    return a == b;
}
# gcc and clang -O3 asm output
testX(int, int):
    cmpl    %esi, %edi
    sete    %al
    ret
Z.cpp
#include <cstring>
bool testZ(char const(&a)[4], char const(&b)[4])
{
    return std::memcmp(a, b, sizeof(a)) == 0;
}
Z.s
# clang, and gcc7 and newer, -O3
testZ(char const (&) [4], char const (&) [4]):
    movl    (%rdi), %eax
    cmpl    (%rsi), %eax
    sete    %al
    retq
Y.cpp
#include <array>
bool testY(std::array<char, 4> const& a, std::array<char, 4> const& b)
{
    return a == b;
}
Y.s
# only clang does this. gcc 8.2 actually calls memcmp with a constant 4-byte size
testY(std::array<char, 4ul> const&, std::array<char, 4ul> const&):
    movl    (%rdi), %eax
    cmpl    (%rsi), %eax
    sete    %al
    retq
So for comparing 4-byte objects, std::array's operator== and memcmp both compile to identical code with clang, but with gcc only memcmp optimizes well.
Of course, the stand-alone version of each function has to actually produce a 0/1 value, instead of just setting flags for a jcc to branch on directly. Since the bool comes back in %al, the caller has to test %al,%al before branching. But if the compiler can inline these functions, that overhead goes away.