I adjusted your example so that all algorithms work on the same data. I also added one more variant with an if implementation.
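import java.util.Random;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)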
public class ForVsSwitch {
    private static final int MOVES_LENGTH = 1024;
    private static final char[] COMMANDS = { 'U', 'D', 'L', 'R'};
    private char[] moves;
    @Setup
    public void prepare(){
        Random random = new Random();
        moves = new char[MOVES_LENGTH];
        for(int i=0; i< MOVES_LENGTH; i++) {
            moves[i] = COMMANDS[random.nextInt(4)];
        }
    }
    @Benchmark
    @BenchmarkMode(Mode.SampleTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Warmup(iterations = 3)
    @Measurement(iterations = 5)
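    public void withSwitch() {
        // Note: the result is discarded here and in the benchmarks below; returning it
        // (or consuming it with a Blackhole) would guard against dead-code elimination.
        judgeCircleWithSwitch(moves);
    }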
    @Benchmark
    @BenchmarkMode(Mode.SampleTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Warmup(iterations = 3)
    @Measurement(iterations = 5)
    public void withFor() {
        judgeCircleWithFor(moves);
    }
    @Benchmark
    @BenchmarkMode(Mode.SampleTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Warmup(iterations = 3)
    @Measurement(iterations = 5)
    public void withIf() {
        judgeCircleWithIf(moves);
    }
    private boolean judgeCircleWithSwitch(char[] moves) {
        int vertical = 0;
        int horizontal = 0;
        for(int i = 0; i < moves.length; i++){
            char c = moves[i];
            switch(c){
                case 'U':
                    vertical ++;
                    break;
                case 'D':
                    vertical --;
                    break;
                case 'L':
                    horizontal --;
                    break;
                case 'R':
                    horizontal ++;
                    break;
            }
        }
        return (vertical == 0) && (horizontal == 0);
    }
    private boolean judgeCircleWithIf(char[] moves) {
        int vertical = 0;
        int horizontal = 0;
        for(int i = 0; i < moves.length; i++){
            char c = moves[i];
            if(c == 'U') {
                vertical++;
            } else if(c == 'D') {
                vertical--;
            } else if(c == 'L') {
                horizontal--;
            } else if(c == 'R') {
                horizontal ++;
            }
        }
        return (vertical == 0) && (horizontal == 0);
    }
    private boolean judgeCircleWithFor(char[] moves) {
        int x = charCount(moves, 'R') - charCount(moves, 'L');
        int y = charCount(moves, 'U') - charCount(moves, 'D');
        return x == 0 && y == 0;
    }
    private int charCount(char[] moves, char c) {
        int count = 0;
        for(int i=0; i<moves.length; i++) {
            if(moves[i] == c) {
                count++;
            }
        }
        return count;
    }
}
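Before comparing timings, it may be worth checking that the three variants actually agree on the same input. The following is a minimal sketch in plain Java (no JMH); the class name `JudgeCircleCheck`, the `randomMoves` helper, and the fixed seeds are assumptions for illustration and not part of the benchmark above:

```java
import java.util.Random;

// Hypothetical helper class, not part of the benchmark itself.
class JudgeCircleCheck {
    private static final char[] COMMANDS = { 'U', 'D', 'L', 'R' };

    // Single pass with a switch, as in judgeCircleWithSwitch.
    static boolean withSwitch(char[] moves) {
        int vertical = 0, horizontal = 0;
        for (char c : moves) {
            switch (c) {
                case 'U': vertical++; break;
                case 'D': vertical--; break;
                case 'L': horizontal--; break;
                case 'R': horizontal++; break;
            }
        }
        return vertical == 0 && horizontal == 0;
    }

    // Single pass with an if/else chain, as in judgeCircleWithIf.
    static boolean withIf(char[] moves) {
        int vertical = 0, horizontal = 0;
        for (char c : moves) {
            if (c == 'U') vertical++;
            else if (c == 'D') vertical--;
            else if (c == 'L') horizontal--;
            else if (c == 'R') horizontal++;
        }
        return vertical == 0 && horizontal == 0;
    }

    // Four passes, one per command, as in judgeCircleWithFor.
    static boolean withFor(char[] moves) {
        int x = charCount(moves, 'R') - charCount(moves, 'L');
        int y = charCount(moves, 'U') - charCount(moves, 'D');
        return x == 0 && y == 0;
    }

    static int charCount(char[] moves, char c) {
        int count = 0;
        for (char m : moves) {
            if (m == c) count++;
        }
        return count;
    }

    // Assumed helper: reproducible random input for a given seed.
    static char[] randomMoves(int length, long seed) {
        Random random = new Random(seed);
        char[] moves = new char[length];
        for (int i = 0; i < length; i++) {
            moves[i] = COMMANDS[random.nextInt(COMMANDS.length)];
        }
        return moves;
    }

    public static void main(String[] args) {
        for (long seed = 0; seed < 100; seed++) {
            char[] moves = randomMoves(1024, seed);
            boolean expected = withSwitch(moves);
            if (withIf(moves) != expected || withFor(moves) != expected) {
                throw new AssertionError("variants disagree for seed " + seed);
            }
        }
        System.out.println("all variants agree");
    }
}
```

If the variants ever disagreed, any timing difference between them would be meaningless, so a check like this is cheap insurance before running the benchmark.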
If I read the results correctly, 99.9% of the executions finish within 0.027 ms to 0.029 ms (27 to 29 µs), right? There seems to be no significant difference between the algorithms.
Benchmark                                    Mode      Cnt   Score    Error  Units
ForVsSwitch.withFor                        sample  5680658   0,003 ±  0,001  ms/op
ForVsSwitch.withFor:withFor·p0.00          sample            0,002           ms/op
ForVsSwitch.withFor:withFor·p0.50          sample            0,003           ms/op
ForVsSwitch.withFor:withFor·p0.90          sample            0,003           ms/op
ForVsSwitch.withFor:withFor·p0.95          sample            0,004           ms/op
ForVsSwitch.withFor:withFor·p0.99          sample            0,019           ms/op
ForVsSwitch.withFor:withFor·p0.999         sample            0,029           ms/op
ForVsSwitch.withFor:withFor·p0.9999        sample            0,075           ms/op
ForVsSwitch.withFor:withFor·p1.00          sample            2,912           ms/op
ForVsSwitch.withIf                         sample  8903669   0,002 ±  0,001  ms/op
ForVsSwitch.withIf:withIf·p0.00            sample            0,001           ms/op
ForVsSwitch.withIf:withIf·p0.50            sample            0,002           ms/op
ForVsSwitch.withIf:withIf·p0.90            sample            0,002           ms/op
ForVsSwitch.withIf:withIf·p0.95            sample            0,003           ms/op
ForVsSwitch.withIf:withIf·p0.99            sample            0,005           ms/op
ForVsSwitch.withIf:withIf·p0.999           sample            0,027           ms/op
ForVsSwitch.withIf:withIf·p0.9999          sample            0,051           ms/op
ForVsSwitch.withIf:withIf·p1.00            sample            5,202           ms/op
ForVsSwitch.withSwitch                     sample  8225249   0,002 ±  0,001  ms/op
ForVsSwitch.withSwitch:withSwitch·p0.00    sample            0,001           ms/op
ForVsSwitch.withSwitch:withSwitch·p0.50    sample            0,002           ms/op
ForVsSwitch.withSwitch:withSwitch·p0.90    sample            0,002           ms/op
ForVsSwitch.withSwitch:withSwitch·p0.95    sample            0,003           ms/op
ForVsSwitch.withSwitch:withSwitch·p0.99    sample            0,018           ms/op
ForVsSwitch.withSwitch:withSwitch·p0.999   sample            0,027           ms/op
ForVsSwitch.withSwitch:withSwitch·p0.9999  sample            0,071           ms/op
ForVsSwitch.withSwitch:withSwitch·p1.00    sample           22,610           ms/op
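To illustrate how the `p0.999` and `p1.00` rows can be read: a percentile is the value that the given fraction of samples does not exceed, so `p1.00` is simply the slowest sample. The sketch below uses the nearest-rank definition on made-up sample values; JMH's own computation may differ in detail, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration, not JMH code.
class PercentileSketch {
    // Nearest-rank percentile: the smallest sample such that at least
    // a fraction p of all samples is less than or equal to it.
    static double percentile(double[] samples, double p) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up timings in ms/op: 999 fast samples and one slow outlier.
        double[] samples = new double[1000];
        Arrays.fill(samples, 0.002);
        samples[999] = 5.0;

        System.out.println("p0.50  = " + percentile(samples, 0.50));
        System.out.println("p0.999 = " + percentile(samples, 0.999)); // outlier not included
        System.out.println("p1.00  = " + percentile(samples, 1.0));   // the maximum
    }
}
```

This is why a single slow sample (for example a GC pause) can dominate `p1.00` while leaving the lower percentiles untouched.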
EDIT:
I cannot confirm that your statement holds. I simplified the example: both algorithms use the same fixed, hard-coded array as input, there is no warmup, and only a single execution is measured. As expected, the 4-pass version is more expensive than the 1-pass version. I really cannot tell what your website is measuring.
@State(Scope.Thread)
public class ForVsSwitch {
    private char[] moves = {'U', 'D', 'L', ...};
    @Benchmark
    @BenchmarkMode(Mode.SingleShotTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Warmup(iterations = 0)
    @Measurement(iterations = 1, batchSize = 1)
    @Fork(value = 1, warmups = 0)
    public void withSwitch() {
        judgeCircleWithSwitch();
    }
    @Benchmark
    @BenchmarkMode(Mode.SingleShotTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    @Warmup(iterations = 0)
    @Measurement(iterations = 1, batchSize = 1)
    @Fork(value = 1, warmups = 0)
    public void withFor() {
        judgeCircleWithFor();
    }
    private boolean judgeCircleWithSwitch() {
        int vertical = 0;
        int horizontal = 0;
        for(int i = 0; i < moves.length; i++){
            char c = moves[i];
            switch(c){
                case 'U':
                    vertical ++;
                    break;
                case 'D':
                    vertical --;
                    break;
                case 'L':
                    horizontal --;
                    break;
                case 'R':
                    horizontal ++;
                    break;
            }
        }
        return (vertical == 0) && (horizontal == 0);
    }
    private boolean judgeCircleWithFor() {
        int x = charCount(moves, 'R') - charCount(moves, 'L');
        int y = charCount(moves, 'U') - charCount(moves, 'D');
        return x == 0 && y == 0;
    }
    private int charCount(char[] moves, char c) {
        int count = 0;
        for(int i=0; i<moves.length; i++) {
            if(moves[i] == c) {
                count++;
            }
        }
        return count;
    }
}
The for variant (four passes) is more expensive than the switch. But, as pointed out in other comments, running it once is not a reliable performance analysis.
Benchmark               Mode  Cnt  Score   Error  Units
ForVsSwitch.withFor       ss       0,577          ms/op
ForVsSwitch.withSwitch    ss       0,241          ms/op
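To see why a single cold execution says little, one can time repeated calls by hand. This is only a rough sketch using `System.nanoTime()`, not a substitute for JMH; the class name `ColdVsWarmSketch`, the input size, and the number of rounds are assumptions. On a typical JVM the first round tends to be slower than later ones because of class loading and JIT compilation, but the exact numbers vary from run to run.

```java
import java.util.Random;

// Hypothetical illustration of cold vs. warm timings, not a proper benchmark.
class ColdVsWarmSketch {
    static int charCount(char[] moves, char c) {
        int count = 0;
        for (int i = 0; i < moves.length; i++) {
            if (moves[i] == c) {
                count++;
            }
        }
        return count;
    }

    static boolean judgeCircleWithFor(char[] moves) {
        int x = charCount(moves, 'R') - charCount(moves, 'L');
        int y = charCount(moves, 'U') - charCount(moves, 'D');
        return x == 0 && y == 0;
    }

    // Times `rounds` consecutive calls and returns each duration in nanoseconds.
    static long[] timeRounds(char[] moves, int rounds) {
        long[] nanos = new long[rounds];
        boolean sink = false;
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            sink ^= judgeCircleWithFor(moves);
            nanos[r] = System.nanoTime() - start;
        }
        // Use the accumulated result so the calls are not trivially removable.
        if (sink) {
            System.out.print("");
        }
        return nanos;
    }

    public static void main(String[] args) {
        char[] commands = { 'U', 'D', 'L', 'R' };
        Random random = new Random(42);
        char[] moves = new char[1024];
        for (int i = 0; i < moves.length; i++) {
            moves[i] = commands[random.nextInt(commands.length)];
        }
        long[] nanos = timeRounds(moves, 10_000);
        System.out.println("first round: " + nanos[0] + " ns");
        System.out.println("last round:  " + nanos[nanos.length - 1] + " ns");
    }
}
```

A single-shot measurement corresponds roughly to the first round only, which is why the earlier run with warmup and many samples is the more trustworthy comparison.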