When you optimize anything, make sure to measure first! (Most of us tend to forget this...)
When you optimize Prolog, look out for the following:
- Tail recursion tends to do better (so there goes your "before or after series of statements" question);
 
- Avoid creating choice points you don't need (this depends on the Prolog implementation);
 
- Use an optimal algorithm (as in, don't traverse a list twice if you don't have to); a small sketch illustrating these points follows right after this list.
 
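To make the first and third points a bit more concrete, here is a minimal sketch (the predicates mean_twice/2, mean_once/2 and their helpers are my own, purely for illustration):

% Two traversals: one for the sum, one for the length;
% sum_naive/2 is also not tail recursive, since the addition
% happens after the recursive call.
mean_twice(Xs, Mean) :-
    sum_naive(Xs, Sum),
    length(Xs, Len),
    Mean is Sum / Len.

sum_naive([], 0).
sum_naive([X|Xs], S) :-
    sum_naive(Xs, S0),
    S is S0 + X.

% One traversal, tail recursive: sum and length are threaded
% through accumulators, and the recursive call is the last goal.
mean_once(Xs, Mean) :-
    mean_once_(Xs, 0, 0, Mean).

mean_once_([], Sum, Len, Mean) :-
    Mean is Sum / Len.
mean_once_([X|Xs], Sum0, Len0, Mean) :-
    Sum1 is Sum0 + X,
    Len1 is Len0 + 1,
    mean_once_(Xs, Sum1, Len1, Mean).

A call like ?- mean_once([1,2,3,4], M). succeeds deterministically with M = 2.5 on SWI-Prolog, and on a system with first-argument indexing mean_once_/4 leaves no choice point behind, which touches on the second point as well.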
A solution that is "optimized" for a more or less standard Prolog implementation might look a bit different. I will name it list_uniq (in analogy to the command-line uniq tool):
list_uniq([], []). % Base case
list_uniq([H|T], U) :-
    list_uniq_1(T, H, U). % Helper predicate: its second argument lags one element behind

list_uniq_1([], X, [X]).
list_uniq_1([H|T], X, U) :-
    (   H == X
    ->  list_uniq_1(T, X, U)
    ;   [X|U1] = U,
        list_uniq_1(T, H, U1)
    ).
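A quick sanity check: consecutive duplicates are collapsed, while non-adjacent repetitions are kept:

?- list_uniq([a,a,b,b,a], U).
U = [a, b, a].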
It is different from reduce0/2 by @CapelliC because it uses lagging (the helper always carries the previously seen element in its second argument) to avoid the inherent non-determinism of [X|Xs] and [X,X|Xs] in the first argument.
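For contrast, a direct encoding of that non-determinism (my own sketch, not necessarily @CapelliC's exact code) would look roughly like this:

collapse([], []).
collapse([X,X|Xs], Ys) :-        % matches when the first two elements unify...
    collapse([X|Xs], Ys).
collapse([X|Xs], [X|Ys]) :-      % ...but so does this clause
    collapse(Xs, Ys).

Without a cut, ?- collapse([a,a,b], Ys). also gives the unwanted Ys = [a, a, b] on backtracking; with a cut in the second clause the answer becomes unique, but a choice point is still created and discarded at every step. The lagging version never creates it in the first place.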
Now to the claim that it is "optimized":
- It traverses the list exactly once (no need for reversing)
 
- It is tail-recursive
 
- It does not create and discard choice points
 
You will get the same 12 inferences as @CapelliC, and if you then use a somewhat longer list, you will start to see differences:
?- length(As, 100000), maplist(=(a), As),
   length(Bs, 100000), maplist(=(b), Bs),
   length(Cs, 100000), maplist(=(c), Cs),
   append([As, Bs, Cs, As, Cs, Bs], L),
   time(list_uniq(L, U)).
% 600,006 inferences, 0.057 CPU in 0.057 seconds (100% CPU, 10499893 Lips)
As = [a, a, a, a, a, a, a, a, a|...],
Bs = [b, b, b, b, b, b, b, b, b|...],
Cs = [c, c, c, c, c, c, c, c, c|...],
L = [a, a, a, a, a, a, a, a, a|...],
U = [a, b, c, a, c, b].
The same query with reduce0, reduce1, reduce2 from @CapelliC's answer:
% reduce0(L, U)
% 600,001 inferences, 0.125 CPU in 0.125 seconds (100% CPU, 4813955 Lips)
% reduce1(L, U)
% 1,200,012 inferences, 0.393 CPU in 0.394 seconds (100% CPU, 3050034 Lips)
% reduce2(L, U)
% 2,400,004 inferences, 0.859 CPU in 0.861 seconds (100% CPU, 2792792 Lips)
So, creating and discarding choice points with cuts (!) has a price, too.
However, list_uniq/2, as it stands, can be wrong for queries where the first argument is not ground:
?- list_uniq([a,B], [a,b]).
B = b. % OK
?- list_uniq([a,A], [a]).
false. % WRONG!
reduce0/2 and reduce1/2 can be wrong, too:
?- reduce0([a,B], [a,b]).
false.
?- reduce1([a,B], [a,b]).
false.
As for reduce2/2, I am not sure about this one:
?- reduce2([a,A], [a,a]).
A = a.
Instead, using the definition of if_/3 from this answer:
list_uniq_d([], []). % Base case
list_uniq_d([H|T], U) :-
    list_uniq_d_1(T, H, U). % Helper predicate

list_uniq_d_1([], X, [X]).
list_uniq_d_1([H|T], X, U) :-
    if_(H = X,
        list_uniq_d_1(T, H, U),
        (   [X|U1] = U,
            list_uniq_d_1(T, H, U1)
        )
    ).
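To actually run this you need if_/3 and the reified (=)/3. On SWI-Prolog they are available, as far as I know, through the reif pack (otherwise just paste the definitions from the linked answer):

:- use_module(library(reif)). % provides if_/3 and (=)/3, among others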
With it:
?- list_uniq_d([a,a,a,b], U).
U = [a, b].
?- list_uniq_d([a,a,a,b,b], U).
U = [a, b].
?- list_uniq_d([a,A], U).
A = a,
U = [a] ;
U = [a, A],
dif(A, a).
?- list_uniq_d([a,A], [a]).
A = a ;
false. % Dangling choice point
?- list_uniq_d([a,A], [a,a]).
false.
?- list_uniq_d([a,B], [a,b]).
B = b.
It takes longer, but the predicate seems to be correct.
With the same query as used for the other timings:
% 3,000,007 inferences, 1.140 CPU in 1.141 seconds (100% CPU, 2631644 Lips)