You can allocate state in thread-local memory. Thread-local memory is thread safe (as long as you don't pass pointers around).
You can use two integers to generate the unique number, and only one of them needs synchronization.
Integer 1: an incrementing integer that identifies the thread; a new number is generated each time you initialize a thread (which should be a rare event).
Integer 2: on thread initialization, this integer starts at 0.
You use both integers -- which are stored in thread-local memory -- together as the unique number, and integer 2 is incremented normally (unlocked).
This way the generation of unique integers is absolutely thread safe -- i.e., you don't have to use an atomic CPU instruction such as Interlocked.Increment (which does incur hardware-level performance penalties).
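A minimal C# sketch of the scheme (the class and member names are my own illustration, not an existing API): the one interlocked call runs once per thread, on first use; every ID after that costs only plain thread-local increments.

```
using System.Threading;

public static class ThreadLocalIdGenerator
{
    // Shared, synchronized part: the last thread number handed out.
    private static int _lastThreadNumber;

    [ThreadStatic] private static int _threadNumber; // integer 1
    [ThreadStatic] private static int _counter;      // integer 2

    public static long NextId()
    {
        // Integer 1: claimed lazily, once per thread. This Interlocked
        // call is the only synchronized operation, and it runs only on
        // thread initialization (a rare event). The first value returned
        // is 1, so 0 safely means "not yet initialized".
        if (_threadNumber == 0)
            _threadNumber = Interlocked.Increment(ref _lastThreadNumber);

        // Integer 2: a plain increment -- no lock, no interlocked
        // instruction. The field is thread-local, so no other thread
        // can observe or modify it.
        _counter++;

        // Combine both integers into one unique 64-bit value:
        // high 32 bits = thread number, low 32 bits = per-thread counter.
        return ((long)_threadNumber << 32) | (uint)_counter;
    }
}
```

One caveat: integer 2 wraps after 2^32 IDs per thread; if a single thread can realistically generate that many, widen it to a long.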
-- edit: cache coherence --
from:
Cache Coherence
To decrease the time required for memory access, caches are used: recently accessed memory is duplicated in the CPU cache, which is significantly faster than main memory. Future accesses to the same address will use the data saved in the cache, decreasing fetch time. The problems appear in SMP (symmetric multiprocessing) systems, where several processors each have their own cache memory: when one processor changes a variable in a memory region used by several processors simultaneously, it actually changes its own copy of the variable, located in its cache, while the shared variable still has the original value. This problem cannot be solved by using the volatile keyword on a shared variable, since that only guarantees that a write-to-memory instruction will be present in the resulting program; the cache-related behaviour is still unspecified. Of course, it is possible to disable the CPU cache by mapping memory as no-cache (the PAGE_NOCACHE protection flag in the VirtualAlloc() Win32 API function), but along with a significant slowdown this imposes some limitations: for example, interlocked instructions may raise a hardware exception on no-cache memory.
For SMP systems to work correctly, data that is stored in the cache of more than one processor should be the same in all caches. This means that CPU caches must be synchronized (kept coherent) at a hardware level. **But it is important to note that cache synchronization (the flow of cache-coherency traffic) happens asynchronously with program execution:** when one CPU changes the value of a shared variable, another CPU may temporarily observe the old value. That is, a CPU continues execution without waiting for the cache-coherency operation to complete. Furthermore, if two variables (a, then b) were changed by the first CPU, another CPU could observe that b changed earlier than a.
Interlocked instructions differ considerably on this point. Specifically, an interlocked instruction is a command to operate directly on physical memory under a locked bus. This means that cache inconsistency does not affect programs where shared variables are accessed only with interlocked instructions (note that both sides -- the one that reads the variable and the one that writes to it -- should use interlocked instructions).
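To make that last point concrete, here is a hedged C# sketch (my own illustration): the writer uses an interlocked increment, and the reader also goes through an interlocked instruction rather than a plain field read, so neither side can be served a stale cached value.

```
using System.Threading;

class SharedCounter
{
    private long _value;

    // Writer side: an interlocked read-modify-write on physical memory.
    public void Increment() => Interlocked.Increment(ref _value);

    // Reader side: also an interlocked instruction, so it cannot be
    // served a stale cached copy of _value.
    public long Read() => Interlocked.Read(ref _value);

    // By contrast, a plain "return _value;" may observe an out-of-date
    // value, and on a 32-bit runtime a 64-bit read can even tear.
}
```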
-- edit: further clarification --
Under your current design Interlocked.Increment is indeed your best bet, but it is far from ideal. Your CPU has a VERY FAST on-chip cache (often clocked at the same speed as the CPU). If memory is local to your thread, it will be pulled into the cache of the CPU your thread runs on, which means your CPU won't have to go to main memory and can fly at full speed.
If you use an interlocked increment, your CPU will have to:

- lock the bus,
- increment the 32-bit word,
- release the bus.
You can do without this. I might appear to be pedantic, as in isolation the overhead might only double the cost of a single increment. However, in an industrial server app with 4 physical CPUs and 16 cores, with this UID generator firing on every request... trust me, your bus will be screwed. Micro-optimization is an important area in programming, especially as we are scaling horizontally now.
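If you want to see the contention for yourself, here is a rough micro-benchmark sketch (my own, not rigorous benchmarking; absolute numbers vary wildly by machine): several threads hammering one shared interlocked counter versus each thread bumping its own thread-local counter.

```
using System;
using System.Diagnostics;
using System.Threading;

class ContentionDemo
{
    const int ThreadCount = 8;
    const int IterationsPerThread = 10_000_000;

    static long _shared;
    [ThreadStatic] static int _local;

    static void Main()
    {
        // Same loop, same delegate overhead -- only the increment differs.
        Console.WriteLine($"shared Interlocked.Increment: {Run(() => Interlocked.Increment(ref _shared))} ms");
        Console.WriteLine($"plain thread-local increment: {Run(() => _local++)} ms");
    }

    static long Run(Action body)
    {
        var sw = Stopwatch.StartNew();
        var threads = new Thread[ThreadCount];
        for (int i = 0; i < ThreadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int j = 0; j < IterationsPerThread; j++) body();
            });
            threads[i].Start();
        }
        foreach (var t in threads) t.Join();
        return sw.ElapsedMilliseconds;
    }
}
```

On a multi-core machine the shared version usually comes out several times slower, because every increment drags the cache line holding _shared between cores.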