You can allocate state in thread-local memory. Thread-local memory is thread safe (as long as you don't pass pointers around).
You can use two integers to generate the unique number, and only one of them needs synchronization.
Integer 1: an incrementing integer that identifies the thread; a new number is generated each time you initialize a thread (which should be a rare event).
Integer 2: on thread initialization, this integer starts at 0.
You use both integers -- which are stored in thread-local memory -- together as the unique number, and integer 2 is incremented normally (unlocked).
This way the generation of unique integers is absolutely thread safe -- i.e., you don't have to use an atomic CPU instruction such as Interlocked.Increment (which does incur hardware-level performance penalties).
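A minimal C# sketch of the scheme (the class and member names are my own illustration, not an existing API): the one interlocked call runs once per thread, on first use; every ID after that costs only plain thread-local increments.

```
using System.Threading;

public static class ThreadLocalIdGenerator
{
    // Shared, synchronized part: the last thread number handed out.
    private static int _lastThreadNumber;

    [ThreadStatic] private static int _threadNumber; // integer 1
    [ThreadStatic] private static int _counter;      // integer 2

    public static long NextId()
    {
        // Integer 1: claimed lazily, once per thread. This Interlocked
        // call is the only synchronized operation, and it runs only on
        // thread initialization (a rare event). The first value returned
        // is 1, so 0 safely means "not yet initialized".
        if (_threadNumber == 0)
            _threadNumber = Interlocked.Increment(ref _lastThreadNumber);

        // Integer 2: a plain increment -- no lock, no interlocked
        // instruction. The field is thread-local, so no other thread
        // can observe or modify it.
        _counter++;

        // Combine both integers into one unique 64-bit value:
        // high 32 bits = thread number, low 32 bits = per-thread counter.
        return ((long)_threadNumber << 32) | (uint)_counter;
    }
}
```

One caveat: integer 2 wraps after 2^32 IDs per thread; if a single thread can realistically generate that many, widen it to a long.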
-- edit: cache coherence --
from:
Cache Coherence
To decrease the time required for memory access, caches are used: recently accessed memory is duplicated in the CPU cache, which is significantly faster than main memory. Future accesses to the same address will use the data saved in the cache, decreasing fetch time. The problems appear in SMP (symmetric multiprocessing) systems, where several processors each have their own cache memory: when one processor changes a variable in a memory region used by several processors simultaneously, it actually changes its own copy of the variable, located in its cache, while the shared variable still has the original value. This problem cannot be solved by using the volatile keyword on a shared variable, since that only guarantees that a write-to-memory instruction will be present in the resulting program; the cache-related behaviour is still unspecified. Of course, it is possible to disable the CPU cache by mapping memory as no-cache (the PAGE_NOCACHE protection flag in the VirtualAlloc() Win32 API function), but along with a significant slowdown this imposes some limitations: for example, interlocked instructions may raise a hardware exception on no-cache memory.
For SMP systems to work correctly, data that is stored in the cache of more than one processor should be the same in all caches. This means that CPU caches must be synchronized (kept coherent) at a hardware level. **But it is important to note that cache synchronization (the flow of cache-coherency traffic) happens asynchronously with program execution:** when one CPU changes the value of a shared variable, another CPU may temporarily observe the old value. That is, a CPU continues execution without waiting for the cache-coherency operation to complete. Furthermore, if two variables (a, then b) were changed by the first CPU, another CPU could observe that b changed earlier than a.
Interlocked instructions differ considerably on this point. Specifically, an interlocked instruction is a command to operate directly on physical memory under a locked bus. This means that cache inconsistency does not affect programs where shared variables are accessed only with interlocked instructions (note that both sides -- the one that reads the variable and the one that writes to it -- should use interlocked instructions).
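To make that last point concrete, here is a hedged C# sketch (my own illustration): the writer uses an interlocked increment, and the reader also goes through an interlocked instruction rather than a plain field read, so neither side can be served a stale cached value.

```
using System.Threading;

class SharedCounter
{
    private long _value;

    // Writer side: an interlocked read-modify-write on physical memory.
    public void Increment() => Interlocked.Increment(ref _value);

    // Reader side: also an interlocked instruction, so it cannot be
    // served a stale cached copy of _value.
    public long Read() => Interlocked.Read(ref _value);

    // By contrast, a plain "return _value;" may observe an out-of-date
    // value, and on a 32-bit runtime a 64-bit read can even tear.
}
```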
-- edit: further clarification --
Under your current design Interlocked.Increment is indeed your best bet, but it is far from ideal. Your CPU has a VERY FAST on-chip cache (often clocked at the same speed as the CPU). If memory is local to your thread, it will be pulled into the cache of the CPU your thread runs on, which means your CPU won't have to go to main memory and can fly at full speed.
If you use an interlocked increment, your CPU will have to:

- lock the bus,
- increment the 32-bit word,
- release the bus.
You can do without this. I might appear to be pedantic, as in isolation the overhead might only double the cost of a single increment. However, in an industrial server app with 4 physical CPUs and 16 cores, with this UID generator firing on every request... trust me, your bus will be screwed. Micro-optimization is an important area in programming, especially as we are scaling horizontally now.
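If you want to see the contention for yourself, here is a rough micro-benchmark sketch (my own, not rigorous benchmarking; absolute numbers vary wildly by machine): several threads hammering one shared interlocked counter versus each thread bumping its own thread-local counter.

```
using System;
using System.Diagnostics;
using System.Threading;

class ContentionDemo
{
    const int ThreadCount = 8;
    const int IterationsPerThread = 10_000_000;

    static long _shared;
    [ThreadStatic] static int _local;

    static void Main()
    {
        // Same loop, same delegate overhead -- only the increment differs.
        Console.WriteLine($"shared Interlocked.Increment: {Run(() => Interlocked.Increment(ref _shared))} ms");
        Console.WriteLine($"plain thread-local increment: {Run(() => _local++)} ms");
    }

    static long Run(Action body)
    {
        var sw = Stopwatch.StartNew();
        var threads = new Thread[ThreadCount];
        for (int i = 0; i < ThreadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int j = 0; j < IterationsPerThread; j++) body();
            });
            threads[i].Start();
        }
        foreach (var t in threads) t.Join();
        return sw.ElapsedMilliseconds;
    }
}
```

On a multi-core machine the shared version usually comes out several times slower, because every increment drags the cache line holding _shared between cores.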