Edit: ST does not allow newbies to post more than two links. Sorry for the missing references.
I'm trying to reduce locking overhead in a C application where detecting changes to global state is performance critical. Even though I've been reading quite a lot on the topic lately (e.g. a lot from H. Sutter, and many more), I'm not confident about my implementation. I would like to combine a CAS-like operation with DCL (double-checked locking) to check a cache-line-aligned global variable, thus avoiding false sharing, in order to update thread-local data from data shared among multiple threads. My lack of confidence is mainly due to:
- me failing to interpret the GNU documentation on Type-Attributes
- not being able to find any literature or examples that I could easily translate to C, such as aligning-to-cache-line-and-knowing-the-cache-line-size on ST or 1 (although 1 seems to answer my question somewhat, I'm not confident in my implementation)
- my experience with C is limited
My questions:
1. The Type-Attributes documentation states:

   "This attribute specifies a minimum alignment (in bytes) for variables of the specified type. For example, the declarations: (please see Type-Attributes documentation for declaration) force the compiler to insure (as far as it can) that each variable whose type is `struct S` or `more_aligned_int` will be allocated and aligned at least on a `8-byte` boundary. On a SPARC, having all variables of type `struct S` aligned to `8-byte` boundaries allows the compiler to use the ldd and std (doubleword load and store) instructions when copying one variable of type struct S to another, thus improving run-time efficiency."

   Does that mean that the beginning of `struct S` or `more_aligned_int` will always be aligned to an `8-byte` boundary? It does not mean the data will be padded to use exactly 64 bytes, right?
2. Assuming 1. is true: does every instance of `struct cache_line_aligned` (see code Example 1 below) align on `64-byte` boundaries and use exactly one cache line (assuming cache lines are `64 bytes` long)?
3. Is it correct that using `typedef` for the type declaration does not alter the semantics of `__attribute__ ((aligned (64)))` (see code Example 2 below)?
4. Is it correct that I do not need to use `aligned_malloc` when instantiating the struct if the struct is declared with `__attribute__ ...`?
// Example 1
struct cache_line_aligned {
  int version;
  char padding[60];
} __attribute__ ((aligned (64)));
// Example 2
typedef struct {
  int version;  
  // place '__attribute__ ((aligned (64)))' after 'int version'
  // or at the end of the declaration 
  char padding[60];
} cache_line_aligned2 __attribute__ ((aligned (64)));
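To make questions 1 to 4 concrete, here is a minimal sketch of how the size and alignment claims could be checked with GCC or Clang in C11 mode. It assumes a 64-byte cache line and a 4-byte `int`; `CACHE_LINE`, `struct version_only`, `is_line_aligned` and `new_cache_line_aligned` are hypothetical names introduced only for this illustration. The static assertions rely on the fact that `sizeof` a type is always a multiple of its alignment, so the compiler adds tail padding itself. The heap helper uses C11 `aligned_alloc` because plain `malloc` only guarantees alignment suitable for the standard types, whatever attribute the struct carries.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64 /* assumed line size; verify for the target CPU */

/* Example 1 from the question: explicit padding plus the attribute. */
struct cache_line_aligned {
  int version;
  char padding[60]; /* assumes sizeof(int) == 4 */
} __attribute__ ((aligned (CACHE_LINE)));

/* Hypothetical variant without manual padding. */
struct version_only {
  int version;
} __attribute__ ((aligned (CACHE_LINE)));

/* The attribute raises the alignment; sizeof is rounded up to match. */
_Static_assert(alignof(struct cache_line_aligned) == CACHE_LINE, "alignment");
_Static_assert(sizeof(struct cache_line_aligned) == CACHE_LINE, "size");
_Static_assert(sizeof(struct version_only) == CACHE_LINE, "size rounded up");

/* Static storage: the compiler and linker honor the attribute. */
struct cache_line_aligned g_cache_line_aligned;

/* Returns nonzero if p sits on a cache-line boundary. */
int is_line_aligned(const void *p) {
  return ((uintptr_t)p % CACHE_LINE) == 0;
}

/* Heap storage: request the alignment explicitly. Caller releases with free(). */
struct cache_line_aligned *new_cache_line_aligned(void) {
  struct cache_line_aligned *p = aligned_alloc(CACHE_LINE, sizeof *p);
  if (p != NULL) {
    assert(is_line_aligned(p));
    p->version = 0;
  }
  return p;
}
```

If one of the assumptions does not hold on a given target (for example a different `int` size), the build fails at the static assertions instead of silently producing a struct that spans two cache lines.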
And finally a sketch of a function that uses the cache-line aligned approach to efficiently check if global state has been modified by some other thread:
void lazy_update_if_changed(int *t_version, char *t_data) {
  // Assuming 'g_cache_line_aligned' is an instance of 
  // 'struct cache_line_aligned' or 'cache_line_aligned2' 
  // and variables prefixed with 't_' being thread local 
  if(g_cache_line_aligned.version == *t_version) {
    // do nothing and return
  } else {
    // enter critical section (acquire lock e.g. with pthread_mutex_lock) 
    *t_version = g_cache_line_aligned.version;
    // read other data that requires locking where changes are notified 
    // by modifying 'g_cache_line_aligned.version', e.g. t_data
    // leave critical section
  }
} 
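For comparison, here is a minimal sketch of the same check written with C11 atomics, since reading a plain `int` that another thread writes concurrently is a data race in C11. This is one possible shape, not a definitive implementation. It assumes a 64-byte cache line and a pthread mutex; `g_state`, `g_lock`, `g_data`, `DATA_SIZE` and `publish` are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <string.h>

#define DATA_SIZE 32 /* hypothetical payload size */

/* Version counter alone on its cache line (64 bytes assumed). */
static struct {
  atomic_int version;
} __attribute__ ((aligned (64))) g_state;

static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static char g_data[DATA_SIZE]; /* protected by g_lock */

/* Writer: change the data and bump the version while holding the lock. */
void publish(const char *data) {
  pthread_mutex_lock(&g_lock);
  strncpy(g_data, data, DATA_SIZE - 1);
  g_data[DATA_SIZE - 1] = '\0';
  atomic_fetch_add_explicit(&g_state.version, 1, memory_order_release);
  pthread_mutex_unlock(&g_lock);
}

/* Reader: lock-free fast path. Returns 1 if the thread-local copy was
   refreshed, 0 if it was already current. t_data needs DATA_SIZE bytes. */
int lazy_update_if_changed(int *t_version, char *t_data) {
  assert(t_version != NULL && t_data != NULL);
  if (atomic_load_explicit(&g_state.version, memory_order_acquire) == *t_version)
    return 0;
  pthread_mutex_lock(&g_lock);
  /* Re-read under the lock so version and data are consistent. */
  *t_version = atomic_load_explicit(&g_state.version, memory_order_relaxed);
  memcpy(t_data, g_data, DATA_SIZE);
  pthread_mutex_unlock(&g_lock);
  return 1;
}
```

The version is re-read inside the critical section, so a writer that publishes between the fast-path check and the lock acquisition is still picked up in the same call.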
Sorry for the long post.
Thank you!
 
     
    