66

I have read that manufacturers stopped concentrating on higher clock speeds and are now working on other things to improve performance.

With

  • an old Desktop machine with Intel® Xeon® Processor E3110 with clock speed of 3.0GHz
  • and a new server with AMD Opteron(TM) Processor 6272 with clock speed of 2.1GHz

when I performed a simple encryption comparison using the (single-threaded)

 openssl speed aes-256-cbc

the desktop performed far better than the server.

So even with the latest optimizations, why does the processor with the higher clock speed perform better?

Breakthrough
  • 34,847
learner
  • 551

9 Answers

75

The reason manufacturers have stopped concentrating on increasing clock speed is because we can no longer cool the processors fast enough for this to be viable. The higher the clock speed, the more heat is generated, and we've now hit a stage where it is no longer efficient to increase processor speed due to the amount of energy that goes into cooling it.

The other answers go into detail on how a higher clock speed doesn't mean better performance in all areas.

Paul Hay
  • 581
38

There is a lot more to processing speed than the clock rate.

  • Different CPUs can do different amounts of work in the same number of clock cycles, due to different pipeline arrangements and to having multiple component units (adders and so forth) in each core. While it was not the case in your test, you will often find that a "slower" chip can do more than a faster one (measured by clock rate alone) because it does more per tick.

  • The test you performed may be very sensitive to differences in CPU architecture: if it is optimised for a specific architecture, you may find it performs differently not just between Intel and AMD chips but between Intel (or AMD) chips of different families. It also uses a single thread, so it does not take advantage of the CPUs' multiple cores.

  • There is a move to lower clock rates for power and heat management reasons: ramping up the clock rate does not have a linear effect on power use and heat output.

  • Because of the above non-linear relationship, it is far more efficient for today's requirements to have multiple processing units than to push the speed of one unit ever higher. This also allows for clever power-saving tricks such as turning off individual cores when they are not in use and revving them back up as demand increases. Of course, multiple cores don't help a single-threaded algorithm, though they would if you ran two or more instances of it at the same time.
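The first bullet's point, that clock rate and work-per-cycle multiply together, can be sketched with a toy model. The IPC (instructions per cycle) figures below are hypothetical illustrations, not measured values for any real chip:

```python
# Toy model: effective throughput = clock rate x instructions per cycle (IPC).
# The IPC numbers are made up, chosen only to show how a lower-clocked core
# can still do more work per second than a higher-clocked one.

def throughput(clock_ghz, ipc):
    """Billions of instructions per second for a single core."""
    return clock_ghz * ipc

fast_clock = throughput(clock_ghz=3.0, ipc=1.0)   # "fast" chip, modest IPC
slow_clock = throughput(clock_ghz=2.1, ipc=1.6)   # "slow" chip, wider core

print(f"3.0 GHz x 1.0 IPC = {fast_clock:.2f} GIPS")
print(f"2.1 GHz x 1.6 IPC = {slow_clock:.2f} GIPS")
# Under these assumed IPC values, the lower-clocked core wins: 3.36 vs 3.00.
```

In the question's actual test the opposite happened, which suggests the per-cycle advantage (if any) of the Opteron core did not apply to that particular workload.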

19

Why do you think manufacturers are actually lowering clock speeds, based on comparing only two processors?

  1. The 6272 has a turbo speed of 3 GHz. The lower base speed just lowers the average wattage and keeps an acceptable TDP for workloads that stress all cores.
  2. AMD's next high-performance desktop chip, the FX-9590, will hit 5 GHz.

Also, clock speed isn't the same as performance per clock cycle. You can have a 3.8 GHz P4 and a single 3.2 GHz core from an i7-3930K, but that doesn't mean the P4 core is faster.

Everything said here about power consumption is also perfectly valid for a 16-core design, where you naturally have to be more concerned about TDP.

Also, your benchmark method of just testing openssl is a bit too simplistic to give real-world numbers. Maybe you should try a proper crypto benchmark suite.

s1lv3r
  • 293
13

Your test case (aes-256 encryption) is very sensitive to processor-specific optimizations.

Various CPUs have special instructions intended to speed up encryption/decryption operations. Not only might these special instructions be present only on your desktop, it might also be that the AMD CPU has different special instructions, or that openssl supports these special instructions only on the Intel CPU. Did you check whether that was the case?

To find out which system is faster, try using a "proper" benchmark suite - or better, just use your typical workload.
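The "benchmark your typical workload" advice can be illustrated with a minimal single-threaded timing sketch using only the Python standard library. SHA-256 hashing stands in here for whatever your real workload is; taking the best of several runs reduces noise from other processes:

```python
import hashlib
import time

def bench(workload, repeats=5):
    """Time a single-threaded workload; return the best (lowest) of several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        best = min(best, time.perf_counter() - start)
    return best

payload = b"x" * 1_000_000  # 1 MB of data as a stand-in workload

def hash_workload():
    hashlib.sha256(payload).hexdigest()

elapsed = bench(hash_workload)
print(f"best of 5 runs: {elapsed * 1e3:.3f} ms")
```

Run the same script on both machines and compare the numbers; unlike a synthetic crypto benchmark, this measures the code paths you actually care about.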

jakob
  • 241
11

As others have said, we can no longer effectively cool CPUs at the voltages that the relative clock rate increases of the past would require. There was a time (the P4 era and prior) when you could buy a new CPU and see an "immediate" gain in speed because the clock rate was significantly higher than in the previous generation. Now we have hit a thermal wall, of sorts.

Each new generation of processors increases the clock rate only slightly, relative to the ability to cool the chips appropriately. Chip makers such as Intel are continually focusing on shrinking the die size of the CPU, both to make it more power-efficient and to produce less heat at the same clocks. As a side note, this shrinking die size makes modern processors more prone to dying from over-volting than from overheating. It also limits the ceiling clock rate of any current-generation CPU unless the chip maker makes other optimizations.

Another area of heavy focus for chip makers is increasing the number of cores on a chip. This brings significant increases in computational power, but only with software that takes advantage of multiple cores. Note the difference between computational power and speed here: simply put, speed refers to how quickly a computer can execute a single instruction, whereas computational power refers to how many computations a computer can make in a given amount of time.

Modern operating systems, and much modern software, do take advantage of multiple cores. The problem is that concurrent/parallel programming is more difficult than the standard, linear programming paradigm, and many developers were not used to writing programs this way, so it took a long time for many programs on the market to take full advantage of the power of these newer processors. There are still programs today (modern or legacy) that do not take advantage of multiple cores or multi-threading; the encryption program you cited is one such example.
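The single-threaded vs multi-core distinction above can be sketched with Python's standard library. This is a minimal illustration, not a real benchmark: the same chunks of work are processed sequentially and then spread over a thread pool (in CPython, threads only exploit multiple cores here because `hashlib` releases the GIL for large buffers):

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# A CPU-heavy task applied to independent chunks of work.
chunks = [bytes([i]) * 500_000 for i in range(8)]

def digest(chunk):
    return hashlib.sha256(chunk).hexdigest()

# Single-threaded version: one core, one chunk at a time.
sequential = [digest(c) for c in chunks]

# Multi-threaded version: the same work spread over a pool of workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(digest, chunks))

# Same answers either way; only the wall-clock time can differ.
assert parallel == sequential
```

The extra machinery (a pool, dividing the work, collecting results) is exactly the kind of restructuring that single-threaded programs like the cited encryption tool never received.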

These two areas of focus by chip makers are intrinsically connected. By reducing both the die size and power consumption of a chip, they are then able to increase the number of cores on said chip. Eventually though, this too will hit a wall, causing another, more drastic, paradigm shift.

The reason for this paradigm shift is due to us coming close to the limits of silicon as a base material for chip production. This is something that Intel and others have been working on solving for some time. Intel has stated that it has an alternative to silicon in the works, and we will likely start seeing it sometime after 2017. In addition to this new material, Intel is also looking into 3D transistors that could "effectively triple the processing power". Here is an article mentioning both of these ideas: http://apcmag.com/intel-looks-beyond-silicon-for-processors-past-2017.htm

10

Simple: the AMD chip is far, far faster because it is a 16-core chip. At 115 W, each core gets ~7 W. This would not be achievable if each core ran at 3 GHz. To achieve that 7 W figure, AMD lowered the clock frequency. Lowering the clock frequency by 10% reduces power consumption by roughly 20%, which in turn allows you to put 25% extra cores on a chip.
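The arithmetic in this answer checks out directly (the 20%-power-for-10%-clock figure is the answer's own rule of thumb, not a measured value):

```python
# 115 W spread over 16 cores, and the answer's rule of thumb that a 10%
# clock reduction saves about 20% power, buying 1 / 0.8 = 1.25x the cores
# in the same power budget.

tdp_watts = 115
cores = 16
watts_per_core = tdp_watts / cores
print(f"{watts_per_core:.1f} W per core")          # ~7.2 W, the "~7 Watt" figure

power_after = 0.80                                 # 20% less power per core
extra_cores = 1 / power_after
print(f"{(extra_cores - 1) * 100:.0f}% more cores")  # 25% more cores
```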

MSalters
  • 8,283
2
  • The heat loss H scales as the 4th power of the frequency f:

    H ~ f^4

    So even a minor increase in frequency leads to a large increase in heat loss.

  • Further miniaturization

    Higher frequencies require further shrinking of the chip. At the moment we have no technologies to work effectively with nanometer-scale materials, and nanometers are the limit.
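Taking this answer's H ~ f^4 relation at face value (other answers use different exponents), a small numeric example shows how steep such a curve is:

```python
# Heat loss relative to a baseline clock, under the answer's H ~ f^4 model.
def heat_ratio(f_new, f_base):
    return (f_new / f_base) ** 4

# Under this model, a 20% clock increase roughly doubles the heat.
ratio = heat_ratio(3.6, 3.0)
print(f"3.0 -> 3.6 GHz: {ratio:.2f}x the heat")  # (1.2)^4 ~ 2.07
```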

Warlock
  • 129
2

As stated in a few other answers, CPU manufacturers want to keep clock speeds down to control power consumption and heat dissipation. In order to do more work at the same clock speed, several strategies are used.

Large on-chip memory caches can keep more data "close to" the CPU, available to be processed with minimal delay, as opposed to main memory, which is much slower to deliver data to the CPU.

Different CPU instructions take differing numbers of clock cycles to complete. In many cases, you can use a simple circuit to implement an operation over several clock cycles, or a more complex circuit to do so in fewer.

The most dramatic example of this in Intel's evolution is the Pentium 4, which was a big outlier in clock speed but didn't perform proportionally well. The bit-shifting instructions, which in previous chips could shift 32 bits in a single cycle, used a much simpler circuit in the Pentium 4 that required one cycle per bit shifted. The expectation was that the Pentium 4 architecture would scale to much higher clock speeds because of this simplicity, but that didn't work out, and the fast, complex shift circuit returned in the Core and later architectures.
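The barrel-shifter vs one-bit-per-cycle trade-off can be modelled in a few lines. This is a simplified cost sketch; real cycle counts vary by microarchitecture:

```python
# Simplified cost model for a 32-bit left shift by n bit positions.
def barrel_shifter_cycles(n):
    return 1      # full shift completes in a single cycle (complex circuit)

def iterative_shifter_cycles(n):
    return n      # one cycle per bit position (simple circuit, P4-style)

# Both strategies produce the same result; only the cycle count differs.
def shift_iteratively(value, n):
    for _ in range(n):
        value = (value << 1) & 0xFFFFFFFF  # shift one bit, stay within 32 bits
    return value

x = 0x0000_00FF
assert shift_iteratively(x, 8) == (x << 8) & 0xFFFFFFFF
print(barrel_shifter_cycles(31), iterative_shifter_cycles(31))  # 1 vs 31
```

A worst-case 31-bit shift thus costs 31x more cycles on the simple circuit, which the higher clock rate was supposed to (but couldn't fully) compensate for.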

2

From IEEE:

So why not push the clock faster? Because it's no longer worth the cost in terms of power consumed and heat dissipated. Intel calls the speed/power tradeoff a "fundamental theorem of multicore processors"—and that's the reason it makes sense to use two or more processing areas, or cores, on a single chip.

http://spectrum.ieee.org/computing/hardware/why-cpu-frequency-stalled

Azevedo
  • 588