It seems to me that these days lots of calculations are done on the GPU. Obviously graphics are done there, but using CUDA and the like, AI, hashing algorithms (think bitcoins) and others are also done on the GPU. Why can't we just get rid of the CPU and use the GPU on its own? What makes the GPU so much faster than the CPU?
15 Answers
TL;DR answer: GPUs have far more processor cores than CPUs, but because each GPU core runs significantly slower than a CPU core and does not have the features needed for modern operating systems, GPUs are not appropriate for performing most of the processing in everyday computing. They are most suited to compute-intensive operations such as video processing and physics simulations.
GPGPU is still a relatively new concept. GPUs were initially used for rendering graphics only; as technology advanced, the large number of cores in GPUs relative to CPUs was exploited by developing computational capabilities for GPUs so that they can process many parallel streams of data simultaneously, no matter what that data may be. While GPUs can have hundreds or even thousands of stream processors, they each run slower than a CPU core and have fewer features (even if they are Turing complete and can be programmed to run any program a CPU can run). Features missing from GPUs include interrupts and virtual memory, which are required to implement a modern operating system.
In other words, CPUs and GPUs have significantly different architectures that make them better suited to different tasks. A GPU can handle large amounts of data in many streams, performing relatively simple operations on them, but is ill-suited to heavy or complex processing on a single or few streams of data. A CPU is much faster on a per-core basis (in terms of instructions per second) and can perform complex operations on a single or few streams of data more easily, but cannot efficiently handle many streams simultaneously.
As a result, GPUs are not suited to handle tasks that do not significantly benefit from or cannot be parallelized, including many common consumer applications such as word processors. Furthermore, GPUs use a fundamentally different architecture; one would have to program an application specifically for a GPU for it to work, and significantly different techniques are required to program GPUs. These different techniques include new programming languages, modifications to existing languages, and new programming paradigms that are better suited to expressing a computation as a parallel operation to be performed by many stream processors. For more information on the techniques needed to program GPUs, see the Wikipedia articles on stream processing and parallel computing.
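One common way to put numbers on this is Amdahl's law, which the answer above does not name but which matches its argument: the part of a task that cannot be parallelized caps the overall speedup, no matter how many cores are available. A minimal sketch, where the parallel fractions and the worker count are hypothetical values chosen only for illustration:

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of a task can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical workloads: a mostly serial program vs. a highly parallel one.
for label, fraction in [("mostly serial (10% parallel)", 0.10),
                        ("highly parallel (99% parallel)", 0.99)]:
    print(label, round(amdahl_speedup(fraction, 1000), 2))
```

With 1000 workers, the mostly serial workload is capped at roughly 1.1x, while the highly parallel one reaches roughly 91x. This is an idealized bound that ignores memory transfer and scheduling costs.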
Modern GPUs are capable of performing vector operations and floating-point arithmetic, with the latest cards capable of manipulating double-precision floating-point numbers. Frameworks such as CUDA and OpenCL enable programs to be written for GPUs, and the nature of GPUs makes them most suited to highly parallelizable operations, such as in scientific computing, where a series of specialized GPU compute cards can be a viable replacement for a small compute cluster, as in NVIDIA Tesla Personal Supercomputers. Consumers with modern GPUs who are experienced with Folding@home can use them to contribute with GPU clients, which can perform protein folding simulations at very high speeds and contribute more work to the project (be sure to read the FAQs first, especially those related to GPUs). GPUs can also enable better physics simulation in video games using PhysX, accelerate video encoding and decoding, and perform other compute-intensive tasks. It is these types of tasks that GPUs are most suited to performing.
AMD is pioneering a processor design called the Accelerated Processing Unit (APU) which combines conventional x86 CPU cores with GPUs. This approach enables graphical performance vastly superior to motherboard-integrated graphics solutions (though no match for more expensive discrete GPUs), and allows for a compact, low-cost system with good multimedia performance without the need for a separate GPU. The latest Intel processors also offer on-chip integrated graphics, although competitive integrated GPU performance is currently limited to the few chips with Intel Iris Pro Graphics. As technology continues to advance, we will see an increasing degree of convergence of these once-separate parts. AMD envisions a future where the CPU and GPU are one, capable of seamlessly working together on the same task.
Nonetheless, many tasks performed by PC operating systems and applications are still better suited to CPUs, and much work is needed to accelerate a program using a GPU. Since so much existing software uses the x86 architecture, and because GPUs require different programming techniques and are missing several important features needed for operating systems, a general transition from CPU to GPU for everyday computing is very difficult.
What makes the GPU so much faster than the CPU?
The GPU is not faster than the CPU. CPUs and GPUs are designed with two different goals and different trade-offs, so they have different performance characteristics. Certain tasks run faster on a CPU, while other tasks are computed faster on a GPU. The CPU excels at doing complex manipulations on a small set of data; the GPU excels at doing simple manipulations on a large set of data.
The GPU is a special-purpose CPU, designed so that a single instruction works over a large block of data (SIMD, Single Instruction Multiple Data), applying the same operation to every element. Working on blocks of data is certainly more efficient than working on a single cell at a time because the overhead of decoding instructions is much reduced. However, working on large blocks means there are more parallel working units, so many more transistors are needed to implement a single GPU instruction (which imposes physical size constraints, uses more energy, and produces more heat).
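The decode-overhead point can be illustrated with a toy model. This is only a sketch in plain Python, not real SIMD code: the `decodes` counter stands in for instruction fetch/decode work, and the lane width of 8 is a hypothetical value.

```python
def scalar_add(a, b):
    """Add element by element: one instruction decode per element."""
    decodes = 0
    out = []
    for x, y in zip(a, b):
        decodes += 1          # each element needs its own instruction
        out.append(x + y)
    return out, decodes


def simd_add(a, b, lanes=8):
    """Add in blocks of `lanes` elements: one decode per block."""
    decodes = 0
    out = []
    for start in range(0, len(a), lanes):
        decodes += 1          # one instruction covers the whole block
        block_a = a[start:start + lanes]
        block_b = b[start:start + lanes]
        out.extend(x + y for x, y in zip(block_a, block_b))
    return out, decodes


a = list(range(64))
b = list(range(64, 128))
scalar_result, scalar_decodes = scalar_add(a, b)
simd_result, simd_decodes = simd_add(a, b)
print(scalar_result == simd_result, scalar_decodes, simd_decodes)  # True 64 8
```

Both versions produce the same sums, but the block version issues 8 "instructions" instead of 64. On real hardware the block's additions would also run at the same time on separate units, which is where the extra transistors go.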
The CPU is designed to execute a single instruction on a single datum as quickly as possible. Since it only needs to work with a single datum, far fewer transistors are required to implement a single instruction, so a CPU can afford a larger instruction set, a more complex ALU, better branch prediction, a better virtualized architecture, and more sophisticated caching and pipelining schemes. Its instruction cycles are also faster.
The reason we are still using CPUs is not that x86 is the king of CPU architectures and Windows is written for x86. It is that the kind of work an OS needs to do, i.e. making decisions, runs more efficiently on a CPU architecture. An OS needs to look at hundreds of different types of data and make various decisions that all depend on each other; this kind of job does not parallelize easily, at least not onto a SIMD architecture.
In the future, what we will see is a convergence between the CPU and GPU architectures as CPUs acquire the capability to work over blocks of data, e.g. SSE. Also, as manufacturing technology improves and chips get smaller, the GPU can afford to implement more complex instructions.
GPUs lack:
- Virtual memory (!!!)
- Means of addressing devices other than memory (e.g. keyboards, printers, secondary storage, etc)
- Interrupts
You need these to be able to implement anything like a modern operating system.
They are also (relatively) slow at double precision arithmetic (when compared with their single precision arithmetic performance)*, and are much larger (in terms of silicon area). Older GPU architectures don't support indirect calls (through function pointers) needed for most general-purpose programming, and more recent architectures that do support them do so slowly. Finally (as other answers have noted), for tasks which cannot be parallelized, GPUs lose in comparison to CPUs given the same workload.
EDIT: Please note that this response was written in 2011 -- GPU tech is an area changing constantly. Things could be very different depending on when you're reading this :P
* Some GPUs aren't slow at double precision arithmetic, such as NVidia's Quadro or Tesla lines (Fermi generation or newer), or AMD's FirePro line (GCN generation or newer). But these aren't in most consumers' machines.
A CPU is like a worker that goes super fast. A GPU is like a group of clone workers that go fast, but which all have to do exactly the same thing in unison (with the exception that you can have some clones sit idle if you want)
Which would you rather have as your fellow developer, one super fast guy, or 100 fast clones that are not actually as fast, but all have to perform the same actions simultaneously?
For some actions, the clones are pretty good e.g. sweep the floor - they can each sweep a part of it.
For some actions, the clones stink, e.g. write the weekly report - all the clones but one sit idle while one clone writes the report (otherwise you just get 100 copies of the same report).
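The "clones in unison" picture can be sketched as a toy simulation. This is a simplified model, not how any particular GPU is implemented: when the lanes of a group disagree on a branch, the group walks through both branch bodies one after the other, and lanes that did not take the current branch sit idle. The function names and sample values below are hypothetical.

```python
def run_lockstep(values, predicate, then_op, else_op):
    """Simulate a group of lanes that must all follow the same instruction stream.

    Returns the per-lane results and the number of branch bodies issued.
    """
    mask = [predicate(v) for v in values]
    results = list(values)
    passes = 0
    if any(mask):                       # some lane takes the 'then' branch
        passes += 1
        for i, active in enumerate(mask):
            if active:                  # other lanes sit idle for this pass
                results[i] = then_op(values[i])
    if not all(mask):                   # some lane takes the 'else' branch
        passes += 1
        for i, active in enumerate(mask):
            if not active:
                results[i] = else_op(values[i])
    return results, passes


def is_even(v):
    return v % 2 == 0


def halve(v):
    return v // 2


def triple_plus_one(v):
    return 3 * v + 1


uniform, uniform_passes = run_lockstep([2, 4, 6, 8], is_even, halve, triple_plus_one)
mixed, mixed_passes = run_lockstep([1, 2, 3, 4], is_even, halve, triple_plus_one)
print(uniform, uniform_passes)  # [1, 2, 3, 4] 1
print(mixed, mixed_passes)      # [4, 1, 10, 2] 2
```

When every lane agrees, the group pays for one branch body. When they disagree, it pays for both, and half the clones idle during each pass.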
Because GPUs are designed to do a lot of small things at once, and CPUs are designed to do one thing at a time. If your process can be made massively parallel, like hashing, the GPU is orders of magnitude faster; otherwise it won't be.
Your CPU can compute a hash much, much faster than your GPU can - but in the time it takes your CPU to do it, your GPU could be part way through several hundred hashes. GPUs are designed to do a lot of things at the same time, and CPUs are designed to do one thing at a time, but very fast.
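A back-of-the-envelope version of that trade-off, as a sketch only: the worker counts and per-hash times below are made-up illustrative figures, not measurements of any real hardware.

```python
import math


def batch_time(jobs, workers, seconds_per_job):
    """Time to finish `jobs` independent jobs on `workers` identical workers."""
    rounds = math.ceil(jobs / workers)   # each worker handles one job per round
    return rounds * seconds_per_job


# Hypothetical figures, not measurements.
CPU = {"workers": 4, "seconds_per_job": 1.0}      # few fast workers
GPU = {"workers": 1000, "seconds_per_job": 10.0}  # many slow workers

for jobs in (1, 100_000):
    cpu = batch_time(jobs, **CPU)
    gpu = batch_time(jobs, **GPU)
    print(jobs, cpu, gpu)
```

Under these assumed numbers, a single hash takes 1 time unit on the CPU and 10 on the GPU, but 100,000 hashes take 25,000 units on the CPU and only 1,000 on the GPU.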
The problem is that CPUs and GPUs are very different solutions to very different problems. There is a little overlap, but generally what's in their domain stays in their domain. We can't replace the CPU with a GPU because the CPU is sitting there doing its job much better than a GPU ever could, simply because a GPU isn't designed to do the job, and a CPU is.
A minor side note, though, if it were possible to scrap the CPU and only have a GPU, don't you think we'd rename it? :)
Are you really asking why are we not using GPU like architectures in CPU?
A GPU is just the specialized CPU of a graphics card. We hand non-graphics computation to the GPU because general-purpose CPUs are just not up to par in parallel and floating-point execution.
We actually are using different (more GPU-ish) CPU architectures. E.g. Niagara processors are quite multitasking. SPARC T3 will run 512 concurrent threads.
I might be horribly mistaken here, and am speaking from little or no authority on the subject, but here goes:
I believe each GPU execution unit ("core") has a very limited address space compared to a CPU.
GPU execution units can't deal with branching efficiently.
GPU execution units don't support hardware interrupts in the same way CPUs do.
I've always thought GPU execution units were meant to be something like the PlayStation 3 "SPEs": they want to be given a block of data, run a number of sequential operations on it, and then spit out another block of data, rinse, repeat. They do not have as much addressable memory as the main "PPE", but the idea is to dedicate each "SPE" to a specific, sequential task. The output of one unit might feed the input of another unit.
The execution units don't work well if they are trying to "analyze" the data and make a bunch of decisions based on what that data is.
These "blocks of data" can be part of a stream, such as a list of vertices from a game's state table, MPEG data from a disk, etc.
If something does not fit this "streaming" model then you have a task which cannot be efficiently parallelized, and the GPU is not necessarily the best solution for it. A good example is processing "external event" based things like keyboard, joystick, or network input. There aren't a lot of things that don't fit that model, but there will always be a few.
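The "block in, fixed sequence of operations, block out" model described above can be sketched with generators. This is only an illustration in plain Python; the stage names, block size, and operations are hypothetical.

```python
def blocks(data, size):
    """Split a stream into fixed-size blocks."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def scale(block_stream, factor):
    """First unit: multiply every value in each block."""
    for block in block_stream:
        yield [v * factor for v in block]


def offset(block_stream, amount):
    """Second unit: consumes the first unit's output blocks."""
    for block in block_stream:
        yield [v + amount for v in block]


stream = list(range(10))
pipeline = offset(scale(blocks(stream, 4), factor=2), amount=1)
output = [v for block in pipeline for v in block]
print(output)  # [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
```

Each stage does the same work on every block without inspecting the data to decide what to do next, which is what makes the model easy to spread across many units.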
This has nothing to do with clock speed or purpose. Both are equally able to complete most, if not all, tasks; however, each is slightly better suited to some tasks than others.
There has been a very old argument about whether it's better to have lots of dumb cores or a small group of very smart cores. This goes back easily into the 80's.
Inside a CPU there are many possible calculations that can be done. The smarter cores are able to carry out many different calculations at the same time (kind of like multi-core but not, it's complicated; see Instruction-level parallelism). A smart core could do several different calculations at the same time (add, subtract, multiply, divide, memory operation), but only one of each kind at a time; because of this, such cores are physically larger (and therefore much more expensive) than dumber cores.
A dumb core is much smaller, so more of them can be added to a single chip, but each is not able to do as many simultaneous calculations. There is a fine balance between many dumb cores and a few smart cores.
Multi-core architectures work well with graphics because the calculations can easily be split up over hundreds of cores, but it is also dependent on the quality of code and whether other code is relying on the result of one calculation.
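The point about code relying on the result of another calculation can be shown with two small loops. This is a sketch with hypothetical function names and sample data: in the first, every output depends only on its own input, so the work could be handed out to any number of cores; in the second, as written, each step needs the previous step's result.

```python
def brighten(pixels, amount):
    """Independent work: each output depends only on its own input."""
    return [min(255, p + amount) for p in pixels]


def running_total(values):
    """Dependent work: each output needs the previous output first."""
    totals = []
    total = 0
    for v in values:
        total += v            # cannot start until the previous step is done
        totals.append(total)
    return totals


print(brighten([10, 200, 250], 20))   # [30, 220, 255]
print(running_total([1, 2, 3, 4]))    # [1, 3, 6, 10]
```

Parallel algorithms for running totals do exist, but they require restructuring the computation rather than simply splitting the loop, which is the "quality of code" dependence mentioned above.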
This is a much more complicated question than it may appear. For more info, read this article about CPU design:
Modern Microprocessors - A 90 Minute guide
I would like to broach one syntactic point: the terms CPU and GPU are functional names, not architectural names.
If a computer were to use a GPU as its main processor, it would then become a "central processing unit" (CPU) regardless of its architecture and design.
It is important to keep in mind that there is no magical dividing line in the architecture space that makes one processor the "central" one and another the "graphics" one. (Well, some GPUs may be too crippled to be fully general, but those are not the ones we are talking about here.)
The distinction is one of how they are installed on the board and what tasks are given to them. Of course, we use a general-purpose processor (or set of general-purpose processors) as the main data mover, and a special, parallelized, deeply pipelined unit for things (like graphics) that can best take advantage of it.
Most of the spiffy tricks that have been used to make GPUs do their thing very fast were first developed by people trying to make faster and better CPUs. It turns out that Word and Excel and Netscape and many other things that people use their computers for not only don't take full advantage of the features offered by graphics-specialized chips but even run slower on those architectures, because they branch a lot, which causes (very expensive and slow) pipeline clears.
The whole point of having a GPU at all was to relieve the CPU of the expensive graphics calculations that it was doing at the time.
Combining them into a single processor again would be going back to where it all started.
For a simple reason: most applications are not multi-threaded/vectorized.
Graphics cards rely heavily on multi-threading, at least in concept.
Compare a car with a single engine and a car with one smaller engine per wheel. With the latter car, you need to command all the engines, something which has not been taken into account from a system programming point of view.
With AMD Fusion, though, how we make use of the processing power will change: either vectorized, or fast for one thread.
The reason we are still using CPUs is that both CPUs and GPUs have their unique advantages. See my following paper, accepted in ACM Computing Surveys 2015, which provides a conclusive and comprehensive discussion on moving away from the 'CPU vs GPU debate' to 'CPU-GPU collaborative computing'.
Put simply, a GPU can be compared to a trailer for a car. The trunk is usually enough for most people, except when they buy something really big; then they may need a trailer. It is the same with a GPU: an ordinary CPU is usually enough to accomplish the majority of tasks, but if you need intensive calculations across many threads, you may need a GPU.
gpus are good stream processors. you can think of stream processing as multiplying a long array of numbers sequentially. cpus also have stream processing capabilities (called SIMD extensions), but you can't implement all programming logic as stream processing, and compilers have the option to create bytecode which makes use of simd instructions whenever possible.
not everything is an array of numbers. pictures and videos are, maybe sound too (there are opencl encoders here and there). so gpus can process, encode and decode pictures, videos and anything similar. one drawback is that you can't offload everything to gpus in games because it would create stutter; gpus are busy with graphics and are supposed to be the bottleneck in the system when playing games. the optimal solution would be fully utilizing all components in a pc. so, for example, nvidia's physx engine, by default, does calculations on the cpu when the gpu is fully utilized.