
TLP: thread-level parallelism (simultaneous execution of threads)
Another commonly used strategy to increase CPU parallelism is to include the ability to run multiple threads (programs) at the same time. In general, CPUs with high TLP have been in use much longer than those with high ILP. Many of the designs that Seymour Cray pioneered during the late 1970s and 1980s focused on TLP as their primary method of enabling enormous computing capability (for their time). In fact, TLP, in the form of multithreading enhancements, was in use as early as the 1950s.
In the context of individual processor design, the two main methods used to achieve TLP are chip-level multiprocessing (CMP) and simultaneous multithreading (SMT). At a higher level, it is very common to build computers with multiple completely independent CPUs in arrangements such as symmetric multiprocessing (SMP) and non-uniform memory access (NUMA). Although very different means are used, all of these techniques achieve the same goal: increasing the number of threads that the CPU(s) can run in parallel.
The CMP and SMP methods of parallelism are similar to each other and are the most straightforward. Conceptually, they involve little more than using two or more complete, independent CPUs. In the case of CMP, multiple processor “cores” are included in the same package, sometimes in the same integrated circuit.
SMP, on the other hand, includes multiple independent packages. NUMA is somewhat similar to SMP but uses a non-uniform memory access model, in which each processor (or group of processors) has local memory that it can reach faster than memory attached to other processors. This is important for computers with many CPUs because, under SMP’s shared memory model, all processors contend for the same memory, so each processor’s share of memory access time is quickly exhausted and CPUs spend significant time waiting for memory. NUMA is therefore considered a much more scalable model, allowing many more CPUs to be used in one computer than SMP can feasibly support. SMT differs somewhat from other TLP enhancements in that it attempts to duplicate as few portions of the CPU as possible.
Although SMT is considered a TLP strategy, its implementation actually resembles a superscalar design more closely, and it is in fact frequently used in superscalar microprocessors such as IBM’s POWER5. Rather than duplicating the entire CPU, SMT designs duplicate only the parts needed for fetching, decoding, and dispatching instructions, along with things like the general-purpose registers. This allows an SMT CPU to keep its execution units busy more of the time by supplying them with instructions from two different software threads. Again, this is very similar to the superscalar ILP method, but it simultaneously executes instructions from multiple threads rather than concurrently executing multiple instructions from the same thread.
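The benefit of letting two threads share the same execution units can be illustrated with a toy issue-slot model. This is a minimal sketch under simplifying assumptions, not a model of POWER5 or any real core: the per-cycle counts of “ready” instructions are made up, the core is assumed to fill up to `width` slots per cycle from whichever threads have instructions available, and the helper names `issue_slots_used` and `utilization` are hypothetical.

```python
from typing import Sequence


def issue_slots_used(ready_per_thread: Sequence[Sequence[int]], width: int) -> int:
    """Count how many execution slots are filled over a run.

    ready_per_thread[t][c] is the number of instructions thread t could issue
    in cycle c (hypothetical numbers; a 0 models a stall such as a cache miss).
    In each cycle the core fills up to `width` slots from all threads combined.
    """
    cycles = len(ready_per_thread[0])
    if any(len(thread) != cycles for thread in ready_per_thread):
        raise ValueError("every thread trace must cover the same number of cycles")
    used = 0
    for c in range(cycles):
        ready = sum(thread[c] for thread in ready_per_thread)
        used += min(width, ready)  # the core cannot issue more than its width
    return used


def utilization(ready_per_thread: Sequence[Sequence[int]], width: int) -> float:
    """Fraction of all issue slots that were filled."""
    cycles = len(ready_per_thread[0])
    return issue_slots_used(ready_per_thread, width) / (width * cycles)


if __name__ == "__main__":
    # Hypothetical traces: instructions each thread has ready, cycle by cycle.
    thread_a = [2, 0, 1, 4, 0, 1]
    thread_b = [1, 3, 0, 2, 2, 0]
    width = 4  # assumed number of execution slots per cycle

    print(f"one thread:  {utilization([thread_a], width):.0%} of slots used")
    print(f"two threads: {utilization([thread_a, thread_b], width):.0%} of slots used")
```

With these made-up traces, thread A alone fills 8 of 24 slots (about 33%), while running A and B together fills 14 of 24 (about 58%). The gain comes from thread B supplying work in cycles where thread A stalls; a real SMT core must also arbitrate between threads and share caches, which this sketch ignores.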
Vector processors and SIMD
A less common but increasingly important paradigm for CPUs (and indeed for computing in general) deals with vectors. The processors discussed above are all referred to as some type of scalar device. As the name implies, vector processors deal with multiple pieces of data in the context of one instruction, in contrast to scalar processors, which handle one piece of data per instruction.
These two schemes of dealing with data are generally referred to as SISD (Single Instruction, Single Data) and SIMD (Single Instruction, Multiple Data), respectively. The great utility of CPUs that deal with vectors of data lies in optimizing tasks that tend to require the same operation, for example a sum or a dot product, to be performed on a large data set. Classic examples of this type of task are multimedia applications (images, video, and sound), as well as many types of scientific and engineering tasks.
While a scalar CPU must complete the entire process of fetching, decoding, and executing an instruction for every value in a data set, a vector CPU can perform a single operation on a comparatively large data set with one instruction. Of course, this is only possible when the application tends to require many steps that apply one operation to a large set of data.
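The difference in instruction count between the two schemes can be sketched by emulating both in Python. This is only an illustration: `scalar_add` and `simd_add` are hypothetical helpers, each “instruction” is just a counter increment, and a lane width of 4 is assumed (for example, four 32-bit floats fit in one 128-bit SSE register).

```python
from typing import List, Sequence, Tuple


def scalar_add(a: Sequence[float], b: Sequence[float]) -> Tuple[List[float], int]:
    """SISD: one add instruction per pair of elements."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out: List[float] = []
    instructions = 0
    for x, y in zip(a, b):
        out.append(x + y)
        instructions += 1
    return out, instructions


def simd_add(a: Sequence[float], b: Sequence[float], lanes: int = 4) -> Tuple[List[float], int]:
    """SIMD (emulated): one add instruction processes `lanes` elements at once."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out: List[float] = []
    instructions = 0
    for i in range(0, len(a), lanes):
        # One "vector instruction": the same add applied to every lane of the chunk.
        # A short final chunk is counted as one instruction, as if it were masked.
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        instructions += 1
    return out, instructions


if __name__ == "__main__":
    a = [float(i) for i in range(10)]
    b = [10.0] * 10
    scalar_result, scalar_count = scalar_add(a, b)
    simd_result, simd_count = simd_add(a, b)
    assert scalar_result == simd_result  # both schemes compute the same answer
    print(f"SISD: {scalar_count} instructions, SIMD: {simd_count} instructions")
```

For 10 elements this reports 10 scalar instructions versus 3 vector instructions. Real code often handles leftover elements with a short scalar loop rather than masking; the sketch simply counts the partial chunk as one operation.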
Most early vector CPUs, such as the Cray-1, were associated almost exclusively with scientific research and cryptography applications. However, as multimedia has largely shifted to digital media, the need for some form of SIMD in general-purpose CPUs has become significant. Shortly after floating-point units became common in general-purpose processors, specifications and implementations of SIMD execution units for general-purpose CPUs also began to appear. Some of these early SIMD specifications, such as Intel’s MMX, were integer-only.
This proved to be a significant handicap for some software developers, since many of the applications that benefit from SIMD deal primarily with floating-point numbers. Over time, these early designs were refined and reworked into some of the common modern SIMD specifications, which are usually associated with one ISA. Notable modern examples are Intel’s SSE and the PowerPC-related AltiVec (also known as VMX).
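To show what an integer-only packed SIMD operation of the MMX kind does, the sketch below emulates a packed byte addition in software with the “SIMD within a register” (SWAR) trick: eight unsigned bytes are packed into one 64-bit integer and added in a single operation, with each lane wrapping modulo 256 and no carry crossing a lane boundary. It is similar in spirit to MMX’s `PADDB` instruction but is plain Python arithmetic, not that instruction; the helper names `pack_bytes`, `unpack_bytes`, and `packed_add_bytes` are hypothetical.

```python
from typing import List, Sequence

LANES = 8
MASK64 = 0xFFFF_FFFF_FFFF_FFFF
LOW7 = 0x7F7F_7F7F_7F7F_7F7F  # low 7 bits of every byte lane
HIGH = 0x8080_8080_8080_8080  # top bit of every byte lane


def pack_bytes(values: Sequence[int]) -> int:
    """Pack up to 8 unsigned bytes into one 64-bit word (lane 0 = lowest byte)."""
    if len(values) > LANES or any(not 0 <= v <= 0xFF for v in values):
        raise ValueError("expected at most 8 values in the range 0..255")
    word = 0
    for lane, value in enumerate(values):
        word |= value << (8 * lane)
    return word


def unpack_bytes(word: int, count: int = LANES) -> List[int]:
    """Split a 64-bit word back into `count` unsigned byte lanes."""
    return [(word >> (8 * lane)) & 0xFF for lane in range(count)]


def packed_add_bytes(a: int, b: int) -> int:
    """Add eight byte lanes at once, each wrapping modulo 256."""
    # Add only the low 7 bits of each lane: a lane sum is at most 254, so any
    # carry lands in that lane's top bit and never spills into the next lane.
    low_sum = (a & LOW7) + (b & LOW7)
    # The correct top bit of each lane is (top bit of a) XOR (top bit of b)
    # XOR (carry from the low bits), so fold the inputs' top bits back in.
    return (low_sum ^ ((a ^ b) & HIGH)) & MASK64


if __name__ == "__main__":
    x = pack_bytes([1, 2, 200, 255, 0, 128, 100, 7])
    y = pack_bytes([1, 3, 100, 1, 0, 128, 27, 250])
    print(unpack_bytes(packed_add_bytes(x, y)))  # [2, 5, 44, 0, 0, 0, 127, 1]
```

Lane by lane, the result matches ordinary `(x + y) % 256`, but all eight additions happen in one integer addition, which is the essence of a packed integer SIMD operation. The bit trick depends on integer carry behavior and offers no way to add floating-point values, which mirrors the limitation of the integer-only designs described above.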