Multiprocessor Systems: Understanding Processors, Shared Memory, and Cache Coherence

Computer Architecture · Distributed Systems · Parallel Computing · Operating Systems

An in-depth analysis of multiprocessor systems, discussing the roles of central processing units (CPUs) and input-output processors (IOPs), the distinction between multiple computers and multiple processors, and the importance of shared memory and cache coherence. It also covers various interconnection structures, including time-shared common bus, multiport memory, and crossbar switch.

What you will learn

  • What are the advantages and disadvantages of multiport memory in a multiprocessor system?
  • What is shared memory in a multiprocessor system, and how does it work?
  • What are the roles of CPUs and IOPs in a multiprocessor system?
  • How does cache coherence ensure data consistency in a multiprocessor system?
  • What is the difference between a multiprocessor system and a system with multiple computers?

Typology: Summaries

2020/2021

Uploaded on 09/02/2021

umairmasood 🇵🇰

Characteristics of Multiprocessors

  • The term "processor" in a multiprocessor can refer to either a central processing unit (CPU) or an input-output processor (IOP).
  • As it is most commonly defined, a multiprocessor system implies the existence of multiple CPUs, although usually there will be one or more IOPs as well.
  • Multiprocessors are classified as multiple instruction stream, multiple data stream (MIMD) systems.
  • Multiprocessors are further classified by the way their memory is organized. A multiprocessor system with common shared memory is classified as a shared-memory or tightly coupled multiprocessor. In fact, most commercial tightly coupled multiprocessors provide a cache memory with each CPU; in addition, there is a global common memory that all CPUs can access. Information can therefore be shared among the CPUs by placing it in the common global memory.
  • Each processor element in a loosely coupled system has its own private local memory. The processors are tied together by a switching scheme designed to route information from one processor to another through a message-passing scheme. The processors relay programs and data to other processors in packets. A packet consists of an address, the data content, and some error detection code. The packets are addressed to a specific processor or taken by the first available processor, depending on the communication system used.
  • Loosely coupled systems are most efficient when the interaction between tasks is minimal, whereas tightly coupled systems can tolerate a higher degree of interaction between tasks.

Interconnection Structures

  • The components that form a multiprocessor system are CPUs, IOPs connected to input-output devices, and a memory unit that may be partitioned into a number of separate modules.
  • The interconnection between the components can have different physical configurations, depending on the number of transfer paths that are available between the processors and memory in a shared-memory system, or among the processing elements in a loosely coupled system.
  • There are several physical forms available for establishing an interconnection network. Some of these schemes are presented in this section: time-shared common bus, multiport memory, crossbar switch, multistage switching network, and hypercube system.

Time-Shared Common Bus

  • In the system bus organization of Figure 13-2, each local bus is connected to its own local memory and to one or more processors. Each local bus may be connected to a CPU, an IOP, or any combination of processors. A system bus controller links each local bus to a common system bus.
  • The I/O devices connected to the local IOP, as well as the local memory, are available to the local processor. The memory connected to the common system bus is shared by all processors.
  • Only one processor can communicate with the shared memory and other common resources through the system bus at any given time. The other processors are kept busy communicating with their local memory and I/O devices.

[Figure 13-2: System bus structure for multiprocessors, showing several local buses linked to a common system bus.]

Multiport Memory

  • A multiport memory system employs separate buses between each memory module and each CPU. Each processor bus consists of the address, data, and control lines required to communicate with memory.
  • The advantage of this organization is the high transfer rate made possible by the multiple simultaneous paths; the disadvantage is that it requires expensive memory control logic and a large number of cables and connectors.

[Figure 13-3: Multiport memory organization with four CPUs (CPU 1-CPU 4) and four memory modules (MM 1-MM 4).]

Crossbar Switch

  • The crossbar switch organization places a switch at each crosspoint between a processor bus and a memory module path. The small square in each crosspoint is a switch that determines the path from a processor to a memory module.
  • Each switch point has control logic to set up the transfer path between a processor and memory. It examines the address that is placed on the bus to determine whether its particular module is being addressed.
  • Priority levels are established by the arbitration logic to select one CPU when two or more CPUs attempt to access the same memory module at the same time; a small code sketch of this arbitration appears below.
  • A crossbar switch organization supports simultaneous transfers from all memory modules because there is a separate path associated with each module. However, the hardware required to implement the switch can become quite large and complex.

[Figure 13-4: Crossbar switch connecting four CPUs to four memory modules (MM 1-MM 4).]
[Figure 13-5: Block diagram of a crossbar switch; each memory module has multiplexers and arbitration logic that select the data, address, read/write, and memory-enable lines from one CPU.]
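The crosspoint arbitration can be pictured with a small simulation. The sketch below is illustrative only and assumes a fixed priority in which a lower-numbered CPU always wins, one of several policies the arbitration logic could implement; the function name and data structures are not from the text.

```python
# Toy model of crossbar-switch arbitration: each memory module grants its
# path to at most one requesting CPU per cycle, so transfers to different
# modules proceed simultaneously while conflicting CPUs are blocked.

def arbitrate(requests):
    """requests maps cpu_id -> module_id for this cycle.
    Returns module_id -> cpu_id grants under a fixed priority
    (lower CPU number wins), an assumed policy for illustration."""
    grants = {}
    for cpu in sorted(requests):          # fixed priority: CPU 1 before CPU 2, ...
        module = requests[cpu]
        if module not in grants:          # crosspoint free: set up the path
            grants[module] = cpu
        # otherwise this CPU is blocked and must retry next cycle
    return grants

# Four CPUs request memory modules; CPUs 1 and 3 both want module 2.
print(arbitrate({1: 2, 2: 0, 3: 2, 4: 3}))
# {2: 1, 0: 2, 3: 4} -> CPU 1 wins module 2; CPU 3 is blocked this cycle.
```

Because each module has its own path, the grants to modules 0, 2, and 3 proceed in parallel; only the conflicting request loses the cycle.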
Multistage Switching Network

  • The basic component of a multistage network is a two-input, two-output interchange switch. As shown in Figure 13-6, the 2 x 2 switch has two inputs, labeled A and B, and two outputs, labeled 0 and 1. There are control signals (not shown) associated with the switch that establish the interconnection between the input and output terminals.
  • The switch has the capability of connecting input A to either of the outputs; terminal B behaves in a similar fashion. The switch also has the capability to arbitrate between conflicting requests: if inputs A and B both request the same output terminal, only one of them will be connected and the other will be blocked.
  • Using the 2 x 2 switch as a building block, it is possible to build a multistage network to control the communication between a number of sources and destinations.

[Figure 13-6: Operation of a 2 x 2 interchange switch.]

  • In such a network, each level examines a different bit of the destination address to determine the 2 x 2 switch setting. With a three-bit destination address, level 1 inspects the most significant bit, level 2 inspects the middle bit, and level 3 inspects the least significant bit. When a request arrives on either input of a 2 x 2 switch, it is routed to the upper output if the specified bit is 0, or to the lower output if the bit is 1. A sketch of this destination-tag routing follows this list.
  • In a tightly coupled multiprocessor system, the source is a processor and the destination is a memory module. The first pass through the network sets up the path; succeeding passes are used to transfer the address into memory and then transfer the data in either direction, depending on whether the request is a read or a write.
  • In a loosely coupled multiprocessor system, both the source and the destination are processing elements. After the path is established, the source processor transfers a message to the destination processor.
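The bit-by-bit routing rule can be captured in a few lines. The sketch below is a rough illustration under the three-level assumption described above; the function name and the "upper"/"lower" labels are my own, not from the text.

```python
# Destination-tag routing through three levels of 2x2 interchange switches:
# level 1 looks at the most significant destination bit, level 3 at the
# least significant bit; a 0 routes to the upper output, a 1 to the lower.

def route(destination, levels=3):
    """Return the output taken at each level for a request addressed to
    `destination` (an integer in the range 0 .. 2**levels - 1)."""
    path = []
    for level in range(levels):
        bit = (destination >> (levels - 1 - level)) & 1   # MSB first
        path.append("upper" if bit == 0 else "lower")
    return path

# Routing a request to destination 5 (binary 101):
print(route(5))   # ['lower', 'upper', 'lower']
```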
Hypercube Interconnection

  • A hypercube (binary n-cube) multiprocessor consists of 2^n processors interconnected in an n-dimensional binary cube. Each processor forms a node of the cube. Although it is customary to refer to each node as having a processor, in effect it contains not only a CPU but also local memory and an I/O interface.
  • Each processor has direct communication paths to n other neighbor processors; these paths correspond to the edges of the cube. There are 2^n distinct n-bit binary addresses that can be assigned to the processors, and each processor address differs from that of each of its n neighbors by exactly one bit position.
  • A one-cube structure has n = 1 and two nodes. A two-cube structure has n = 2 and 2^2 = 4 nodes interconnected as a square. A three-cube structure has eight nodes interconnected as a cube. In general, an n-cube structure has 2^n nodes with a processor residing in each node.
  • Each node is assigned a binary address in such a way that the addresses of two neighbors differ in exactly one bit position. For example, the three neighbors of the node with address 100 in a three-cube structure are 000, 110, and 101; each of these binary numbers differs from 100 in one bit position.

[Figure 13-9: Hypercube structures for n = 1, 2, 3 (one-cube, two-cube, three-cube).]

  • The following example illustrates the use of a deterministic routing technique in a hypercube network. Assume that S = S5S4...S1S0 is the source node address and D = D5D4...D1D0 is the destination node address in a six-dimensional hypercube. The exclusive-OR of S and D marks the dimensions in which the two addresses differ; suppose these are dimensions 0, 2, 3, and 5, so the message has to traverse those four dimensions. The order in which the dimensions are traversed is not important. Let us assume that the message follows the route by traversing dimensions 5, 3, 2, and 0 in that order. The route is then totally determined; it passes through the nodes 42 (101010), 34 (100010), and 38 (100110), with each hop flipping exactly one address bit. A code sketch of this routing scheme follows.
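The exclusive-OR routing rule of the example is easy to sketch in code. The snippet below is illustrative; the concrete source (001010) and destination (100111) addresses are assumptions chosen so that the route passes through the nodes 42, 34, and 38 mentioned above, and the fixed dimension order (highest bit first) matches the 5, 3, 2, 0 ordering in the example.

```python
# Deterministic hypercube routing: XOR the source and destination addresses
# to find the dimensions that differ, then flip one address bit per hop.

def hypercube_route(source, destination, n):
    """Return the node addresses visited from source to destination in an
    n-dimensional hypercube, traversing differing dimensions high to low."""
    differing = source ^ destination          # bits set where S and D differ
    path, node = [source], source
    for dim in reversed(range(n)):            # here: dimensions 5, 3, 2, 0
        if differing & (1 << dim):
            node ^= 1 << dim                  # each hop flips exactly one bit
            path.append(node)
    return path

# Assumed six-dimensional example consistent with the route fragment above:
route = hypercube_route(0b001010, 0b100111, 6)
print([format(node, "06b") for node in route])
# ['001010', '101010', '100010', '100110', '100111']  i.e. 10 -> 42 -> 34 -> 38 -> 39
```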
Cache Coherence

  • The primary advantage of a cache is its ability to reduce the average access time in uniprocessors. When the processor finds a word in the cache during a read operation, main memory is not involved in the transfer. If the operation is a write, there are two commonly used procedures to update memory.
  • In the write-through policy, both the cache and main memory are updated with every write operation.
  • In the write-back policy, only the cache is updated and the location is marked so that it can be copied later into main memory.
  • In a shared-memory multiprocessor, the same information may reside in a number of copies in several caches and in main memory. To ensure the ability of the system to execute memory operations correctly, the multiple copies must be kept identical. This requirement imposes the cache coherence problem. A memory scheme is coherent if the value returned on a load instruction is always the value given by the latest store instruction to the same address. Without a proper solution to the cache coherence problem, caching cannot be used in bus-oriented multiprocessors with two or more processors.

[Figure 13-13: Cache configuration after a store to X by processor P1: (a) with a write-through cache policy, (b) with a write-back cache policy.]

  • Another configuration that may cause consistency problems is direct memory access (DMA) activity in conjunction with an IOP connected to the system bus. In the case of input, the DMA may modify locations in main memory that also reside in a cache without updating the cache. During DMA output, memory locations may be read before they are updated from the cache when a write-back policy is used. I/O-based memory incoherence can be overcome by making the IOP a participant in the cache coherence solution adopted in the system.

Software Solutions

  • One scheme is to use a single cache shared by all processors: every data access is made to the shared cache. This method violates the principle of closeness of CPU to cache and increases the average memory access time. In effect, this scheme solves the problem by avoiding it.
  • For performance reasons it is desirable to attach a private cache to each processor. One scheme that has been used allows only nonshared and read-only data to be stored in caches; such items are called cachable. Shared writable data are noncachable. The compiler must tag data as either cachable or noncachable, and the system hardware makes sure that only cachable data are stored in caches. The noncachable data remain in main memory.
  • A scheme that allows writable data to exist in at least one cache employs a centralized global table in its compiler. The status of memory blocks is stored in the central global table, and each block is identified as read-only (RO) or read-and-write (RW). All caches can have copies of blocks identified as RO, but only one cache can have a copy of an RW block. Thus if the data are updated in the cache holding the RW block, the other caches are not affected because they do not have a copy of that block.

Hardware Solutions: Snoopy Cache Controllers

  • In the hardware solution, the cache controller constantly watches (snoops on) the bus.
  • Write invalidate: when a processor writes into a shared block, all copies of it in the other processors' caches are invalidated. Those processors then have to read a valid copy either from main memory or from the processor that modified the variable.
  • Write broadcast: instead of invalidating, the updated value is broadcast to the other processors sharing that copy. This acts as write-through for shared data and write-back for private data. Write broadcast consumes more bus bandwidth than write invalidate, because every write to a shared block transfers the new value over the bus, whereas write invalidate sends only a short invalidation signal.
  • A cache block can be in one of four states:
      INVALID - the block is not valid.
      SHARED - multiple caches may hold valid copies.
      EXCLUSIVE - no other cache has this block, and the copy in main memory is valid.
      MODIFIED - the block is valid, but the copy in main memory is not valid.
  • The state transitions on local and remote (snooped) events are summarized below. A minimal code sketch of these transitions appears at the end of this section.

    Event      | Local transition          | Remote (snooped) transition
    Read hit   | Use local copy            | No action
    Read miss  | I to S, or I to E         | (S, E, M) to S
    Write hit  | (S, E) to M               | (S, E, M) to I
    Write miss | I to M                    | (S, E, M) to I

  • When a cache block changes its status from M, it first updates main memory.

[Illustration: a shared variable x = 5 held in the SHARED (S) state in all four caches and in main memory.]
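As a rough illustration of the transition table, the following toy model tracks the state of a single block in two caches under a write-invalidate snoopy protocol. It is a sketch only: the class and function names are mine, data values and bus timing are ignored, and the write-back of a MODIFIED block is reduced to a comment.

```python
# Minimal write-invalidate state machine for one cache block per cache.

I, S, E, M = "INVALID", "SHARED", "EXCLUSIVE", "MODIFIED"

class Cache:
    def __init__(self, name):
        self.name, self.state = name, I

def read(cache, others):
    if cache.state == I:                      # read miss
        sharers = [c for c in others if c.state != I]
        for c in sharers:                     # remote: (S, E, M) -> S
            # a remote MODIFIED copy would update main memory first
            c.state = S
        cache.state = S if sharers else E     # local: I -> S, or I -> E
    # read hit: use the local copy; no state change, no bus action

def write(cache, others):
    if cache.state in (I, S, E):              # write miss: I -> M; write hit: (S, E) -> M
        cache.state = M
    for c in others:                          # remote: (S, E, M) -> I (invalidate)
        if c.state != I:
            c.state = I

p1, p2 = Cache("P1"), Cache("P2")
read(p1, [p2])                                # P1: I -> E (no other sharer)
read(p2, [p1])                                # P2: I -> S, and P1: E -> S
write(p1, [p2])                               # P1: S -> M, P2 invalidated
print(p1.state, p2.state)                     # MODIFIED INVALID
```

The three operations at the bottom reproduce the write-hit row of the table: the writer ends in MODIFIED while the remote copy is invalidated and must be re-read before its next use.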