Introduction to Computer Architecture (Cheat Sheet, Advanced Computer Architecture)

COMPUTER ARCHITECTURE
Fourth Year Electronic, 2016-2017
By Assist. Prof. Dr. Eyad I. Abbas
Republic of Iraq, Ministry of Higher Education and Scientific Research

Computer Organization

Computer organization is concerned with the way the hardware components operate and the way they are connected together to form the computer system. The various components are assumed to be in place, and the task is to investigate the organizational structure to verify that the computer parts operate as intended.

Computer Design

Computer design is concerned with the hardware design of the computer. Once the computer specifications are formulated, it is the task of the designer to develop hardware for the system. Computer design is concerned with the determination of what hardware should be used and how the parts should be connected. This aspect of computer hardware is sometimes referred to as computer implementation.
Computer Architecture

Computer architecture is concerned with the structure and behavior of the computer as seen by the user. It includes the information formats, the instruction set, and techniques for addressing memory. The architectural design of a computer system is concerned with the specifications of the various functional modules, such as processors and memories, and with structuring them together into a computer system.

Instruction Set Architecture

1. Opcodes, consisting of:
   - Operate instructions, such as logical and arithmetic instructions
   - Data movement instructions
   - Control instructions
2. Data types: 8, 16, 32, and 64 bits
3. Addressing modes:
   - The operands are specified
   - The next instruction to execute is specified
   - Architecture-specific
   - An instruction can use several addressing modes

Register Transfer Language

Digital systems vary in size and complexity from a few integrated circuits to a complex of interconnected and interacting digital computers. Digital system design invariably uses a modular approach. The modules are constructed from such digital components as registers, decoders, arithmetic elements, and control logic. The various modules are interconnected with common data and control paths to form a digital computer system.

Digital modules are best defined by the registers they contain and the operations that are performed on the data stored in them. The operations executed on data stored in registers are called microoperations. A microoperation is an elementary operation performed on the information stored in one or more registers. The result of the operation may replace the previous binary information of a register or may be transferred to another register. Examples of microoperations are shift, count, clear, and load.

The internal hardware organization of a digital computer is best defined by specifying:

1. The set of registers it contains and their function.
2. The sequence of microoperations performed on the binary information stored in the registers.
3. The control that initiates the sequence of microoperations.

The symbolic notation used to describe the microoperation transfers among registers is called a register transfer language. The term "register transfer" implies the availability of hardware logic circuits that can perform a stated microoperation and transfer the result of the operation to the same or another register. The word "language" is borrowed from programmers, who apply this term to programming languages. A register transfer language is a system for expressing in symbolic form the microoperation sequences among the registers of a digital module.

Register Transfer

Computer registers are designated by capital letters (sometimes followed by numerals) to denote the function of the register. For example:

MAR: memory address register
PC: program counter
IR: instruction register
R1: processor register

The representation of registers in block diagram form is shown in Fig(2):

a. A rectangular box with the name of the register inside.
b. The individual bits.
c. The numbering of bits in a 16-bit register can be marked on top of the box.
d. A 16-bit register partitioned into two parts. Bits 0 through 7 are assigned the symbol L (for low byte) and bits 8 through 15 are assigned the symbol H (for high byte). The name of the 16-bit register is PC. The symbol PC(0-7) or PC(L) refers to the low-order byte and PC(8-15) or PC(H) to the high-order byte.

Information transfer from one register to another is designated in symbolic form by means of a replacement operator. The statement

R2 ← R1

denotes a transfer of the content of register R1 into register R2. It designates a replacement of the content of R2 by the content of R1. By definition, the content of the source register R1 does not change after the transfer. Sometimes we want the transfer to occur only under a predetermined control condition.
This can be shown by means of an if-then statement:

If (P = 1) then (R2 ← R1)

where P is a control signal generated in the control section. It is sometimes convenient to separate the control variables from the register transfer operation by specifying a control function:

P: R2 ← R1

The control condition is terminated with a colon. It symbolizes the requirement that the transfer operation be executed by the hardware only if P = 1. Two or more operations that are executed at the same time are separated by a comma, as in the statement:

T: R2 ← R1, R5 ← R3

The basic symbols of the register transfer notation are listed in Table(1). Registers are denoted by capital letters, and numerals may follow the letters. Parentheses are used to denote a part of a register by specifying the range of bits or by giving a symbol name to a portion of a register.

Arithmetic Microoperations

The arithmetic microoperations are listed in Table(3). Multiply and divide are not listed in Table(3); these two operations are valid arithmetic operations but are not included in the basic set of microoperations. In most computers, the multiplication operation is implemented with a sequence of add and shift microoperations, and division is implemented with a sequence of subtract and shift microoperations.

The digital circuit that generates the arithmetic sum of two binary numbers of any length is called a binary adder, as shown in Fig(5). The addition and subtraction operations can be combined into one common circuit by including an exclusive-OR gate with each full-adder, as shown in Fig(6).

The increment microoperation adds one to a number in a register. For example, if a 4-bit register has the binary value 0110, it will go to 0111 after it is incremented. The diagram of a 4-bit combinational circuit incrementer is shown in Fig(7).

The arithmetic microoperations listed in the table above can be implemented in one composite arithmetic circuit.
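The adder-subtractor of Fig(6) can be sketched in a few lines of Python. This is a minimal bit-level model, not a description of any particular hardware: each B input passes through an XOR gate with a mode bit M, so M = 0 gives A + B, while M = 1 complements B and injects a carry-in of 1, giving A + B' + 1 = A − B in two's complement.

```python
# Bit-level sketch of a 4-bit ripple adder-subtractor (Fig(6)):
# each B input is XORed with the mode bit M, which also serves
# as the carry-in. M = 0: result = A + B. M = 1: result = A - B.

def add_sub(a, b, m, bits=4):
    """Ripple adder-subtractor built from full adders."""
    carry = m                      # mode bit doubles as carry-in
    result = 0
    for i in range(bits):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ m    # XOR gate on each B input
        s = ai ^ bi ^ carry        # full-adder sum bit
        carry = (ai & bi) | (carry & (ai ^ bi))  # full-adder carry
        result |= s << i
    return result, carry

print(add_sub(0b0110, 0b0011, 0))  # 6 + 3 = 9  -> (9, 0)
print(add_sub(0b0110, 0b0011, 1))  # 6 - 3 = 3  -> (3, 1)
```

The end carry of 1 in the subtraction case indicates that no borrow occurred, as with the hardware circuit.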
The basic component of an arithmetic circuit is the parallel adder. By controlling the data inputs to the adder, it is possible to obtain different types of arithmetic operations, as shown in Fig(8). It is possible to generate the eight arithmetic microoperations listed in Table(4).

Logic Microoperations

Logic microoperations specify binary operations for strings of bits stored in registers. These operations consider each bit of the register separately and treat them as binary variables. For example, the exclusive-OR microoperation with the contents of two registers R1 and R2 is symbolized by the statement:

P: R1 ← R1 ⊕ R2

It specifies a logic microoperation to be executed on the individual bits of the registers provided that the control variable P = 1. As a numerical example, assume that each register has four bits. Let the content of R1 be 1010 and the content of R2 be 1100. The exclusive-OR microoperation stated above produces 1010 ⊕ 1100 = 0110, which replaces the previous content of R1.

There are 16 different logic operations that can be performed with two binary variables. They can be determined from all possible truth tables obtained with two binary variables, as shown in Table(5). The 16 Boolean functions of two variables x and y are expressed in algebraic form in the first column of Table(6).

Arithmetic Logic Shift Unit

Computer systems employ a number of storage registers connected to a common operational unit called an arithmetic logic unit, abbreviated ALU. The arithmetic, logic, and shift circuits introduced in previous sections can be combined into one ALU with common selection variables. One stage of an arithmetic logic shift unit is shown in Fig(11) with its function table, Table(8).

Instruction Codes

A computer instruction is a binary code that specifies a sequence of microoperations for the computer. Instruction codes together with data are stored in memory. The computer reads each instruction from memory and places it in a control register.
The control then interprets the binary code of the instruction and proceeds to execute it by issuing a sequence of microoperations. Every computer has its own unique instruction set. The ability to store and execute instructions, the stored program concept, is the most important property of a general-purpose computer.

An instruction code is a group of bits that instructs the computer to perform a specific operation. It is usually divided into parts, each having its own particular interpretation. The most basic part of an instruction code is its operation part. The operation code of an instruction is a group of bits that defines such operations as add, subtract, multiply, shift, and complement. The number of bits required for the operation code of an instruction depends on the total number of operations available in the computer.

The simplest way to organize a computer is to have one processor register and an instruction code format with two parts. The first part specifies the operation to be performed and the second specifies an address. The memory address tells the control where to find an operand in memory. This operand is read from memory and used as the data to be operated on together with the data stored in the processor register.

For a memory unit with 4096 words we need 12 bits to specify an address, since 2^12 = 4096. If we store each instruction code in one 16-bit memory word, we have available four bits for the operation code (abbreviated opcode) to specify one out of 16 possible operations, and 12 bits to specify the address of an operand. The control reads a 16-bit instruction from the program portion of memory. It uses the 12-bit address part of the instruction to read a 16-bit operand from the data portion of memory. Fig(12) depicts this type of organization. Computers that have a single processor register usually assign to it the name accumulator and label it AC.
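The two-part format just described can be sketched as a small decoder. This is only an illustration of the bit layout (4-bit opcode in bits 15-12, 12-bit address in bits 11-0); the opcode value used in the example is arbitrary.

```python
# Sketch of the 16-bit two-part instruction format: the top
# 4 bits hold the opcode, the low 12 bits hold an address
# into the 4096-word memory.

ADDR_BITS = 12

def decode(instruction):
    """Split a 16-bit instruction word into (opcode, address)."""
    opcode = (instruction >> ADDR_BITS) & 0xF   # bits 15-12
    address = instruction & 0xFFF               # bits 11-0
    return opcode, address

# Example: opcode 2 with operand address 0x456
word = (2 << ADDR_BITS) | 0x456
print(decode(word))   # -> (2, 1110)
```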
For example, operations such as clear AC, complement AC, and increment AC operate on data stored in the AC register.

When the second part of an instruction code specifies an operand, the instruction is said to have an immediate operand. When the second part specifies the address of an operand, the instruction is said to have a direct address. The third possibility is called an indirect address, where the bits in the second part of the instruction designate the address of a memory word in which the address of the operand is found. One bit of the instruction code can be used to distinguish between a direct and an indirect address. As an illustration of this configuration, consider the instruction code format shown in Fig(13).

Computer Registers

Computer instructions are normally stored in consecutive memory locations and are executed sequentially one at a time. The control reads an instruction from a specific address in memory and executes it. It then continues by reading the next instruction in sequence and executes it, and so on. This type of instruction sequencing needs a counter to calculate the address of the next instruction after execution of the current instruction is completed. The computer also needs processor registers for manipulating data and a register for holding a memory address. These requirements dictate the register configuration listed in Table(9) and shown in Fig(14).

The type of instruction is recognized by the computer control from the four bits in positions 12 through 15 of the instruction. If the three opcode bits in positions 12 through 14 are not equal to 111, the instruction is a memory-reference type and the bit in position 15 is taken as the addressing mode I. If the 3-bit opcode is equal to 111, control then inspects the bit in position 15. If this bit is 0, the instruction is a register-reference type. If the bit is 1, the instruction is an input-output type.
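The decision procedure above can be sketched directly in code. The bit fields follow the description in the text; the example instruction words below are made-up values chosen only to exercise each branch.

```python
# Sketch of instruction-type recognition in the basic computer:
# bits 12-14 hold the 3-bit opcode, bit 15 holds I.

def classify(instruction):
    opcode = (instruction >> 12) & 0b111   # bits 12-14
    i_bit = (instruction >> 15) & 1        # bit 15
    if opcode != 0b111:
        return "memory-reference (I=%d)" % i_bit
    return "register-reference" if i_bit == 0 else "input-output"

print(classify(0b0010000001010000))  # memory-reference, direct
print(classify(0b1010000001010000))  # memory-reference, indirect
print(classify(0b0111000000000100))  # register-reference
print(classify(0b1111000000001000))  # input-output
```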
Note that the bit in position 15 of the instruction code is designated by the symbol I but is not used as a mode bit when the operation code is equal to 111. The instructions for the computer are listed in Table(10).

The set of instructions is said to be complete if the computer includes a sufficient number of instructions in each of the following categories:

1. Arithmetic, logical, and shift instructions.
2. Instructions for moving information to and from memory and processor registers.
3. Instructions that check status information to provide decision-making capabilities.
4. Input and output instructions.
5. The capability of stopping the computer.

Timing and Control

The timing for all registers in the basic computer is controlled by a master clock generator. The clock pulses are applied to all flip-flops and registers in the system, including the flip-flops and registers in the control unit. The control signals are generated in the control unit and provide control inputs for the multiplexers in the common bus, control inputs in processor registers, and microoperations for the accumulator. There are two major types of control organization:

1. Hardwired control: the control logic is implemented with gates, flip-flops, decoders, and other digital circuits. It has the advantage that it can be optimized to produce a fast mode of operation.
2. Microprogrammed control: the control information is stored in a control memory. The control memory is programmed to initiate the required sequence of microoperations.

The block diagram of the control unit is shown in Fig(17). SC is incremented with every positive clock transition, unless its CLR input is active. This produces the sequence of timing signals T0, T1, T2, T3, T4, and so on, as shown in the diagram. (Note the relationship between each timing signal and its corresponding positive clock transition.) If SC is not cleared, the timing signals will continue with T5, T6, up to T15 and back to T0.
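The behavior of SC and the timing decoder can be sketched as follows. This is a minimal model, assuming a 4-bit sequence counter feeding the 4 x 16 decoder, so that exactly one of T0 through T15 is active at any instant.

```python
# Sketch of the timing-signal generator: a 4-bit sequence
# counter SC drives a 4x16 decoder producing one-hot T0..T15.

class SequenceCounter:
    def __init__(self):
        self.sc = 0                  # 4-bit sequence counter, starts at 0

    def tick(self):
        """Positive clock transition: increment SC modulo 16."""
        self.sc = (self.sc + 1) % 16

    def clear(self):
        """CLR input: reset SC so the sequence restarts at T0."""
        self.sc = 0

    def T(self):
        """4x16 decoder output: one-hot list where T[i] = 1 iff SC = i."""
        return [1 if i == self.sc else 0 for i in range(16)]

sc = SequenceCounter()
print(sc.T().index(1))   # T0 active initially -> 0
for _ in range(5):
    sc.tick()
print(sc.T().index(1))   # after five clock pulses, T5 is active -> 5
sc.clear()
print(sc.T().index(1))   # cleared: back to T0 -> 0
```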
As an example, consider the case where SC is incremented to provide timing signals T0, T1, T2, T3, and T4 in sequence, and at time T4 SC is cleared to 0 if decoder output D3 is active. This is expressed symbolically by the statement:

D3T4: SC ← 0

Instruction Cycle

A program residing in the memory unit of the computer consists of a sequence of instructions. The program is executed in the computer by going through a cycle for each instruction. In the basic computer each instruction cycle consists of the following phases:

1. Fetch an instruction from memory.
2. Decode the instruction.
3. Read the effective address from memory if the instruction has an indirect address.
4. Execute the instruction.

This process continues indefinitely unless a HALT instruction is encountered. Initially, the program counter PC is loaded with the address of the first instruction in the program. The sequence counter SC is cleared to 0, providing a decoded timing signal T0. After each clock pulse, SC is incremented by one, so that the timing signals go through the sequence T0, T1, T2, and so on. The microoperations for the fetch and decode phases can be specified by the following register transfer statements:

T0: AR ← PC
T1: IR ← M[AR], PC ← PC + 1
T2: D0, ..., D7 ← Decode IR(12-14), AR ← IR(0-11), I ← IR(15)

The way that an interrupt is handled by the computer can be explained by means of the flowchart of Fig(19). An interrupt flip-flop R is included in the computer.

Design of Basic Computer

The basic computer consists of the following hardware components:

1. A memory unit with 4096 words of 16 bits each.
2. Nine registers: AR, PC, DR, AC, IR, TR, OUTR, INPR, and SC.
3. Seven flip-flops: I, S, E, R, IEN, FGI, and FGO.
4. Two decoders: a 3 x 8 operation decoder and a 4 x 16 timing decoder.
5. A 16-bit common bus.
6. Control logic gates.
7. An adder and logic circuit connected to the input of AC.

The outputs of the control logic circuit are:

1. Signals to control the inputs of the nine registers.
2. Signals to control the read and write inputs of memory.
3. Signals to set, clear, or complement the flip-flops.
4. Signals for S2, S1, and S0 to select a register for the bus.
5. Signals to control the AC adder and logic circuit.

Design of Accumulator Logic

The circuits associated with the AC register are shown in Fig(20). The adder and logic circuit has three sets of inputs. In order to design the logic associated with AC, it is necessary to go over the register transfer statements and extract all the statements that change the content of AC.

Control Memory

The function of the control unit in a digital computer is to initiate sequences of microoperations. The number of different types of microoperations that are available in a given system is finite. A control unit whose binary control variables are stored in memory is called a microprogrammed control unit. Each word in control memory contains a microinstruction. The microinstruction specifies one or more microoperations for the system. A sequence of microinstructions constitutes a microprogram.

A computer that employs a microprogrammed control unit will have two separate memories: a main memory and a control memory. The main memory is available to the user for storing programs. The contents of main memory may alter when data are manipulated and every time the program is changed. The user's program in main memory consists of machine instructions and data, while the control memory holds a fixed microprogram that cannot be altered by the occasional user. The general configuration of a microprogrammed control unit is shown in the block diagram of Fig(21).

Address Sequencing

Microinstructions are stored in control memory in groups, with each group specifying a routine. An initial address is loaded into the control address register when power is turned on in the computer.
This address is usually the address of the first microinstruction that activates the instruction fetch routine. The fetch routine may be sequenced by incrementing the control address register through the rest of its microinstructions. In summary, the address sequencing capabilities required in a control memory are:

1. Incrementing of the control address register.
2. Unconditional branch or conditional branch, depending on status bit conditions.
3. A mapping process from the bits of the instruction to an address for control memory.
4. A facility for subroutine call and return.

Instruction Format

The computer instruction format is depicted in Fig(22-a). It consists of three fields: a 1-bit field for indirect addressing symbolized by I, a 4-bit operation code (opcode), and an 11-bit address field. Fig(22-b) lists four of the 16 possible memory-reference instructions.

General Register Organization

Memory locations are needed for storing pointers, counters, return addresses, temporary results, and partial products during multiplication. A bus organization for seven CPU registers is shown in Fig(26). The control unit that operates the CPU bus system directs the information flow through the registers and ALU by selecting the various components in the system. For example, to perform the operation

R1 ← R2 + R3

the control must provide binary selection variables to the following selector inputs:

1. MUX A selector (SELA): to place the content of R2 into bus A.
2. MUX B selector (SELB): to place the content of R3 into bus B.
3. ALU operation selector (OPR): to provide the arithmetic addition A + B.
4. Decoder destination selector (SELD): to transfer the content of the output bus into R1.

To achieve a fast response time, the ALU is constructed with high-speed circuits. There are 14 binary selection inputs in the unit, and their combined value specifies a control word. The three bits of SELA select a source register for the A input of the ALU.
The three bits of SELB select a register for the B input of the ALU. The three bits of SELD select a destination register using the decoder and its seven load outputs. The five bits of OPR select one of the operations in the ALU. The encoding of the register selections is specified in Table(14), and Table(15) lists the OPR field, in which each operation is designated with a symbolic name.

For example, consider the subtract microoperation given by the statement:

R1 ← R2 − R3

The binary control word for this subtract microoperation is 010 011 001 00101, obtained as follows: SELA = 010 selects R2, SELB = 011 selects R3, SELD = 001 selects R1, and OPR = 00101 selects the subtract operation.

Stack Organization

A useful feature that is included in the CPU of most computers is a stack, or last-in, first-out (LIFO) list. The two operations on a stack are the insertion and deletion of items. The operation of insertion is called push, while the operation of deletion is called pop. In a 64-word stack, the stack pointer contains 6 bits because 2^6 = 64. The push operation is implemented with a sequence of microoperations that increments the stack pointer and then writes the item into the word pointed to by it; the pop operation reads the item pointed to by the stack pointer and then decrements the stack pointer.

Instruction Formats

The format of an instruction is usually depicted in a rectangular box symbolizing the bits of the instruction as they appear in memory words or in a control register. The bits of the instruction are divided into groups called fields. The most common fields found in instruction formats are:

1. An operation code field that specifies the operation to be performed.
2. An address field that designates a memory address or a processor register.
3. A mode field that specifies the way the operand or the effective address is determined.

As an example of an accumulator-type organization, an instruction that specifies an arithmetic addition is defined by an assembly language instruction such as ADD X, where X is the address of the operand.

Answer (the example assumes a two-word instruction with address field 500 fetched at PC = 200, so that PC = 202 during the execute phase; XR = 100; R1 = 400; and memory contents M[500] = 800, M[800] = 300, M[702] = 325, M[600] = 900, M[400] = 700, M[399] = 450):

1. In the direct address mode the effective address is the address part of the instruction, 500, and the operand to be loaded into AC is 800.
2. In the immediate mode the second word of the instruction is taken as the operand rather than an address, so 500 is loaded into AC. (The effective address in this case is 201.)
3. In the indirect mode the effective address is stored in memory at address 500. Therefore, the effective address is 800 and the operand is 300.
4. In the relative mode the effective address is 500 + 202 = 702 and the operand is 325. (Note that the value in PC after the fetch phase and during the execute phase is 202.)
5. In the index mode the effective address is XR + 500 = 100 + 500 = 600 and the operand is 900.
6. In the register mode the operand is in R1, and 400 is loaded into AC. (There is no effective address in this case.)
7. In the register indirect mode the effective address is 400, equal to the content of R1, and the operand loaded into AC is 700.
8. The autoincrement mode is the same as the register indirect mode, except that R1 is incremented to 401 after the execution of the instruction.
9. The autodecrement mode decrements R1 to 399 prior to the execution of the instruction. The operand loaded into AC is now 450.

Table(16) lists the values of the effective address and the operand loaded into AC for the nine addressing modes.

Data Transfer and Manipulation

Most computer instructions can be classified into three categories:

1. Data transfer instructions.
2. Data manipulation instructions.
3. Program control instructions.

Data transfer instructions cause transfer of data from one location to another without changing the binary information content. Table(17) lists the data transfer instructions. Data manipulation instructions are those that perform arithmetic, logic, and shift operations. The data manipulation instructions in a typical computer are usually divided into three basic types:

1. Arithmetic instructions.
2. Logical and bit manipulation instructions.
3. Shift instructions.
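The nine-mode addressing example worked above can be reproduced in a short sketch. The register and memory values below are those implied by the answer (address field 500, PC = 202 during execute, XR = 100, R1 = 400); nothing else is assumed.

```python
# Sketch of effective-address calculation for the addressing-mode
# example: each mode maps the 500 address field to a different
# effective address (EA), and the operand is M[EA] where one exists.

memory = {201: 500, 399: 450, 400: 700, 500: 800,
          600: 900, 702: 325, 800: 300}
pc, xr, r1 = 202, 100, 400
addr = 500                              # address field of the instruction

effective = {
    "direct":            addr,          # EA = 500
    "immediate":         201,           # second word of the instruction
    "indirect":          memory[addr],  # EA = M[500] = 800
    "relative":          addr + pc,     # EA = 500 + 202 = 702
    "index":             addr + xr,     # EA = 500 + 100 = 600
    "register indirect": r1,            # EA = content of R1 = 400
}
operands = {mode: memory[ea] for mode, ea in effective.items()}
operands["register"] = r1                   # operand is in R1; no EA
operands["autodecrement"] = memory[r1 - 1]  # R1 -> 399 first, so M[399]

for mode, op in operands.items():
    print(mode, op)
```

The printed operands match column two of Table(16); the autoincrement mode loads the same operand as register indirect before R1 is incremented to 401.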
Reduced Instruction Set Computer (RISC)

An important aspect of computer architecture is the design of the instruction set for the processor. The instruction set chosen for a particular computer determines the way that machine language programs are constructed. A computer with a large number of instructions is classified as a complex instruction set computer, abbreviated CISC. In the early 1980s, a number of computer designers recommended that computers use fewer instructions with simple constructs so they can be executed much faster within the CPU without having to use memory as often. This type of computer is classified as a reduced instruction set computer, or RISC.

In summary, the major characteristics of CISC architecture are:

1. A large number of instructions, typically from 100 to 250 instructions.
2. Some instructions that perform specialized tasks and are used infrequently.
3. A large variety of addressing modes, typically from 5 to 20 different modes.
4. Variable-length instruction formats.
5. Instructions that manipulate operands in memory.

The major characteristics of a RISC processor are:

1. Relatively few instructions.
2. Relatively few addressing modes.
3. Memory access limited to load and store instructions.
4. All operations done within the registers of the CPU.
5. Fixed-length, easily decoded instruction format.
6. Single-cycle instruction execution.
7. Hardwired rather than microprogrammed control.

Memory Hierarchy

The memory unit is an essential component in any digital computer since it is needed for storing programs and data. The memory unit that communicates directly with the CPU is called the main memory. Devices that provide backup storage are called auxiliary memory. They are used for storing system programs, large data files, and other backup information. Only programs and data currently needed by the processor reside in main memory; all other information is stored in auxiliary memory and transferred to main memory when needed.
A special very-high-speed memory called a cache is sometimes used to increase the speed of processing by making current programs and data available to the CPU at a rapid rate. Fig(29) shows the memory hierarchy.

The address mapping table can be stored in a separate memory or in main memory. In the first case, an additional memory unit is required, as well as one extra access time. In the second case, the table takes space from main memory and two accesses to memory are required, with the program running at half speed.

The table implementation of the address mapping is simplified if the information in the address space and the memory space are each divided into groups of fixed size. The physical memory is broken down into groups of equal size called blocks, which may range from 64 to 4096 words each. The term page refers to groups of address space of the same size. For example, if a page or block consists of 1K words then, using the previous example, the address space is divided into 1024 pages and main memory is divided into 32 blocks.

The organization of the memory mapping table in a paged system is shown in Fig(32). The memory-page table consists of eight words, one for each page. The address in the page table denotes the page number, and the content of the word gives the block number where that page is stored in main memory. The table shows that pages 1, 2, 5, and 6 are currently available in main memory in blocks 0, 1, 2, and 3, respectively. A presence bit in each location indicates whether the page has been transferred from auxiliary memory into main memory. A 0 in the presence bit indicates that the page is not available in main memory.

Memory Management Hardware

A memory management system is a collection of hardware and software procedures for managing the various programs residing in memory. The memory management software is part of an overall operating system available in many computers. The basic components of a memory management unit are:

1. A facility for dynamic storage relocation that maps logical memory references into physical memory addresses.
2. A provision for sharing common programs stored in memory by different users.
3. Protection of information against unauthorized access between users, and prevention of users from changing operating system functions.

The fixed page size used in the virtual memory system causes certain difficulties with respect to program size and the logical structure of programs. It is more convenient to divide programs and data into logical parts called segments. A segment is a set of logically related instructions or data elements associated with a given name. Segments may be generated by the programmer or by the operating system. Examples of segments are a subroutine, an array of data, a table of symbols, or a user's program.

The address generated by a segmented program is called a logical address. The logical address may be larger than the physical memory address, as in virtual memory, but it may also be equal to, and sometimes even smaller than, the length of the physical memory address.

Numerical Example:

A numerical example may clarify the operation of the memory management unit. Consider the 20-bit logical address specified in Fig(33-a). This configuration allows each segment to have any number of pages up to 256. The smallest possible segment will have one page, or 256 words. The largest possible segment will have 256 pages, for a total of 256 x 256 = 64K words. The physical memory is shown in Fig(33-b).

Consider a program loaded into memory that requires five pages. The operating system may assign to this program segment 6 and pages 0 through 4, as shown in Fig(34-a). The total logical address range for the program is from hexadecimal 60000 to 604FF. The correspondence between each memory block and logical page number is then entered in a table as shown in Fig(34-b). The information from this table is entered in the segment and page tables as shown in Fig(35-a). Now consider the specific logical address given in Fig(35).
The 20-bit address is listed as a five-digit hexadecimal number. It refers to word number 7E of page 2 in segment 6. The base of segment 6 in the page table is at address 35. Segment 6 has associated with it five pages, as shown in the page table at addresses 35 through 39. Page 2 of segment 6 is at address 35 + 2 = 37. The physical memory block is found in the page table to be 019. Word 7E in block 19 gives the 20-bit physical address 0197E. Note that page 0 of segment 6 maps into block 12 and page 1 maps into block 0. The associative memory in Fig(35-b) shows that pages 2 and 4 of segment 6 have been referenced previously, and therefore their corresponding block numbers are stored in the associative memory.

Isolated I/O versus Memory-Mapped I/O

Many computers use one common bus to transfer information between memory or I/O and the CPU. In the isolated I/O configuration, the CPU has distinct input and output instructions, and each of these instructions is associated with the address of an interface register. The isolated I/O method isolates memory and I/O addresses so that memory address values are not affected by interface address assignment, since each has its own address space.

The other alternative is to use the same address space for both memory and I/O. This is the case in computers that employ only one set of read and write signals and do not distinguish between memory and I/O addresses. This configuration is referred to as memory-mapped I/O. In a memory-mapped I/O organization there are no specific input or output instructions; computers with memory-mapped I/O can use memory-type instructions to access I/O data.

An example of an I/O interface unit is shown in block diagram form in Fig(37). It consists of two data registers called ports, a control register, a status register, bus buffers, and timing and control circuits. The interface communicates with the CPU through the data bus.
The chip select and register select inputs determine the address assigned to the interface. The I/O read and write are two control lines that specify an input or output, respectively. The four registers communicate directly with the I/O device attached to the interface.

Asynchronous Data Transfer
The internal operations in a digital system are synchronized by means of clock pulses supplied by a common pulse generator. If the registers in the interface share a common clock with the CPU registers, the transfer between the two units is said to be synchronous. In most cases, the internal timing in each unit is independent of the other, in that each uses its own private clock for internal registers. In that case, the two units are said to be asynchronous to each other. This approach is widely used in most computer systems.

Asynchronous data transfer between two independent units requires that control signals be transmitted between the communicating units to indicate the time at which data is being transmitted. There are two ways of achieving this:
 The strobe: a pulse supplied by one of the units to indicate to the other unit when the transfer has to occur.
 The handshake: the unit receiving the data item responds with another control signal to acknowledge receipt of the data.

The strobe pulse method and the handshaking method of asynchronous data transfer are not restricted to I/O transfers. The strobe may be activated by either the source or the destination unit. Figure 38 shows a source-initiated transfer and the timing diagram.

Fig.39 shows the strobe of a memory-read control signal from the CPU to a memory. The disadvantage of the strobe method is that the source unit that initiates the transfer has no way of knowing whether the destination unit has actually received the data item that was placed on the bus. The handshake method solves this problem by introducing a second control signal that provides a reply to the unit that initiates the transfer.
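The sequence of events in a source-initiated handshake can be mimicked with a toy trace. The four-step ordering follows the handshake description above (the line names data valid and data accepted come from the timing diagrams); the function name and dictionary representation are illustrative only.

```python
def source_initiated_handshake(data):
    """Toy trace of a source-initiated handshake transfer."""
    bus = {"data": None, "data_valid": 0, "data_accepted": 0}
    trace = []

    # 1. Source places the data item on the bus and enables data valid.
    bus["data"], bus["data_valid"] = data, 1
    trace.append("data_valid=1")

    # 2. Destination accepts the data and enables data accepted.
    received = bus["data"]
    bus["data_accepted"] = 1
    trace.append("data_accepted=1")

    # 3. Source sees the acknowledgement and disables data valid.
    bus["data_valid"] = 0
    trace.append("data_valid=0")

    # 4. Destination disables data accepted; the bus returns to idle.
    bus["data_accepted"] = 0
    trace.append("data_accepted=0")

    return received, trace

received, trace = source_initiated_handshake(0x5A)
assert received == 0x5A
assert trace == ["data_valid=1", "data_accepted=1",
                 "data_valid=0", "data_accepted=0"]
```

The point of the trace is that neither unit proceeds until it has seen the other's control signal, which is exactly what the strobe method cannot guarantee.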
Figure 40 shows the data transfer procedure when initiated by the source. The two handshaking lines are data valid, which is generated by the source unit, and data accepted, generated by the destination unit. The timing diagram shows the exchange of signals between the two units. Figure 41 shows the destination-initiated transfer using handshaking lines. Note that the name of the signal generated by the destination unit has been changed to ready for data to reflect its new meaning.

Asynchronous Serial Transfer
The transfer of data between two units may be done in parallel or serial. In parallel data transmission, each bit of the message has its own path and the total message is transmitted at the same time. This means that an n-bit message must be transmitted through n separate conductor paths. In serial data transmission, each bit in the message is sent in sequence, one at a time. This method requires the use of one pair of conductors, or one conductor and a common ground. Parallel transmission is faster but requires many wires. It is used for short distances and where speed is important. Serial transmission is slower but is less expensive, since it requires only one pair of conductors. Serial transmission can be synchronous or asynchronous.

A transmitted character can be detected by the receiver from knowledge of the transmission rules:
1. When a character is not being sent, the line is kept in the 1-state.
2. The initiation of a character transmission is detected from the start bit, which is always 0.
3. The character bits always follow the start bit.
4. After the last bit of the character is transmitted, a stop bit is detected when the line returns to the 1-state for at least one bit time.

Pipelining
Pipelining is a technique of decomposing a sequential process into sub-operations, with each sub-process being executed in a special dedicated segment that operates concurrently with all other segments.
A pipeline can be visualized as a collection of processing segments through which binary information flows.

General Considerations
Any operation that can be decomposed into a sequence of sub-operations of about the same complexity can be implemented by a pipeline processor. The general structure of a four-segment pipeline is illustrated in Fig. 46. The operands pass through all four segments in a fixed sequence. The space-time diagram of a four-segment pipeline is demonstrated in Fig. 47. The speedup S of pipeline processing over equivalent non-pipeline processing is defined by the ratio:

S = n·tn / [(k + n − 1)·tp]

As the number of tasks increases, n becomes much larger than k − 1, and k + n − 1 approaches the value of n. Under this condition, the speedup becomes:

S = tn / tp

Numerical example: Let the time it takes to process a sub-operation in each segment be equal to tp = 20 ns. Assume that the pipeline has k = 4 segments and executes n = 100 tasks in sequence. The pipeline system will take (k + n − 1)·tp = (4 + 100 − 1) × 20 = 2060 ns to complete. Assuming that tn = k·tp = 4 × 20 = 80 ns, a non-pipeline system requires n·k·tp = 100 × 80 = 8000 ns to complete the 100 tasks. The speedup ratio is therefore 8000/2060 = 3.88.

Instruction Pipeline
The computer needs to process each instruction with the following sequence of steps:
1. Fetch the instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.

Figure 48 shows how the instruction cycle in the CPU can be processed with a four-segment pipeline. While an instruction is being executed in segment 4, the next instruction in sequence is busy fetching an operand from memory in segment 3. The four segments are represented in the flowchart:
1. FI is the segment that fetches an instruction.
2. DA is the segment that decodes the instruction and calculates the effective address.
3. FO is the segment that fetches the operand.
4. EX is the segment that executes the instruction.

A pipeline operation is said to have stalled if one unit (stage) requires more time to perform its function, thus forcing other stages to become idle. Consider, for example, the case of an instruction fetch that incurs a cache miss. Assume also that a cache miss requires three extra time units.

Instruction-Level Parallelism
In contrast to pipeline techniques, instruction-level parallelism (ILP) is based on the idea of multiple-issue processors (MIPs). An MIP has multiple pipelined datapaths for instruction execution. Each of these pipelines can issue and execute one instruction per cycle. Figure 49 shows the case of a processor having three pipes. For comparison purposes, the same figure also shows the sequential and the single-pipeline cases.

As a timing example for a four-segment (floating-point adder-subtractor) pipeline, suppose that the time delays of the four segments are t1 = 60 ns, t2 = 70 ns, t3 = 100 ns, t4 = 80 ns, and the interface registers have a delay of tr = 10 ns. The clock cycle is chosen to be tp = t3 + tr = 110 ns. An equivalent non-pipeline floating-point adder-subtractor will have a delay time tn = t1 + t2 + t3 + t4 + tr = 320 ns. In this case the pipelined adder has a speedup of 320/110 = 2.9 over the non-pipelined adder.

Supercomputers
Supercomputers are very powerful, high-performance machines used mostly for scientific computations. To speed up the operation, the components are packed tightly together to minimize the distance that the electronic signals have to travel. Supercomputers also use special techniques for removing heat from circuits, to prevent them from burning up because of their close proximity. A supercomputer is a computer system best known for its high computational speed, fast and large memory systems, and the extensive use of parallel processing.
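The two speedup calculations above (the 100-task example and the floating-point adder) can be checked with a few lines. The formulas are the ones given in the text; only the helper name is invented here.

```python
def pipeline_time(k, n, tp):
    """Completion time for n tasks on a k-segment pipeline
    with clock period tp: (k + n - 1) * tp."""
    return (k + n - 1) * tp

# First example: k = 4 segments, n = 100 tasks, tp = 20 ns, tn = k*tp = 80 ns.
assert pipeline_time(4, 100, 20) == 2060                       # ns
assert round(100 * 80 / pipeline_time(4, 100, 20), 2) == 3.88  # approaches k = 4

# Floating-point adder: the clock is set by the slowest segment plus the
# register delay; the non-pipelined delay is the sum of all the delays.
delays, tr = [60, 70, 100, 80], 10
tp = max(delays) + tr            # 110 ns
tn = sum(delays) + tr            # 320 ns
assert round(tn / tp, 1) == 2.9  # speedup of the pipelined adder
```

Note how the clock period is dictated by the slowest segment: shortening t3 would raise the speedup even though the other segments are unchanged.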
Delayed Branch
Consider now the operation of the following four instructions (I: instruction fetch, A: ALU operation, E: execute instruction). If the three-segment pipeline proceeds without interruptions, there will be a data conflict in instruction 3 because the operand in R2 is not yet available in the A segment. This can be seen from the timing of the pipeline shown in Fig. 50(a). The E segment in clock cycle 4 is in the process of placing the memory data into R2. The A segment in clock cycle 4 is using the data from R2, but the value in R2 will not be the correct value, since it has not yet been transferred from memory. It is up to the compiler to make sure that the instruction following the load instruction uses the data fetched from memory. It was shown in Fig. 50 that a branch instruction delays the pipeline operation by a NOP instruction until the instruction at the branch address is fetched.

Computer Arithmetic
Arithmetic instructions in digital computers manipulate data to produce results necessary for the solution of computational problems. An arithmetic processor is the part of a processor unit that executes arithmetic operations. The data type assumed to reside in processor registers during the execution of an arithmetic instruction is specified in the definition of the instruction. The solution to any problem that is stated by a finite number of well-defined procedural steps is called an algorithm.

Addition and Subtraction with Signed-Magnitude Data:
We designate the magnitudes of the two numbers by A and B. When the signed numbers are added or subtracted, we find that there are eight different conditions to consider, depending on the signs of the numbers and the operation performed. These conditions are listed in the first column of Table 18. The other columns in the table show the actual operation to be performed with the magnitudes of the numbers. The last column is needed to prevent a negative zero.
In other words, when two equal numbers are subtracted, the result should be +0, not -0.

Hardware Implementation:
Let A and B be two registers that hold the magnitudes of the numbers, and As and Bs be two flip-flops that hold the corresponding signs. Consider now the hardware implementation of the algorithms above:
1- First, a parallel adder is needed to perform the microoperation A + B.
2- Second, a comparator circuit is needed to establish whether A > B, A = B, or A < B.
3- Third, two parallel subtractor circuits are needed to perform the microoperations A - B and B - A.
4- The sign relationship can be determined from an exclusive-OR gate with As and Bs as inputs.

Careful investigation of the alternatives reveals that the use of 2's complement for subtraction and comparison is an efficient procedure that requires only an adder and a complementer. Figure 51 shows a block diagram of the hardware for implementing the addition and subtraction operations. It consists of registers A and B and sign flip-flops As and Bs. Subtraction is done by adding A to the 2's complement of B. The output carry is transferred to flip-flop E, where it can be checked to determine the relative magnitudes of the two numbers. The add-overflow flip-flop AVF holds the overflow bit when A and B are added. The output of the adder is equal to the sum A + B. When M = 1, the 1's complement of B is applied to the adder, the input carry is 1, and the output is S = A + B' + 1. This is equal to A plus the 2's complement of B, which is equivalent to the subtraction A - B.

The signed 2's complement representation of numbers, together with arithmetic algorithms for addition and subtraction, is introduced as follows. The leftmost bit of a binary number represents the sign: 0 for positive and 1 for negative. If the sign bit is 1, the entire number is represented in 2's complement form. Thus +33 is represented as 00100001 and -33 as 11011111. Note that 11011111 is the 2's complement of 00100001, and vice versa.
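The M = 0 / M = 1 behaviour of the adder in Fig. 51 can be imitated in software. This is a sketch under the assumption of an 8-bit datapath; the returned carry plays the role of flip-flop E described above.

```python
BITS = 8
MASK = (1 << BITS) - 1   # 0xFF for an 8-bit datapath

def adder(a, b, m):
    """Imitate the Fig. 51 adder: with M = 0 the adder computes A + B;
    with M = 1 the 1's complement of B is applied with input carry 1,
    so S = A + B' + 1, i.e. A - B in 2's complement arithmetic."""
    s = a + (((b ^ MASK) + 1) if m else b)
    return s & MASK, (s >> BITS) & 1   # (sum S, output carry E)

# Subtraction with A >= B leaves a carry E = 1 (no borrow needed).
assert adder(33, 12, m=1) == (21, 1)
# Representation check from the text: -33 is the 2's complement of +33.
assert (33 ^ MASK) + 1 == 0b11011111
```

Checking E after an M = 1 operation is exactly how the hardware decides whether A >= B: a carry-out of 1 means the subtraction did not need a borrow.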
The addition of two numbers in signed 2's complement form consists of adding the numbers with the sign bits treated the same as the other bits of the number. A carry out of the sign-bit position is discarded. Subtraction consists of first taking the 2's complement of the subtrahend and then adding it to the minuend.

The hardware for implementing the division operation is identical to that required for multiplication. Register EAQ is now shifted to the left with 0 inserted into Q, and the previous value of E is lost. The numerical example is repeated in Figure 53.

Decimal Arithmetic Unit
To perform arithmetic operations with decimal data, it is necessary to convert the input decimal numbers to binary, to perform all calculations with binary numbers, and to convert the results into decimal. A decimal arithmetic unit can add or subtract decimal numbers, usually by forming the 9's or 10's complement of the subtrahend. Consider the arithmetic addition of two decimal digits in BCD, together with a possible carry from a previous stage. To add 0110 to the binary sum, we use a second 4-bit binary adder, as shown in Fig. 54. The two decimal digits, together with the input carry, are first added in the top 4-bit binary adder to produce the binary sum. When the output carry is equal to 0, nothing is added to the binary sum.

A straight subtraction of two decimal numbers requires a subtractor circuit that is somewhat different from a BCD adder. The 9's complement of a decimal digit represented in BCD may be obtained by complementing the bits in the coded representation of the digit, provided a correction is included. There are two possible correction methods. In the first method, binary 1010 (decimal 10) is added to each complemented digit and the carry discarded after each addition. In the second method, binary 0110 (decimal 6) is added before the digit is complemented.
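One BCD adder stage and the two 9's-complement corrections just described can be sketched as follows; the function names are illustrative only.

```python
def bcd_add_digit(a, b, carry_in=0):
    """One BCD stage: add the digits in binary, then add 0110
    when the sum exceeds 9, producing a decimal output carry."""
    s = a + b + carry_in              # first 4-bit binary adder
    if s > 9:                         # correction needed
        return (s + 0b0110) & 0b1111, 1
    return s, 0

def nines_complement_v1(d):
    """Method 1: complement the bits, add 1010, discard the carry."""
    return ((d ^ 0b1111) + 0b1010) & 0b1111

def nines_complement_v2(d):
    """Method 2: add 0110 first, then complement the bits."""
    return (d + 0b0110) ^ 0b1111

assert bcd_add_digit(0b0111, 0b0101) == (0b0010, 1)  # 7 + 5 = 12: digit 2, carry 1
assert nines_complement_v1(0b0111) == 0b0010         # 9's complement of 7 is 2
assert nines_complement_v2(0b0111) == 0b0010
assert all(nines_complement_v1(d) == 9 - d for d in range(10))
```

Both correction methods give 9 − d for every valid BCD digit d, which is why either one can feed the 9's complementer in the subtractor stage.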
One stage of a decimal arithmetic unit that can add or subtract two BCD digits is shown in Fig. 55. It consists of a BCD adder and a 9's complementer. The mode input M controls the operation of the unit. With M = 0, the S outputs form the sum of A and B. With M = 1, the S outputs form the sum of A plus the 9's complement of B. For numbers with n decimal digits we need n such stages. The output carry Ci+1 from one stage must be connected to the input carry Ci of the next-higher-order stage. The best way to subtract the two decimal numbers is to let M = 1 and apply a 1 to the input carry Ci of the first stage. The outputs will form the sum of A plus the 10's complement of B, which is equivalent to a subtraction operation if the carry out of the last stage is discarded.

As a numerical illustration, the 9's complement of BCD 0111 (decimal 7) is computed by first complementing each bit to obtain 1000. Adding binary 1010 and discarding the carry, we obtain 0010 (decimal 2). By the second method, we add 0110 to 0111 to obtain 1101. Complementing each bit, we obtain the required result of 0010.

Reduced Instruction Set Computers (RISCs)
RISC-based machines are a reality, and they are characterized by a number of common features: a simple and reduced instruction set, fixed instruction format, one instruction per machine cycle, pipelined instruction fetch/execute units, an ample number of general-purpose registers (or, alternatively, optimized compiler code generation), load/store memory operations, and a hardwired control unit design. With Complex Instruction Set Computers (CISCs), it became apparent that a complex instruction set has a number of disadvantages. These include a complex instruction decoding scheme, an increased size of the control unit, and increased logic delays.
RISC DESIGN PRINCIPLES
A computer with the minimum number of instructions has the disadvantage that a large number of instructions will have to be executed in realizing even a simple function. This results in a speed disadvantage. Observations of typical program behavior have led to the following conclusions:
1. Simple movements of data (represented by assignment statements), rather than complex operations, are substantial and should be optimized.
2. Conditional branches are predominant, and therefore careful attention should be paid to the sequencing of instructions. This is particularly true when it is known that pipelining is indispensable.
3. Procedure calls/returns are the most time-consuming operations, and therefore a mechanism should be devised to make the communication of parameters between the calling and the called procedures cause the least number of instructions to execute.