Computer Architecture Design handout chapter 1 to 4, Assignments of Computer Science


Typology: Assignments

2019/2020

Uploaded on 01/06/2022

salman-siddique-2
Advanced Computer Architecture-CS501
Lecture Handouts

Table of Contents
Lecture No. 1: Introduction
Lecture No. 2: Instruction Set Architecture
Lecture No. 3: Introduction to SRC Processor
Lecture No. 4: ISA and Instruction Format
Lecture No. 5: Description of SRC in RTL
Lecture No. 6: RTL Using Digital Logic Circuits
Lecture No. 7: Design Process for ISA of FALCON-A
Lecture No. 8: ISA of the FALCON-A
Lecture No. 9: Description of FALCON-A and EAGLE
Lecture No. 10: The FALCON-E and ISA Comparison
Lecture No. 11: CISC and RISC
Lecture No. 12: CPU Design
Lecture No. 13: Structural RTL Description of the FALCON-A
Lecture No. 14: External FALCON-A CPU
Lecture No. 15: Logic Design and Control Signals Generation in SRC
Lecture No. 16: Control Unit Design
Lecture No. 17: Machine Reset and Machine Exceptions
Lecture No. 18: Pipelining
Lecture No. 19: Pipelined SRC
Lecture No. 20: Hazards in Pipelining
Lecture No. 21: Instruction Level Parallelism

Lecture No. 1
Introduction

Reading Material
Vincent P. Heuring & Harry F. Jordan, Computer Systems Design and Architecture
Chapter 1: 1.1, 1.2, 1.3, 1.4, 1.5

Summary
1) Distinction between computer architecture, organization and design
2) Levels of abstraction in digital design
3) Introduction to the course topics
4) Perspectives of different people about computers
5) General operation of a stored program digital computer
6) The Fetch-Execute process
7) Concept of an ISA (Instruction Set Architecture)

Introduction
This course is about Computer Architecture. We start by explaining a few key terms.

The General Purpose Digital Computer
How can we define a ‘computer’?
There are several kinds of devices that can be termed “computers”: from desktop machines to the microcontrollers used in appliances such as a microwave oven, and from the abacus to the clusters of tiny chips used in parallel processors. For the purpose of this course, we will use the following definition of a computer:

“an electronic device, operating under the control of instructions stored in its own memory unit, that can accept data (input), process data arithmetically and logically, produce output from the processing, and store the results for future use.” [1]

Thus, when we use the term computer, we actually mean a digital computer. Many digital computers have dedicated purposes, for example, the computer in an automobile that controls the spark timing for the engine. In this course, however, the term computer means a general-purpose digital computer that can perform a variety of arithmetic and logic tasks.

The Computer as a System
Now we examine the notion of a system, and the place of digital computers in the general universal set of systems. A “system” is a collection of elements, or components, working together on one or more inputs to produce one or more desired outputs. There are many types of systems in the world. Examples include:
* Chemical systems
* Optical systems
* Mechanical systems, etc.

These are all subsets of the general universal set of “systems”. One particular subset of interest is the “electrical system”. In electrical systems, the inputs as well as the outputs are electrical quantities, namely voltage and current. “Digital systems” are a subset of electrical systems; their inputs and outputs are digital quantities. General-purpose digital computers are in turn a subset of digital systems. We will focus on general-purpose digital computers in this course.

[Figure: Notion of a System, with labels Systems, Electrical Systems, General Purpose Digital Computers]

[Figure: Block Diagram of a Computer System, showing the Memory Subsystem, the CPU (µP), and the I/O Subsystem (Peripherals)]

Essential Elements of a General Purpose Digital Computer
The figure shows the block diagram of a modern general-purpose digital computer. We observe from the diagram that a general-purpose computer has three main components: a memory subsystem, an input/output subsystem, and a central processing unit. Programs are stored in the memory, the program instructions are executed in the CPU, and communication with the external world takes place through the I/O subsystem (including the peripherals).

Architecture
Now that we understand the term “computer” in our context, let us focus on the term architecture. The word architecture, as defined in standard dictionaries, is “the art or science of building”, or “a method or style of building”. [2]

Computer Architecture
This term was first used in 1964 by Amdahl, Blaauw, and Brooks at IBM [3]. They defined it as “the structure of a computer that a machine language programmer must understand to write a correct (time independent) program for that machine.”

[...] transfer level. They will be able to use basic combinational and sequential building blocks to design larger structures like ALUs (Arithmetic Logic Units), memory subsystems, I/O subsystems etc. It will help them understand the various approaches used to design computer CPUs (Central Processing Units) of the RISC (Reduced Instruction Set Computers) and the CISC (Complex Instruction Set Computers) type, as well as the principles of cache memories.

Important topics to be covered
* Review of computer organization
* Classification of computers and their instructions
* Machine characteristics and performance
* Design of a Simple RISC Computer: the SRC
* Advanced topics in processor design
* Input-output (I/O) subsystems
* Arithmetic Logic Unit implementation
* Memory subsystems

Course Outline

Introduction:
* Distinction between Computer Architecture, Organization and design
* Levels of abstraction in digital design
* Introduction to the course topics

Brief review of computer organization:
* Perspectives of different people about computers
* General operation of a stored program digital computer
* The Fetch-Execute process
* Concept of an ISA

Foundations of Computer Architecture:
* A taxonomy of computers and their instructions
* Instruction set features
* Addressing Modes
* RISC and CISC architectures
* Measures of performance

An example processor: The SRC:
* Introduction to the ISA and instruction formats
* Coding examples and Hand assembly
* Using Behavioral RTL to describe the SRC
* Implementing Register Transfers using Digital Logic Circuits

ISA: Design and Development
* Outline of the thinking process for ISA design
* Introduction to the ISA of the FALCON-A
* Solved examples for FALCON-A
* Learning Aids for the FALCON-A

Other example processors:
* FALCON-E
* EAGLE and Modified EAGLE
* Comparison of the four ISAs

CPU Design:
* The Design Process
* A Uni-Bus implementation for the SRC
* Structural RTL for the SRC instructions
* Logic Design for the 1-Bus SRC
* The Control Unit
* The 2- and 3-Bus Processor Designs
* The Machine Reset
* Machine Exceptions

Term Exam - I

Advanced topics in processor design:
* Pipelining
* Instruction-Level Parallelism
* Microprogramming

Input-output (I/O):
* I/O interface design
* Programmed I/O
* Interrupt driven I/O
* Direct memory access (DMA)

Term Exam - II

Arithmetic Logic Shift Unit (ALSU) implementation:
* Addition, subtraction, multiplication & division for integer unit
* Floating point unit

Memory subsystems:
* Memory organization and design
* Memory hierarchy
* Cache memories
* Virtual memory

References
[1] Shelly G.B., Cashman T.J., Waggoner G.A., Waggoner W.C., Complete Computer Concepts: Microcomputer and Applications. Ferncroft Village, Danvers, Massachusetts: Boyd & Fraser, 1992.
[2] Merriam-Webster Online; The Language Centre, May 12, 2003 (http://www.m-w.com/home.htm).
[3] Patterson, D.A. and Hennessy, J.L., Computer Architecture: A Quantitative Approach, 2nd ed., San Francisco, CA: Morgan Kaufmann Publishers Inc., 1996.
[4] Heuring V.P. and Jordan H.F., Computer Systems Design and Architecture. Menlo Park, CA: Addison Wesley, 1997.

A brief review of Computer Organization

Perceptions of Different People about Computers
A computer can be viewed from various perspectives, depending on the person viewing it. For example, the way a child perceives a computer is quite different from how a computer programmer or a designer views it. There are a number of perceptions of the computer; however, for the purpose of understanding the machine, the following four views are generally considered.

The User’s View
A user is the person for whom the machine is designed, and who employs it to perform some useful work through application software. This useful work may be composing reports in word processing software, maintaining a credit history in a spreadsheet, or even developing application software using high-level languages such as C or Java. This list of “useful work” is not all-inclusive. Children playing games on a computer may argue that playing games is also “useful work”, maybe more so than preparing an internal office memo.
At the user’s level, one is only concerned with things like the speed of the computer, the storage capacity available, and the behavior of the peripheral devices. Beyond performance, the user is not concerned with the implementation details of the computer, as the internal structure of the machine is hidden by the operating system interface.

The Programmer’s View
By “programmer” we mean the machine or assembly language programmer.
The machine or assembly language programmer is responsible for implementing the software required to execute various commands or sequences of commands (programs) on the computer. Understanding some key terms first will help us better understand this view, and the associated tasks, responsibilities and tools of the trade.

Machine Language
Machine language consists of all the primitive instructions that a computer understands and is able to execute. It is the computer’s native language. Commands in machine language are expressed as strings of 1s and 0s. It is the lowest level language of a computer, and requires no further interpretation.

Instruction Set
The collection of all possible machine language commands that a computer can understand and execute is called its instruction set. Every processor has its own unique instruction set. Therefore, programs written for one processor will generally not run on another processor. This is quite unlike programs written in higher-level languages, which may be portable. Assembly/machine languages are generally unique to the processors on which they run, because of the differences in computer architecture.
There are three ways to list the instructions in the instruction set of a computer:
* by function categories
* by an alphabetic ordering of mnemonics
* by an ascending order of op-codes

Difference between Higher-Level Languages and Assembly Language
Higher-level languages are generally used to develop application software. These high-level programs are then converted to assembly language programs using compilers. So it is the task of a compiler writer to determine the mapping between the high-level-language constructs and assembly language constructs. Generally, there is a “many-to-many” mapping between high-level language constructs and assembly language constructs.
This means that a given HLL construct can generally be represented by many different equivalent assembly language constructs. Conversely, a given assembly language construct can be represented by many different equivalent HLL constructs.
High-level languages provide various primitive data types, such as integer, Boolean and string, that a programmer can use. Type checking verifies the proper usage of these data types. It allows the compiler to determine the memory requirements of variables and helps detect bad programming practices.
On the other hand, there is generally no provision for type checking at the machine level, and hence none in assembly language. The machine only sees strings of bits. Instructions interpret the strings as a type, which is usually limited to signed or unsigned integers and floating point numbers. A given 32-bit word might be an instruction, an integer, a floating-point number, or 4 ASCII characters. It is the task of the compiler writer to determine how high-level language data types will be implemented using the data types available at the machine level, and how type checking will be implemented.

The Stored Program Concept
This concept is fundamental to all general-purpose computers today. It states that the program is stored with data in the computer’s memory, and the computer is able to manipulate it as data. For example, the computer can load the program from disk, move it around in memory, and store it back to the disk.
Even though all computers have unique machine language instruction sets, the ‘stored program’ concept and the existence of a ‘program counter’ are common to all machines.
The sequence of instructions that performs some useful task is called a program. All digital computers (the general-purpose machines defined above) are able to store these sequences of instructions as stored programs. Relevant data is also stored on the computer’s secondary memory.
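
As a small illustration of the remark that the machine only sees strings of bits, the following Python sketch reads the same four bytes as an unsigned integer, a signed integer, a floating-point number, and four ASCII characters. The byte pattern, the big-endian byte order, and the function name `interpretations` are assumptions chosen for illustration; they do not come from the handout.

```python
import struct

# Hypothetical 32-bit word, chosen only for illustration (the bytes of "ABCD").
WORD = bytes([0x41, 0x42, 0x43, 0x44])


def interpretations(word: bytes) -> dict:
    """Return several readings of the same 4 bytes, assuming big-endian order."""
    if len(word) != 4:
        raise ValueError("expected exactly 4 bytes (one 32-bit word)")
    return {
        "unsigned": struct.unpack(">I", word)[0],  # 32-bit unsigned integer
        "signed": struct.unpack(">i", word)[0],    # 32-bit two's-complement integer
        "float": struct.unpack(">f", word)[0],     # IEEE 754 single precision
        "ascii": word.decode("ascii"),             # four 8-bit characters
    }


if __name__ == "__main__":
    for name, value in interpretations(WORD).items():
        print(f"{name:>8}: {value!r}")
```

Nothing in the bytes themselves says which reading is intended; in a real machine that choice is made by the instruction that operates on the word.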
These stored programs are treated as data and the computer is able to manipulate them; for example, they can be loaded into memory for execution and then saved back to storage.

General Operation of a Stored Program Computer
Machine language programs are brought into memory and then executed instruction by instruction. Unless a branch instruction is encountered, the program is executed in sequence. The instruction to be executed is fetched from memory and temporarily stored in a CPU register called the instruction register (IR). The instruction register holds the instruction while it is decoded and executed by the central processing unit (CPU) of the computer. However, before loading an instruction into the instruction register for execution, the computer needs to know which instruction to load. The program counter (PC), also called the instruction pointer in some texts, is the register that holds the address of the next instruction in memory that is to be executed.
When the execution of an instruction is completed, the contents of the program counter (which is the address of the next instruction) are placed on the address bus. The memory places the instruction stored at that address on the data bus. The CPU loads this instruction into the IR (instruction register) to decode and execute it. While the instruction is decoded, its length in bytes is determined, and the PC (program counter) is incremented by that length, so that the PC points to the next instruction in memory. Note that the length of the instruction does not need to be determined in RISC machines, as the instruction length is fixed in these architectures, and so the program counter is always incremented by a fixed number.
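
The sequence described above (fetch the instruction addressed by the PC, decode it to find its length, advance the PC, then execute) can be sketched in Python. This is a minimal sketch under stated assumptions: the four-instruction toy ISA, its opcodes, and the single 8-bit accumulator are hypothetical and are not the SRC or FALCON-A instruction sets.

```python
# Hypothetical toy ISA (not from the handout): opcode -> (mnemonic, length in bytes)
ISA = {
    0x00: ("HALT", 1),
    0x01: ("LOADI", 2),  # load the following byte into the accumulator
    0x02: ("ADDI", 2),   # add the following byte to the accumulator
    0x03: ("NOP", 1),
}


def run(memory):
    """Execute a program with a simple fetch-decode-execute loop."""
    pc = 0       # program counter: address of the next instruction
    acc = 0      # a single working register
    trace = []   # (mnemonic, PC after the increment) for each instruction
    while True:
        opcode = memory[pc]              # fetch: the PC supplies the address
        mnemonic, length = ISA[opcode]   # decode: this also yields the length
        ir = memory[pc:pc + length]      # contents of the instruction register
        pc += length                     # the PC now points to the next instruction
        trace.append((mnemonic, pc))
        if mnemonic == "HALT":           # execute
            break
        if mnemonic == "LOADI":
            acc = ir[1]
        elif mnemonic == "ADDI":
            acc = (acc + ir[1]) & 0xFF   # keep the result within 8 bits
    return acc, pc, trace


if __name__ == "__main__":
    program = [0x01, 5, 0x03, 0x02, 7, 0x00]  # LOADI 5; NOP; ADDI 7; HALT
    print(run(program))
```

Because the toy instructions are one or two bytes long, the PC advances by a decoded length; on a fixed-length machine the `length` lookup would collapse to a constant.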
In the case of a branch instruction, the contents of the PC are replaced by the address of the next instruction, which is contained in the branch instruction itself, and the current status of the processor is stored in a register called the Processor Status Word (PSW). Another name for the PSW is the flag register. It contains the status bits and control bits corresponding to the state of the processor. Examples of status bits include the sign bit and the overflow bit. Examples of control bits include the interrupt enable flag. When the execution of this instruction is completed, the contents of the program counter are placed on the address bus, and the entire cycle is repeated. This entire process of reading memory, incrementing the PC, and decoding the instruction is known as the Fetch and Execute principle of the stored program computer. This is actually an oversimplification. In today’s advanced processors, a lot more is going on than just the simple “fetch and execute” operation, such as pipelining. The details of some of these more involved techniques will be studied later in the course.

The Concept of Instruction Set Architecture (ISA)
Now that we have an understanding of some of the relevant key terms, we return to the assembly language programmer’s perception of the computer. The programmer’s view is limited to the set of all the assembly instructions or commands that the particular computer at hand can understand and execute, in addition to the resources that these instructions may help manage. These resources include the memory space and all the programmer-accessible registers. Note that we use the term ‘memory space’ instead of memory, because not all the memory space has to be filled with memory chips in a particular implementation, but it is still a resource available to the programmer.
This set of instructions or operations and the resources together form the instruction set architecture (ISA).
It is the ISA that serves as the interface between the program and the functional units of a computer, i.e., the interface through which the computer’s resources are accessed and controlled.

The Computer Architect’s View
The computer architect’s view is concerned with the design of the entire system as well as with ensuring its optimum performance. Optimality is measured against quantifiable objectives that are set out before the design process begins. These objectives are set on the basis of the functionality required from the machine to be designed. The computer architect
* Designs the ISA for optimum programming utility as well as for optimum performance of implementation
* Designs the hardware for the best implementation of the instructions that the ISA makes available to the programmer
* Uses performance measurement tools, such as benchmark programs, to verify that the performance objectives are met by the machine designed
* Balances the performance of building blocks such as CPU, memory, I/O devices, and interconnections
* Strives to meet performance goals at the lowest possible cost

Useful tools for the computer architect
Some of the tools available that facilitate the design process are
* Software models, simulators and emulators
* Performance benchmark programs
* Specialized measurement programs
* Data flow and bottleneck analysis
* Subsystem balance analysis
* Parts, manufacturing, and testing cost analysis

The Logic Designer’s View
The logic designer is responsible for the design of the machine at the logic gate level. It is the design process at this level that determines whether the computer architect meets the cost and performance goals. The computer architect and the logic designer have to work in collaboration to meet the cost and performance objectives of a machine. This is the reason why a single person or a single team may perform the tasks of the system’s architectural design as well as the logic design.

Useful Tools for the Logic Designer
Some of the tools available that aid the logic designer in the logic design process are
* CAD tools
  - Logic design and simulation packages
  - Printed circuit layout tools
  - IC (integrated circuit) design and layout tools
* Logic analyzers and oscilloscopes
* Hardware development systems

The Concept of the Implementation Domain
The collection of hardware devices with which the logic designer works for the digital logic gate implementation and interconnection of the machine is termed the implementation domain. The logic gate implementation domain may be
* VLSI (very large scale integration) on silicon
* TTL (transistor-transistor logic) or ECL (emitter-coupled logic) chips
* Gallium arsenide chips
* PLAs (programmable-logic arrays) or sea-of-gates arrays
* Fluidic logic or optical switches
Similarly, the implementation domains used for gate, board and module interconnections are [...]

[...] case, of length 32 bits or 4 bytes.

[Figure (a): Program Counter: Programmer’s View, a register PC with bits numbered 31 down to 0]

Figure (b) illustrates the logic designer’s view of a 32-bit program counter, implemented as an array of 32 D flip-flops. It shows the contents of the program counter being gated out on the ‘A bus’ (the address bus) by applying the control signal PCout. The contents of the ‘B bus’ (also the address bus) can be stored in the program counter by asserting the signal PCin on the leading edge of the clock signal CK, thus storing the address of the next instruction in the program counter.

[Figure (b): Program Counter: Logic Designer’s View, showing the 32-bit PC between the B bus and the A bus, with control signals PCout and PCin and the clock CK]

Lecture No. 2
Instruction Set Architecture

Reading Material
Vincent P. Heuring & Harry F. Jordan, Computer Systems Design and Architecture
Chapter 2, Chapter 3: 2.1, 2.2, 3.2

Summary
1) A taxonomy of computers and their instructions
2) Instruction set features
3) Addressing modes
4) RISC and CISC architectures

Foundations of Computer Architecture

TAXONOMY OF COMPUTERS AND THEIR INSTRUCTIONS
Processors can be classified on the basis of their instruction set architectures. The instruction set architecture, described in the previous module, gives us a ‘programmer’s view’ of the machine. This module discusses a number of topics related to the classification of computers and their instructions.

CLASSES OF INSTRUCTION SET ARCHITECTURE:
The mechanism used by the CPU to store instructions and data can be used to classify the ISA (Instruction Set Architecture). There are three types of machines based on this classification.
* Accumulator based machines
* Stack based machines
* General purpose register (GPR) machines

ACCUMULATOR BASED MACHINES
Accumulator based machines use special registers called accumulators to hold one source operand and also the result of the arithmetic or logic operation performed. Thus the accumulator registers collect (or ‘accumulate’) data. Since the accumulator holds one of the operands, one more register may be required to hold the address of another operand. The accumulator is not used to hold an address. So accumulator based machines are also called 1-address machines. Accumulator machines employ a very small number of accumulator registers, generally only one. These machines were useful at the time when memory was quite expensive, as they used one register to hold the source operand as well as the result of the operation. However, now that memory is relatively inexpensive, they are not considered very useful, and their use is severely limited for the computation of expressions with many operands.

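
To make the 1-address idea concrete, here is a minimal Python sketch of an accumulator machine evaluating x = (b + c) * d. The mnemonics LDA, ADD, MUL and STA, the symbolic addresses, and the function name `run_accumulator` are hypothetical names introduced for this illustration only.

```python
def run_accumulator(program, memory):
    """Run (mnemonic, address) pairs on a machine with a single accumulator."""
    acc = 0
    for mnemonic, address in program:
        if mnemonic == "LDA":        # accumulator <- memory[address]
            acc = memory[address]
        elif mnemonic == "ADD":      # accumulator <- accumulator + memory[address]
            acc = acc + memory[address]
        elif mnemonic == "MUL":      # accumulator <- accumulator * memory[address]
            acc = acc * memory[address]
        elif mnemonic == "STA":      # memory[address] <- accumulator
            memory[address] = acc
        else:
            raise ValueError(f"unknown mnemonic: {mnemonic}")
    return acc


if __name__ == "__main__":
    memory = {"b": 2, "c": 3, "d": 4, "x": 0}
    program = [("LDA", "b"), ("ADD", "c"), ("MUL", "d"), ("STA", "x")]
    run_accumulator(program, memory)
    print(memory["x"])
```

Each instruction names exactly one memory address; the other operand and the result are implicitly the accumulator.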
STACK BASED MACHINES
A stack is a group of registers organized as a last-in-first-out (LIFO) structure. In such a structure, the operands stored first, through the push operation, can only be accessed last, through a pop operation; the order of access to the operands is the reverse of the order of storage. An analogy of the stack is the “plate-dispenser” found in several self-service cafeterias. Arithmetic and logic operations successively pick operands from the top-of-the-stack (TOS), and push the results on the TOS at the end of the operation. In stack based machines, operand addresses need not be specified during arithmetic or logical operations. Therefore, these machines are also called 0-address machines.

GENERAL-PURPOSE-REGISTER MACHINES
In general purpose register machines, a number of registers are available within the CPU. These registers do not have dedicated functions, and can be employed for a variety of purposes. To identify a register within an instruction, a small number of bits is required in the instruction word. For example, to identify one of the 64 registers of the CPU, a 6-bit field is required in the instruction.
CPU registers are faster than cache memory. Registers are also used more easily and more effectively by the compiler than other forms of internal storage. Registers can also be used to hold variables, thereby reducing memory traffic. This increases the execution speed and reduces code size (fewer bits are required to encode register names than memory addresses). In addition to data, registers can also hold addresses and pointers (i.e., the address of an address). This increases the flexibility available to the programmer.
A number of dedicated, or special purpose, registers are also available in general-purpose machines, but many of them are not available to the programmer. Examples of transparent registers include the stack pointer, the program counter, the memory address register, the memory data register and the condition codes (or flags) register.

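
The two ideas above can be sketched in Python: a stack machine whose arithmetic instructions carry no operand addresses, and the number of bits needed to name one of n registers. The mnemonics PUSH, POP, ADD and MUL and the function names `run_stack` and `register_field_bits` are hypothetical and used only for illustration.

```python
def run_stack(program, memory):
    """Run a program on a stack machine; ADD and MUL take no addresses."""
    stack = []
    for instruction in program:
        mnemonic = instruction[0]
        if mnemonic == "PUSH":       # push memory[address] onto the stack
            stack.append(memory[instruction[1]])
        elif mnemonic == "POP":      # pop the top of the stack into memory[address]
            memory[instruction[1]] = stack.pop()
        elif mnemonic == "ADD":      # operands come from the top of the stack
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif mnemonic == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown mnemonic: {mnemonic}")
    return stack


def register_field_bits(register_count):
    """Smallest number of bits that can name one of `register_count` registers."""
    if register_count < 1:
        raise ValueError("register_count must be at least 1")
    return (register_count - 1).bit_length()


if __name__ == "__main__":
    memory = {"b": 2, "c": 3, "d": 4, "x": 0}
    program = [("PUSH", "b"), ("PUSH", "c"), ("ADD",),
               ("PUSH", "d"), ("MUL",), ("POP", "x")]
    run_stack(program, memory)
    print(memory["x"], register_field_bits(64))
```

Only PUSH and POP name a memory location; ADD and MUL are the 0-address instructions, which is why the same expression needs more instructions here than on an accumulator machine.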
We should understand that in reality, most machines are a combination of these machine types. Accumulator machines have the advantage of being more efficient, as they can store intermediate results of an operation within the CPU.

INSTRUCTION SET
An instruction set is a collection of all possible machine language commands that are understood and can be executed by a processor.

ESSENTIAL ELEMENTS OF COMPUTER INSTRUCTIONS:
There are four essential elements of an instruction: the type of operation to be performed, the place to find the source operand(s), the place to store the result(s) and the source of the next instruction to be executed by the processor.

Type of operation
In module 1, we described three ways to list the instruction set of a machine; one way of listing the instruction set is by grouping the instructions in accordance with the [...]

[...]
* Number of memory accesses
The number of memory accesses affects the execution time of instructions; the greater the number of memory accesses, the longer the execution cycle, as memory accesses are generally slow.

Assumptions
We make a few assumptions, which are
* A single byte is used for the op code, so 256 instructions can be encoded using these 8 bits, as 2^8 = 256
* The size of the memory address space is 16 Mbytes
* A single addressable memory unit is a byte
* The size of operands is 24 bits. As the memory size is 16 Mbytes, with byte-addressable memory, 24 bits are required to encode the address of the operands.
* The size of the address bus is 24 bits
* The data bus size is 8 bits

Discussion

4-address instruction
op code | destination | source 1 | source 2 | next address
1 byte  | 3 bytes     | 3 bytes  | 3 bytes  | 3 bytes
* The code size is 13 bytes (1+3+3+3+3 = 13 bytes)
* Number of bytes accessed from memory is 22 (13 bytes for instruction fetch + 6 bytes for source operand fetch + 3 bytes for storing destination operand = 22 bytes)
Note that there is no need for an additional memory access for the next-instruction address, as it has already been brought into the CPU during instruction fetch.

3-address instruction
op code | destination | source 1 | source 2
1 byte  | 3 bytes     | 3 bytes  | 3 bytes
* The code size is 10 bytes (1+3+3+3 = 10 bytes)
* Number of bytes accessed from memory is 19 (10 bytes for instruction fetch + 6 bytes for source operand fetch + 3 bytes for storing destination operand = 19 bytes)

2-address instruction
op code | destination / source 1 | source 2
1 byte  | 3 bytes                | 3 bytes
* The code size is 7 bytes (1+3+3 = 7 bytes)
* Number of bytes accessed from memory is 16 (7 bytes for instruction fetch + 6 bytes for source operand fetch + 3 bytes for storing destination operand = 16 bytes)

1-address instruction
op code | source 2
1 byte  | 3 bytes
* The code size is 4 bytes (1+3 = 4 bytes)
* Number of bytes accessed from memory is 7 (4 bytes for instruction fetch + 3 bytes for source operand fetch + 0 bytes for storing destination operand = 7 bytes)

0-address instruction
op code
1 byte
* The code size is 1 byte
* Number of bytes accessed from memory is 10 (1 byte for instruction fetch + 6 bytes for source operand fetch + 3 bytes for storing destination operand = 10 bytes)

The following table summarizes this information

Instruction Format    | Code size (bytes) | Number of memory bytes
4-address instruction | 13                | 22
3-address instruction | 10                | 19
2-address instruction | 7                 | 16
1-address instruction | 4                 | 7
0-address instruction | 1                 | 10

HALF ADDRESSES
In the preceding discussion we have talked about memory addresses. This discussion also applies to CPU registers. However, fewer bits are required to specify (encode) a CPU register than a memory address. Therefore, register addresses are also called “half-addresses”. An instruction that specifies one memory address and one CPU register can be called a 1½-address instruction.
Example
mov al, [34h]

THE PRACTICAL SITUATION
Real machines are not as simple as the classifications presented above. In fact, these machines have a mixture of 3, 2, 1, 0, and 1½-address instructions. For example, the VAX 11 includes instructions from all classes.

CLASSIFICATION OF MACHINES ON THE BASIS OF OPERAND AND RESULT LOCATION:
A distinction between machines can be made on the basis of the ALU instructions: whether these instructions use data from the memory or not. If the ALU instructions use only the CPU registers for the operands and result, the machine type is called “load-store”. Other machines may have a mixture of register-memory, or memory-memory instructions.
The number of memory operands supported by a typical ALU instruction may vary from 0 to 3.
Example
The SPARC, MIPS, Power PC, ALPHA: 0 memory addresses, max operands allowed = 3
X86, 68x series: 1 memory address, max operands allowed = 2

LOAD-STORE MACHINES
These machines are also called register-to-register machines. They typically use the 1½-address instruction format. Only the load and store instructions can access the memory. The load instruction fetches the required data from the memory and temporarily stores it in the CPU registers. Other instructions may use this data from the CPU registers. Later, the results can be stored back into the memory by the store instruction. Most RISC computers fall under this category of machines.

Advantages (of register-register instructions)
Register-register instructions use 0 memory operands out of a total of 3 operands.
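
As an aside on the instruction-format discussion earlier in this lecture, the code sizes and memory-traffic figures for the 4-, 3-, 2-, 1- and 0-address formats follow directly from the stated assumptions (1-byte op code, 3-byte addresses, 3-byte operands). The Python sketch below recomputes them; the table `FORMATS` and the function name `cost` are hypothetical, and the operand counts per format simply restate the handout's own accounting.

```python
OPCODE_BYTES = 1    # one byte of op code
ADDRESS_BYTES = 3   # 24-bit addresses
OPERAND_BYTES = 3   # 24-bit operands

# format -> (addresses in the instruction,
#            source operands fetched from memory,
#            results stored to memory)
FORMATS = {
    "4-address": (4, 2, 1),
    "3-address": (3, 2, 1),
    "2-address": (2, 2, 1),
    "1-address": (1, 1, 0),  # the other operand and the result stay in the accumulator
    "0-address": (0, 2, 1),  # operands and result are counted as stack accesses
}


def cost(fmt):
    """Return (code size in bytes, bytes accessed from memory) for one instruction."""
    n_addresses, n_sources, n_results = FORMATS[fmt]
    code_size = OPCODE_BYTES + n_addresses * ADDRESS_BYTES
    traffic = code_size + (n_sources + n_results) * OPERAND_BYTES
    return code_size, traffic


if __name__ == "__main__":
    for fmt in FORMATS:
        print(fmt, cost(fmt))
```

The memory traffic is the instruction fetch plus the operand transfers, which is why the 1-address format moves fewer bytes than the 0-address format even though its code is larger.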
The advantages of register-register instructions are:
* The instructions are simple and fixed in length
* The corresponding code generation model is simple
* All instructions take a similar number of clock cycles for execution

Disadvantages (register-register instructions)
* The instruction count is higher; more instructions are required to complete a particular task, as separate instructions are needed for the load and store operations on memory
* Since the instruction size is fixed, the instructions that do not require all fields waste memory bits

Register-memory machines
In register-memory machines, some operands are in the memory and some are in registers. These machines typically employ the 1 or 1½-address instruction format, in which one of the operands is an accumulator or a general-purpose CPU register.

Advantages
Register-memory operations use one memory operand out of a total of two operands. The advantages of this instruction format are
* Operands in the memory can be accessed without having to load them first through a separate load instruction
* Encoding is easy, as there is no need to load operands into registers first
* Instruction bit usage is relatively better, as more instructions are provided per fixed number of bits

Disadvantages
* Operands are not equivalent, since one operand may have two functions (both source operand and destination operand), and the source operand may be destroyed
* Different size encodings for memory and registers may restrict the number of registers

[...]

Example: lda [123]
As shown in the figure, the address of the operand is stored in the instruction. The operand is then fetched from that memory address.

[Figure: direct addressing; the instruction supplies address 123, and the operand 456 is fetched from memory into ACC]

Indirect Addressing Mode
The address of the location where the address of the data is to be found is stored in the instruction as the operand. Thus, the operand is the address of a memory location, which holds the address of the operand.
Indirect addressing mode can access a large address space (2^(memory word size) locations). To fetch the operand in this addressing mode, two memory accesses are required. Since memory accesses are slow, this is not efficient for frequent memory accesses. The indirect addressing mode may be used to implement memory pointers.

Example: lda [[123]]
As shown in the figure, the address of the memory location that holds the address of the data in the memory is part of the instruction.
[Figure: indirect addressing; the instruction holds address 123 of a pointer, which in turn holds the address of the data]

Register (Direct) Addressing Mode
The operand is contained in a CPU register, and the address of this register is encoded in the instruction. As no memory access is needed, operand fetch is efficient. However, there are only a limited number of CPU registers available, and this imposes a limitation on the use of this addressing mode.

Example: lda R2
This load instruction specifies the address of the register, and the operand is fetched from this register. As is clear from the diagram, no memory access is involved in this addressing mode.

REGISTER INDIRECT ADDRESSING MODE
In the register indirect mode, the address of the memory location that contains the operand is in a CPU register. The address of this CPU register is encoded in the instruction. A large address space can be accessed using this addressing mode (2^(register size) locations). It involves fewer memory accesses than indirect addressing.

Example: lda [R1]
The address of the register that contains the address of the memory location holding the operand is encoded in the instruction. There is one memory access involved.

Displacement addressing mode
The displacement addressing mode is also called based or indexed addressing mode.
The effective memory address is calculated by adding a constant (which is usually a part of the instruction) to the value in a CPU register. This addressing mode is useful for accessing arrays. The addressing mode may be called ‘indexed’ when the constant refers to the first element of the array (base) and the register contains the ‘index’. Similarly, ‘based’ refers to the situation when the constant is the offset (displacement) of an array element with respect to the first element, and the address of the first element is stored in a register.

Example: lda [R1 + 8]
In this example, R1 is the address of the register that holds a memory address, which is to be used to calculate the effective address of the operand. The constant (8) is added to the address held by the register, and this effective address is used to retrieve the operand.

Relative addressing mode
The relative addressing mode is similar to the indexed addressing mode, except that the PC holds the base address. This allows the storage of memory operands at a fixed offset from the current instruction and is useful for ‘short’ jumps.

Example: jump 4
The constant offset (4) is a part of the instruction, and it is added to the address held by the Program Counter.
[Figure: relative addressing; PC = 120 and the next instruction is at address 124]

RISC and CISC architectures
Generally, computers can be classified as being RISC machines or CISC machines. These concepts are explained in the following discussion.

RISC (Reduced instruction set computers)
RISC is more of a philosophy of computer design than a set of architectural features. The underlying idea is to reduce the number and complexity of instructions. However, new RISC machines have some instructions that may be quite complex, and the number of instructions may also be large.
The common features of RISC machines are:
• One instruction per clock period
This is the most important feature of the RISC machines. Since program execution depends on throughput and not on individual execution time, this feature is achievable by using pipelining and other techniques. In such a case, the goal is to issue an average of one instruction per cycle without increasing the cycle time.
• Fixed size instructions
Generally, the size of the instructions is 32 bits.
• CPU accesses memory only for Load and Store operations
This means that all the operands are in the CPU registers at the time they are used in an instruction. For this purpose, they are first brought into the CPU registers from the memory by load operations, and the results are later stored back by store operations.
• Simple and few addressing modes
Complex addressing modes require complex decoding to calculate addresses, which takes significant time and reduces processor performance. Therefore, RISC machines use a few simple addressing modes.
• Less work per instruction
As the instructions are simple, less work is done per instruction, and hence the clock period T can be reduced.
• Improved usage of delay slots
A ‘delay slot’ is the waiting time for a load or store operation to access memory or for a branch instruction to access the target instruction. RISC designs allow the execution of the next instruction after these instructions are issued. If the program or compiler places an instruction in the delay slot that does not depend on the result of the previous instruction, the delay slot can be used efficiently. Implementing this feature requires improved compilers that can check the dependencies of instructions before issuing them, so that the delay slots are utilized.
• Efficient usage of Pre-fetching and Speculative Execution Techniques
Pre-fetching and speculative execution techniques are used with a pipelined architecture. Instruction pipelining means having multiple instructions in different stages of execution, as instructions are issued before the previous instruction has completed its execution; pipelining will be studied in detail later. The RISC machines examine the instructions to check if operand fetches or branch instructions are involved. In such a case, the operands or the branch target instructions can be ‘pre-fetched’. As instructions are issued before the preceding instructions have completed execution, the processor will not know, in the case of a conditional branch instruction, whether the condition will be met and the branch taken. Instead of waiting for this information to become available, the branch can be “speculated” as taken or not taken, and the instructions can be issued. If the speculation is later found to be wrong, the results can be discarded and the actual target instructions can be issued. These techniques help improve the performance of processors.

CISC (Complex Instruction Set Computers)

include waits for I/O, instruction fetch times, pipeline delays, etc. The execution time of a program with respect to the processor is defined as
Execution Time = IC × CPI × T
where
IC = instruction count
CPI = average number of system clock periods to execute an instruction
T = clock period
Strictly speaking, (IC × CPI) should be the sum of the clock periods needed to execute each instruction. Manufacturers usually provide this information for each instruction in the instruction set. Using the average is a simplification.

MIPS (Millions of Instructions per Second)
Another measure of performance is the millions of instructions that are executed by the processor per second.
It is defined as
MIPS = IC / (ET × 10^6)
where ET is the execution time in seconds. This measure is not a very accurate basis for comparing different processors because of the architectural differences of the machines: some machines require more instructions than others to perform the same job. For example, RISC machines have simpler instructions, so the same job will require more instructions. This measure of performance was popular in the late 70s and early 80s, when the VAX 11/780 was treated as a reference.

MFLOPS (Millions of Floating Point Instructions per Second)
For computation intensive applications, floating-point instruction execution is a better measure than the execution of simple instructions. The measure MFLOPS was devised with this in mind. This measure has two advantages over MIPS:
• Floating point operations are complex, and therefore provide a better picture of the capabilities of the hardware on which they are run
• Overheads (operand fetch from memory, result storage to the memory, etc.) are effectively lumped with the floating point operations they support

Whetstones
Whetstone was the first program developed specifically as a benchmark for performance measurement. Named after the Whetstone Algol compiler, it was developed using the statistics collected during the compiler development. It was originally an Algol program, but it has been ported to FORTRAN, Pascal and C. This benchmark has been specifically designed to test floating point instructions. The performance is stated in MWIPS (millions of Whetstone instructions per second).

Dhrystones
Developed in 1984, this is a small benchmark program to measure the integer instruction performance of processors, as opposed to Whetstone’s emphasis on floating point instructions. It is a very small program, about a hundred high-level-language statements, and compiles to about 1 to 1½ kilobytes of code.
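The execution-time and MIPS definitions given earlier in this section can be checked numerically. The following is a minimal sketch in Python; the two machines and all of their numbers (instruction counts, CPI values, the 10 ns clock period) are hypothetical and chosen only to illustrate why MIPS alone can mislead when machines need different instruction counts for the same job.

```python
def execution_time(ic, cpi, t):
    """Execution Time = IC x CPI x T, in seconds."""
    return ic * cpi * t


def mips(ic, et):
    """MIPS = IC / (ET x 10^6)."""
    return ic / (et * 1e6)


# Hypothetical machine A: simpler instructions, so more of them are needed.
ic_a, cpi_a, t_a = 3_000_000, 2.0, 10e-9
# Hypothetical machine B: fewer but slower instructions for the same job.
ic_b, cpi_b, t_b = 1_000_000, 5.0, 10e-9

et_a = execution_time(ic_a, cpi_a, t_a)
et_b = execution_time(ic_b, cpi_b, t_b)
mips_a = mips(ic_a, et_a)
mips_b = mips(ic_b, et_b)

print(f"A: ET = {et_a:.3f} s, MIPS = {mips_a:.1f}")
print(f"B: ET = {et_b:.3f} s, MIPS = {mips_b:.1f}")
```

With these assumed numbers, machine B finishes the job sooner even though its MIPS rating is lower, which is the weakness of MIPS as a basis for comparison described above.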
Disadvantages of using Whetstones and Dhrystones
Both Whetstones and Dhrystones are now considered obsolete because of the following reasons:
• Small, fit in cache
• Obsolete instruction mix
• Prone to compiler tricks
• Difficult to reproduce results
• Uncontrolled source code
We should note that both the Whetstone and Dhrystone benchmarks are small programs, which encourage ‘over-optimization’, and can be used with optimizing compilers to distort results.

SPEC
SPEC, System Performance Evaluation Cooperative, is an association of a number of computer companies to define standard benchmarks for fair evaluation and comparison of different processors. The standard SPEC benchmark suite includes:
• A compiler
• A Boolean minimization program
• A spreadsheet program
• A number of other programs that stress arithmetic processing speed
The latest version of these benchmarks is SPEC CPU2000.

Advantages
• It provides for ease of publication.
• Each benchmark carries the same weight.
• SPEC ratio is dimensionless.
• It is not unduly influenced by long running programs.
• It is relatively immune to performance variation on individual benchmarks.
• It provides a consistent and fair metric.

An example computer: the SRC: “simple RISC computer”
An example machine is introduced here to facilitate our understanding of various design steps and concepts in computer architecture. This example machine is quite simple, and leaves out a lot of details of a real machine, yet it is complex enough to illustrate the fundamentals.

SRC Introduction
Attributes of the SRC
• The SRC contains 32 General Purpose Registers: R0, R1, ..., R31; each register is of size 32 bits.
• Two special purpose registers are included: Program Counter (PC) and Instruction Register (IR)
• Memory word size is 32 bits
• Memory space size is 2^32 bytes
• Memory organization is 2^32 × 8 bits; this means that the memory is byte aligned
• Memory is accessed in 32-bit words (i.e., 4-byte chunks)
• Big-endian byte storage is used

Programmer’s View of the SRC
The figure shows the attributes of the SRC: the 32 32-bit registers that are a part of the CPU, the two additional CPU registers (PC & IR), and the main memory, which consists of 2^32 1-byte cells.

SRC Notation
We examine the notation used for the SRC with the help of some examples.
R[3] means contents of register 3 (R for register)
M[8] means contents of memory location 8 (M for memory)
A memory word at address 8 is defined as the 32 bits at addresses 8, 9, 10 and 11 in the memory. This is shown in the figure.
A special notation for 32-bit memory words is
M[8]<31..0> := M[8]©M[9]©M[10]©M[11]
© is used for concatenation.
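The word notation above, together with big-endian byte storage, can be illustrated with a short Python sketch. The helper names `read_word` and `write_word` and the sample byte values are assumptions made for this illustration; they are not part of the SRC definition.

```python
def read_word(memory, address):
    """Return M[address]<31..0>: four consecutive bytes concatenated,
    with the byte at the lowest address as the most significant byte."""
    word = 0
    for offset in range(4):
        word = (word << 8) | memory[address + offset]
    return word


def write_word(memory, address, value):
    """Store a 32-bit value as four bytes in big-endian order."""
    for offset in range(4):
        shift = 8 * (3 - offset)
        memory[address + offset] = (value >> shift) & 0xFF


memory = bytearray(16)                            # a tiny byte-wide memory
memory[8:12] = bytes([0x12, 0x34, 0x56, 0x78])    # sample bytes M[8]..M[11]

word = read_word(memory, 8)
print(hex(word))

write_word(memory, 12, 0xCAFEF00D)
print(memory[12:16].hex())
```

Here `read_word(memory, 8)` plays the role of M[8]©M[9]©M[10]©M[11]; the byte at address 8 ends up in bits 31..24 of the word.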
Some more SRC Attributes
• All instructions are 32 bits long (i.e., instruction size is 1 word)
• All ALU instructions have three operands
• The only way to access memory is through load and store operations
• Only a few addressing modes are supported

[Figure: one 32-bit memory “word”, from MS byte to LS byte]

[Figure: SRC instruction formats]
Type A: op-code <31..27>, unused <26..0>
Type B: op-code <31..27>, ra <26..22>, c1 <21..0>
Type C: op-code <31..27>, ra <26..22>, rb <21..17>, c2 <16..0>
Type D: op-code <31..27>, ra <26..22>, rb <21..17>, rc <16..12>, c3 <11..0>

An instance of indexed addressing mode; M[34+R[6]] stores the contents of R8 (R[8])

The ALU instructions are
• addi, immediate 2’s complement addition (op-code = 13)
o Example: addi R3, R4, 56
R[3] ← R[4] + 56 (rb field = R4)
• andi, the instruction to obtain immediate logical AND (op-code = 21)
o Example: andi R3, R4, 56
R3 is loaded with the immediate logical AND of the contents of register R4 and 56 (constant value)
• ori, the instruction to obtain immediate logical OR (op-code = 23)
o Example: ori R3, R4, 56
R3 is loaded with the immediate logical OR of the contents of register R4 and 56 (constant value)

Note:
1. Since the constant c2 field is 17 bits,
▪ For direct addressing mode, only the first 2^16 bytes of memory can be accessed (or the last 2^16 bytes if c2 is negative)
▪ In case of the la instruction, only constants with magnitudes less than 2^16 can be loaded
▪ During address calculation using c2, sign extension to 32 bits must be performed before the addition
2. Type C instructions, with some modifications, may also be used for shift instructions. Note the modification in the following figure.
[Figure: modified Type C format: op-code <31..27>, ra <26..22>, rb <21..17>, unused <16..5>, count <4..0>]

The four shift instructions are
• shr, the instruction used to shift the bits right by the value in the (5-bit) c3 field (shift count) (op-code = 26)
o Example: shr R3, R4, 7
Shifts R4 right 7 times into R3. Immediate addressing mode is used.
• shra, arithmetic shift right by the value in the c3 field (op-code = 27)
o Example: shra R3, R4, 7
Shifts R4 right 7 times into R3. Immediate addressing mode is used.
• shl, shift left by the value in the (5-bit) c3 field (op-code = 28)
o Example: shl R8, R5, 6
Shifts R5 left 6 times into R8. Immediate addressing mode is used.
• shc, shift left circular by the value in the c3 field (op-code = 29)
o Example: shc R3, R4, 3
Shifts R4 circular 3 times into R3. Immediate addressing mode is used.

Lecture No. 4
ISA and Instruction Formats

Reading Material
Vincent P. Heuring & Harry F. Jordan, Computer Systems Design and Architecture
Chapter 2: 2.3, 2.4, slides

Summary
1) Introduction to ISA and instruction formats
2) Coding examples and Hand assembly

An example computer: the SRC: “simple RISC computer”
An example machine is introduced here to facilitate our understanding of various design steps and concepts in computer architecture. This example machine is quite simple, and leaves out a lot of details of a real machine, yet it is complex enough to illustrate the fundamentals.

SRC Introduction
Attributes of the SRC
• The SRC contains 32 General Purpose Registers: R0, R1, ..., R31; each register is of size 32 bits.
• Two special purpose registers are included: Program Counter (PC) and Instruction Register (IR)
• Memory word size is 32 bits
• Memory space size is 2^32 bytes
• Memory organization is 2^32 × 8 bits; this means that the memory is byte aligned
• Memory is accessed in 32-bit words (i.e., 4-byte chunks)
• Big-endian byte storage is used

Programmer’s View of the SRC
The figure below shows the attributes of the SRC: the 32 32-bit registers that are a part of the CPU, the two additional CPU registers (PC & IR), and the main memory, which consists of 2^32 1-byte cells.
[Figure: programmer’s view of the SRC: the CPU with the register file R0..R31, PC and IR, and the main memory]

The contents of the memory location M[56+R[5]] are loaded to the register R3; the rb field ≠ 0. This is an instance of indexed addressing mode.
• la is the instruction to load a register with an immediate data value (which can be an address) (op-code = 5)
o Example 1: la R3, 56
The register R3 is loaded with the immediate value 56. This is an instance of immediate addressing mode.
o Example 2: la R3, 56(R5)
The register R3 is loaded with the indexed address 56+R[5]. This is an example of indexed addressing mode.
• The st instruction is used to store register contents to memory (op-code = 3)
o Example 1: st R8, 34
This is the direct addressing mode; the contents of register R8 (R[8]) are stored to the memory location M[34]
o Example 2: st R8, 34(R6)
An instance of indexed addressing mode; M[34+R[6]] stores the contents of R8 (R[8])

The ALU instructions are
• addi, immediate 2’s complement addition (op-code = 13)
o Example: addi R3, R4, 56
R[3] ← R[4] + 56 (rb field = R4)
• andi, the instruction to obtain immediate logical AND (op-code = 21)
o Example: andi R3, R4, 56
R3 is loaded with the immediate logical AND of the contents of register R4 and 56 (constant value)
• ori, the instruction to obtain immediate logical OR (op-code = 23)
o Example: ori R3, R4, 56
R3 is loaded with the immediate logical OR of the contents of register R4 and 56 (constant value)

Note:
1. Since the constant c2 field is 17 bits,
▪ For direct addressing mode, only the first 2^16 bytes of memory can be accessed (or the last 2^16 bytes if c2 is negative)
▪ In case of the la instruction, only constants with magnitudes less than 2^16 can be loaded
▪ During address calculation using c2, sign extension to 32 bits must be performed before the addition
2. Type C instructions, with some modifications, may also be used for shift instructions. Note the modification in the following figure.
[Figure: modified Type C format: op-code <31..27>, ra <26..22>, rb <21..17>, unused <16..5>, count <4..0>]

The four shift instructions are
• shr, the instruction used to shift the bits right by the value in the (5-bit) c3 field (shift count) (op-code = 26)
o Example: shr R3, R4, 7
Shifts R4 right 7 times into R3, shifting zeros in from the left as the value is shifted right. Immediate addressing mode is used.
• shra, arithmetic shift right by the value in the c3 field (op-code = 27)
o Example: shra R3, R4, 7
Shifts R4 right 7 times into R3, copying the msb into the word on the left as the contents are shifted right. Immediate addressing mode is used.
• shl, shift left by the value in the (5-bit) c3 field (op-code = 28)
o Example: shl R8, R5, 6
Shifts R5 left 6 times into R8, shifting zeros in from the right as the value is shifted left. Immediate addressing mode is used.
• shc, shift left circular by the value in the c3 field (op-code = 29)
o Example: shc R3, R4, 3
Shifts R4 circular 3 times into R3; the value shifted out of the register on the left is placed back into the register on the right. Immediate addressing mode is used.

Type D
Type D includes four ALU instructions, four register based shift instructions, two logical instructions and two branch instructions.
[Figure: Type D format used here: op-code <31..27>, ra <26..22>, rb <21..17>, rc <16..12>, unused <11..0>]
The four ALU instructions are given below
• add, the instruction for 2’s complement register addition (op-code = 12)
o Example: add R3, R5, R6
The result of the 2’s complement addition R[5] + R[6] is stored in R3. Register addressing mode is used.
• sub, the instruction for 2’s complement register subtraction (op-code = 14)
o Example: sub R3, R5, R6
R3 will store the 2’s complement subtraction R[5] - R[6]. Register addressing mode is used.
• and, the instruction for logical AND operation between registers (op-code = 20)
o Example: and R8, R3, R4
R8 will store the logical AND of registers R3 and R4. Register addressing mode is used.
• or, the instruction for logical OR operation between registers (op-code = 22)
o Example: or R8, R3, R4
R8 is loaded with the value R[3] ∨ R[4], the logical OR of registers R3 and R4. Register addressing mode is used.

The four register based shift instructions use register addressing mode. These use a modified form of type D, as shown in the figure.
[Figure: modified Type D format: op-code <31..27>, ra <26..22>, rb <21..17>, rc <16..12>, unused <11..5>, 00000 <4..0>]
• shr, shift right by the value in register rc (op-code = 26)
o Example: shr R3, R4, R5
This instruction will shift R4 right into R3 using the number in R5
• shra, the arithmetic shift right by the value in register rc (op-code = 27)
o Example: shra R3, R4, R5
A shift of R4 right using R5; the result is stored in R3
• shl, shift left by the value in register rc (op-code = 28)
o Example: shl R8, R5, R6
The instruction shifts R5 left into R8 using the number in R6
• shc, shift left circular by the value in register rc (op-code = 29)
o Example: shc R3, R4, R6
This instruction will shift R4 circular into R3 using the value in R6

The two logical instructions also use a modified form of the Type D, and are the following.
[Figure: modified Type D format: op-code <31..27>, ra <26..22>, unused <21..17>, rc <16..12>, unused <11..0>]
• neg stores the 2’s complement of register rc in ra (op-code = 15)
o Example: neg R3, R4
Negates (obtains the 2’s complement of) R4 and stores it in R3. The 2-address format and register addressing mode are used.
• not stores the 1’s complement of register rc in ra (op-code = 24)
o Example: not R3, R4
Logically inverts R4 and stores the result in R3. 2-address format with register

    add R6, R4, R5  ; R6 contains (a+b)
    shl R8, R6, 2   ; R8 contains 4(a+b)
    sub R9, R7, R8  ; the result is in R9
    st R9, z        ; store the result in memory location z
Note:
The memory labels a, b, c and z can be defined by using assembler directives like .dw or .db, etc. in the source file. A semicolon ‘;’ is used for comments in assembly language.

Solution B:
We may solve the problem by assuming that a multiply instruction, similar to the add instruction, exists in the instruction set of the SRC. Each shl instruction is replaced by a mul instruction whose multiplier is the corresponding power of 2, as given below.
    ld R1, c        ; c is a label used for a memory location
    addi R3, R1, 58 ; R3 contains (c+58)
    mul R7, R3, 16  ; R7 contains 16(c+58)
    ld R4, a
    ld R5, b
    add R6, R4, R5  ; R6 contains (a+b)
    mul R8, R6, 4   ; R8 contains 4(a+b)
    sub R9, R7, R8  ; the result is in R9
    st R9, z        ; store the result in memory location z
Note:
The memory labels a, b, c and z can be defined by using assembler directives like .dw or .db, etc. in the source file.

Solution C:
We can perform multiplication with a multiplier that is not a power of 2 by doing addition in a loop. The number of times the loop executes will be equal to the multiplier.

Example 2: Hand Assembly
Convert the given SRC assembly language program into an equivalent SRC machine language program.
    ld R1, c        ; c is a label used for a memory location
    addi R3, R1, 58 ; R3 contains (c+58)
    shl R7, R3, 4   ; R7 contains 16(c+58)
    ld R4, a
    ld R5, b
    add R6, R4, R5  ; R6 contains (a+b)
    shl R8, R6, 2   ; R8 contains 4(a+b)
    sub R9, R7, R8  ; the result is in R9
    st R9, z        ; store the result in memory location z
Note:
This program uses memory labels a, b, c and z. We need to define them for the assembler by using assembler directives like .dw or .equ etc. in the source file.

Assembler Directives
Assembler directives, also called pseudo op-codes, are commands to the assembler to direct the assembly process. The directives may be slightly different for different assemblers. All the necessary directives are available with most assemblers. We explain the directives as we encounter them. More information on assemblers can be looked up in the assembler user manuals.

Source program with directives
    .ORG 200        ; start the next line at address 200
a:  .DW 1           ; reserve one word for the label a in the memory
b:  .DW 1           ; reserve a word for b, this will be at address 204
c:  .DW 1           ; reserve a word for c, will be at address 208
z:  .DW 1           ; reserve one word for the result
    .ORG 400        ; start the code at address 400
    ; all numbers are in decimal unless otherwise stated
    ld R1, c        ; c is a label used for a memory location
    addi R3, R1, 58 ; R3 contains (c+58)
    shl R7, R3, 4   ; R7 contains 16(c+58)
    ld R4, a
    ld R5, b
    add R6, R4, R5  ; R6 contains (a+b)
    shl R8, R6, 2   ; R8 contains 4(a+b)
    sub R9, R7, R8  ; the result is in R9
    st R9, z        ; store the result in memory location z
This is the way an assembly program will appear in the source file. Most assemblers require that the file be saved with an .asm extension.
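How an assembler turns the .ORG and .DW directives into label addresses can be sketched as a single pass that keeps a location counter. The Python below is only an illustration under stated assumptions: the function name `first_pass` is hypothetical, `.DW n` is taken to reserve n words (as the comments in the source program say), and every instruction is assumed to occupy one 4-byte word.

```python
def first_pass(lines, word_size=4):
    """Assign addresses to labels and instructions (toy assembler pass)."""
    symbols = {}      # label -> address
    listing = []      # (address, instruction text)
    location = 0      # current location counter
    for line in lines:
        code = line.split(";", 1)[0].strip()    # drop the comment
        if ":" in code:                         # "label: rest of line"
            label, code = code.split(":", 1)
            symbols[label.strip()] = location
            code = code.strip()
        if not code:
            continue
        parts = code.split()
        directive = parts[0].upper()
        if directive == ".ORG":                 # move the location counter
            location = int(parts[1])
        elif directive == ".DW":                # reserve n words
            location += word_size * int(parts[1])
        else:                                   # one instruction = one word
            listing.append((location, code))
            location += word_size
    return symbols, listing


SOURCE = """
        .ORG 200        ; start the next line at address 200
a:      .DW 1           ; reserve one word for a
b:      .DW 1
c:      .DW 1
z:      .DW 1
        .ORG 400        ; start the code at address 400
        ld R1, c
        addi R3, R1, 58
        shl R7, R3, 4
        ld R4, a
        ld R5, b
        add R6, R4, R5
        shl R8, R6, 2
        sub R9, R7, R8
        st R9, z
"""

symbols, listing = first_pass(SOURCE.splitlines())
print(symbols)
for address, text in listing:
    print(address, text)
```

The resulting symbol table gives the addresses that the assembler later substitutes for a, b, c and z when it encodes the load and store instructions.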
Label | Address | Value
a | 200 | unknown
b | 204 | unknown
c | 208 | unknown
z | 212 | unknown

Solution:
Observe the first line of the program
    .ORG 200        ; start the next line at address 200
This is a directive to let the following code/variables ‘originate’ at the specified address of the memory, 200 in this case. Variable statements, and another .ORG directive, follow the .ORG directive.
a:  .DW 1           ; reserve one word for the label a in the memory
b:  .DW 1           ; reserve a word for b, this will be at address 204
c:  .DW 1           ; reserve a word for c, will be at address 208
z:  .DW 1           ; reserve one word for the result
    .ORG 400        ; start the code at address 400
We conclude the following from the above statements: the code starts at address 400 and each instruction takes 32 bits in the memory. The memory map for the program is shown in the given table.

Memory Map for the SRC example program
Memory Address | Memory Contents
200 | unknown
204 | unknown
208 | unknown
212 | unknown
400 | ld R1, c
404 | addi R3, R1, 58
408 | shl R7, R3, 4
412 | ld R4, a
416 | ld R5, b
420 | add R6, R4, R5
424 | shl R8, R6, 2
428 | sub R9, R7, R8
432 | st R9, z

We have to convert these instructions to machine language. Let us start with the first instruction:
    ld R1, c
Notice that this is a type C instruction with the rb field missing.
1. We pick the op-code for this load instruction from the SRC instruction tables given in the SRC instruction summary section. The op-code for the load register ‘ld’ instruction is 00001.
2. Next we pick the register code corresponding to register R1 from the register table (given in the section ‘encoding of general purpose registers’). The register code for R1 is 00001.
3. The rb field is missing, so we place zeros in the field: 00000
4. The value of c is provided by the assembler, and should be converted to 17 bits.

Memory map is then updated with this value.

The instruction shl R8, R6, 2 is a type C instruction with the rc field missing. The steps taken to obtain the machine code of the instruction are
1. The op-code of the shift left instruction ‘shl’, obtained from the SRC instruction table, is 11100
2. The register codes of R8 and R6 are 01000 and 00110 respectively
3. Binary code is used for the immediate data 2: 00000 0000 0000 0010
4. The complete instruction becomes: 11100 01000 00110 00000 0000 0000 0010
5. The hexadecimal equivalent of the instruction is E20C0002h
Memory map is then updated with this value.

The instruction at the memory address 428 is sub R9, R7, R8. This is a type D instruction. We encode it into the machine language as follows:
1. The op-code of the subtract instruction ‘sub’ is 01110
2. The register codes of R9, R7 and R8, obtained from the register table, are 01001, 00111 and 01000 respectively
3. The 12-bit immediate data field is not used; zeros are encoded in its place: 0000 0000 0000
4. The complete instruction becomes: 01110 01001 00111 01000 0000 0000 0000
5. The hexadecimal equivalent is 724E8000h
We again update the memory map.

The last instruction is a type C instruction with the rb field missing:
    st R9, z
The machine equivalent of this instruction is obtained through the following steps:
1. The op-code of the store instruction ‘st’, obtained from the SRC instruction table, is 00011
2. The register code of R9 is 01001
3. Notice that there is no register coded in the 5-bit rb field; therefore, we encode zeros: 00000
4. The value of the label z is provided by the assembler, and should be converted to 17 bits. Notice that the memory address assigned to z is 212. The 17-bit binary equivalent is: 00000 0000 1101 0100
5. The complete instruction becomes: 00011 01001 00000 00000 0000 1101 0100
6. The hexadecimal form of this instruction is 1A4000D4h

The memory map, after the conversion of all the instructions, is
Memory Address | Memory Contents | Hexadecimal Memory Contents
200 | unknown | unknown
204 | unknown | unknown
208 | unknown | unknown
212 | unknown | unknown
400 | ld R1, c | 084000D0h
404 | addi R3, R1, 58 | 68C2003Ah
408 | shl R7, R3, 4 | E1C60004h
412 | ld R4, a | 090000C8h
416 | ld R5, b | 094000CCh
420 | add R6, R4, R5 | 61885000h
424 | shl R8, R6, 2 | E20C0002h
428 | sub R9, R7, R8 | 724E8000h
432 | st R9, z | 1A4000D4h

We have shown the memory map as an array of 4-byte cells in the above solution. However, since the memory of the SRC is arranged in 8-bit cells (i.e., memory is byte aligned), the real representation of the memory map is:
[Figure: byte-wise memory map of the program]

Example 3: SRC instruction analysis
Identify the formats of the following SRC instructions and specify the values in the fields:
    neg r1, r2
    add r0, r2, r3
    nop
    ld r2, 6
    shl r0, r1, 3
Solution:
[Table: instruction format and the values of the ra, rb, rc, c1, c2 and c3 fields for each of the five instructions]

Lecture No. 5
Description of SRC in RTL

Reading Material
Handouts
Slides

Summary
1) Reverse Assembly
2) Description of SRC in the form of RTL
3) Behavioral and Structural description in terms of RTL

Reverse Assembly
Typical Problem:
Given a machine language instruction for the SRC, it may be required to find the equivalent SRC assembly language instruction.
Example:
Reverse assemble the following SRC machine language instructions:
68C2003A h
E1C60004 h
61885000 h
724E8000 h
1A4000D4 h
084000D0 h
Solution:
1. Write the given hexadecimal instruction in binary form
68C2003A h → 0110 1000 1100 0010 0000 0000 0011 1010 b
2. Examine the first five bits of the instruction, and pick the corresponding mnemonic from the SRC instruction set listing arranged according to ascending order of op-codes
01101 b → 13 d → addi → add immediate
3. Now we know that this instruction uses the type C format; the two 5-bit fields after the op-code field represent the destination and the source registers respectively, and the remaining 17 bits in the instruction represent a constant
01101 00011 00001 0 0000 0000 0011 1010 b
op-code: 01101 → addi
ra field: 00011 → R3
rb field: 00001 → R1
17-bit c2 field: 3A h = 58 d
The equivalent assembly language instruction is therefore addi R3, R1, 58.

Language. It does not create a new register; it just generates another name, or “alias”, for an already existing register or part of a register. For example,
op<4..0> := IR<31..27>
means that the five most significant bits of the register IR will be called op, with bits 4..0.

Fields in the SRC instruction
In this section, we examine the various fields of an SRC instruction, using the RTL.
op<4..0> := IR<31..27>; operation code field
The five most significant bits of an SRC instruction (stored in the instruction register in this example) are named op, and this field is used for specifying the operation.
ra<4..0> := IR<26..22>; target register field
The next five bits of the SRC instruction, bits 26 through 22, hold the address of the target register, i.e., the result of the operation performed by the instruction is stored in the register specified by this field.
rb<4..0> := IR<21..17>; operand, address index, or branch target register
The bits 21 through 17 of the instruction are used for the rb field. The rb field is used to hold an operand, an address index, or a branch target register.
rc<4..0> := IR<16..12>; second operand, conditional test, or shift count register
The bits 16 through 12 are the rc field. This field may hold the second operand, conditional test, or a shift count.
c1<21..0> := IR<21..0>; long displacement field
In some instructions, bits 21 through 0 may be used as a long displacement field. Notice that the fields overlap; which fields apply in a particular instruction depends on the operation.
c2<16..0> := IR<16..0>; short displacement or immediate field
Bits 16 through 0 may be used as a short displacement or to specify an immediate operand.
c3<11..0> := IR<11..0>; count or modifier field
Bits 11 through 0 of the SRC instruction may be used as a count or modifier field.

Describing the processor state using RTL
The Register Transfer Language can be used to describe the processor state. The following registers and bits together form the processor state set.
PC<31..0>; program counter (it holds the memory address of the next instruction to be executed)
IR<31..0>; instruction register, used to hold the current instruction
Run; one-bit run/halt indicator
Strt; start signal
R[0..31]<31..0>; 32 general purpose registers, each 32 bits wide

SRC in a Black Box
[Figure: the SRC as a black box, with indicators (including the RUN indicator); other switches and the connectors at the back are to be added later on]

Difference between our notation and notation used by the text (H&J)

Our symbol   Meaning                              Symbol used by H&J
:            Conditional transfer                 →
;            Sequential statements                ;
,            Concurrent statements                ,
:=           Naming operator                      :=
←            Assignment                           ←
&            Logical AND                          ∧
~            Logical OR                           ∨
!            Logical NOT                          ¬
©            Concatenation                        #
@            Replication                          @
%            Remainder after division (modulo)    none

Our terminology    Meaning                          Terminology used by H&J
RTL                Register Transfer Language       RTN
Behavioral RTL                                      Abstract RTN
Structural RTL                                      Concrete RTN
Implementation                                      Micro-architecture
MAR                Memory Address Register          MA
MBR                Memory Buffer Register           MD

Difference between “,” and “;” in RTL
Statements separated by a “,” take place during the same clock pulse. In other words, the order of execution of statements separated by “,” does not matter.
On the other hand, statements separated by a “;” take place on successive clock pulses. In other words, if statements are separated by “;”, the one on the left must complete before the one on the right starts. However, some things written as one RTL statement can take several clocks to complete.
So in the instruction interpretation (the fetch-execute cycle), the first statement,
!Run & Strt : Run ← 1,
executes first. After this statement has executed and set Run to 1, the statements IR ← M[PC] and PC ← PC + 4 are executed concurrently.
Note that in statements separated by “,”, all right hand sides of register transfers are evaluated before any left hand side is modified (generally through assignment).

Using RTL to describe the dynamic properties of the SRC
The RTL can be used to describe the dynamic properties of the machine. Conditional expressions can be specified in RTL, as the following example illustrates:
(op=14) : R[ra] ← R[rb] - R[rc];
The ← operator is the RTL assignment operator, and ‘;’ is the termination operator. This conditional expression means: “IF the op field is equal to 14, THEN calculate the difference of the value in the register specified by the rb field and the value in the register specified by the rc field, and store the result in the register specified by the ra field.”

Effective address calculations in RTL (performed at runtime)
In some instructions, the address of an operand or the destination register may not be specified directly. Instead, the effective address may have to be calculated at runtime. These effective address calculations can be represented in RTL, as illustrated through the examples below.

Displacement address
disp<31..0> := ((rb=0) : c2<16..0> {sign extend},
(rb!=0) : R[rb] + c2<16..0> {sign extend}),
The displacement (or the direct) address is being calculated in this example.
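The field definitions and the displacement calculation can be tried out in a few lines of Python. The following is only a sketch, not part of the SRC tools: the helper names `sign_extend`, `decode` and `displacement` are hypothetical, the register file is assumed to be a plain list of 32 integers, and the two instruction words are the ones used in the assembly and reverse-assembly examples (1A4000D4 h and 68C2003A h).

```python
# Hypothetical helpers; bit positions follow the RTL field definitions
# (op = IR<31..27>, ra = IR<26..22>, rb = IR<21..17>, rc = IR<16..12>,
#  c1 = IR<21..0>, c2 = IR<16..0>, c3 = IR<11..0>).

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign_bit) - sign_bit

def decode(ir):
    """Split a 32-bit SRC instruction word into its (overlapping) fields."""
    return {
        "op": (ir >> 27) & 0x1F,
        "ra": (ir >> 22) & 0x1F,
        "rb": (ir >> 17) & 0x1F,
        "rc": (ir >> 12) & 0x1F,
        "c1": ir & 0x3FFFFF,
        "c2": ir & 0x1FFFF,
        "c3": ir & 0xFFF,
    }

def displacement(fields, regs):
    """disp := (rb=0) : c2 {sign extend}, (rb!=0) : R[rb] + c2 {sign extend}."""
    offset = sign_extend(fields["c2"], 17)
    base = 0 if fields["rb"] == 0 else regs[fields["rb"]]
    return (base + offset) & 0xFFFFFFFF  # keep the result to 32 bits

regs = [0] * 32                # assumed register file, all zeros
st = decode(0x1A4000D4)        # the 'st' instruction assembled earlier
addi = decode(0x68C2003A)      # first word of the reverse-assembly example

print(st["op"], st["ra"], st["rb"], displacement(st, regs))
print(addi["op"], addi["ra"], addi["rb"], sign_extend(addi["c2"], 17))
```

The first line printed is `3 9 0 212` (op-code 3, register R9, no index register, address 212), and the second is `13 3 1 58`, matching the hand-worked field values.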
The “,” operator separates statements in a single instruction, and indicates that these statements are to be executed simultaneously. However, since the two conditions in this example are disjoint, only one of the actions is performed at a time. Note that register R0 cannot be added to the displacement: rb = 0 simply means that R[rb] is not used.

Relative address
rel<31..0> := PC<31..0> + c1<21..0> {sign extend},
In the above example, a relative address is calculated by adding the displacement, after sign extension, to the contents of the program counter register (which holds the address of the next instruction to be executed in the program execution sequence).

Range of memory addresses
The range of memory addresses that can be accessed using the displacement (or the direct) addressing and the relative addressing is given below.
• Direct addressing (displacement with rb=0)
  o If c2<16>=0 (positive displacement), absolute addresses range from 00000000h to 0000FFFFh
  o If c2<16>=1 (negative displacement), absolute addresses range from FFFF0000h to FFFFFFFFh
• Relative addressing
  o The largest positive value of c1<21..0> is 2^21 - 1 and its most negative value is -2^21, so addresses up to 2^21 - 1 forward and 2^21 backward from the current PC value can be specified

Instruction Interpretation (Describing the Fetch operation using RTL)
The action performed for all the instructions before they are decoded is called ‘instruction interpretation’. Here, an example is that of starting the machine. If the machine is not

Arithmetic and Logical instructions
(op<4..0>=12) : R[ra] ← R[rb] + R[rc],
If the op-code is 12, the contents of the registers rb and rc are added and the result is stored in the register ra.
(op<4..0>=13) : R[ra] ← R[rb] + c2<16..0> {sign extend},
If the op-code is 13, the immediate data in the field c2 is added to the content of the register rb, and the result is stored in the register ra.
(op<4..0>=14) : R[ra] ← R[rb] - R[rc],
If the op-code is 14, the content of the register rc is subtracted from that of rb, and the result is stored in ra.
(op<4..0>=15) : R[ra] ← -R[rc],
If the op-code is 15, the content of the register rc is negated, and the result is stored in ra.
(op<4..0>=20) : R[ra] ← R[rb] & R[rc],
If the op field equals 20, the logical AND of the contents of the registers rb and rc is obtained and the result is stored in register ra.
(op<4..0>=21) : R[ra] ← R[rb] & c2<16..0> {sign extend},
If the op field equals 21, the logical AND of the content of the register rb and the immediate data in the field c2 is obtained and the result is stored in register ra.
(op<4..0>=22) : R[ra] ← R[rb] ~ R[rc],
If the op field equals 22, the logical OR of the contents of the registers rb and rc is obtained and the result is stored in register ra.
(op<4..0>=23) : R[ra] ← R[rb] ~ c2<16..0> {sign extend},
If the op field equals 23, the logical OR of the content of the register rb and the immediate data in the field c2 is obtained and the result is stored in register ra.
(op<4..0>=24) : R[ra] ← !R[rc],
If the op-code equals 24, the logical NOT of the content of the register rc is obtained, and the result is stored in ra.

Shift instructions
(op<4..0>=26) : R[ra]<31..0> ← (n @ 0) © R[rb]<31..n>,
If the op-code is 26, the contents of the register rb are shifted right by n bit positions. The bits that are shifted out of the register are discarded, and 0s are added in their place, i.e. n 0s are concatenated with the remaining register contents. The result is copied to the register ra.
(op<4..0>=27) : R[ra]<31..0> ← (n @ R[rb]<31>) © R[rb]<31..n>,
For op-code 27, an arithmetic shift is carried out. The contents of the register rb are shifted right by n bit positions, and copies of the most significant bit of rb, bit 31, fill the vacated positions. The result is copied to the register ra.
(op<4..0>=28) : R[ra]<31..0> ← R[rb]<31-n..0> © (n @ 0),
For op-code 28, the contents of the register rb are shifted left by n bit positions, in the same manner as the shift right instruction. The result is copied to the register ra.
(op<4..0>=29) : R[ra]<31..0> ← R[rb]<31-n..0> © R[rb]<31..32-n>,
The instruction corresponding to op-code 29 is the shift circular instruction. The contents of the register rb are shifted left by n bit positions; however, the bits that move out of the register in the shift process are not discarded. Instead, they are shifted in from the other end (a circular shift). The result is stored in register ra.
where
n := ( (c3<4..0>=0) : R[rc],
(c3<4..0>!=0) : c3<4..0> ),
Notation:
@ means replication
© means concatenation

Miscellaneous instructions
(op<4..0>=0) , No operation (nop)
If the op-code is 0, no operation is carried out for that clock period. This instruction is used as a stall in pipelining.
(op<4..0>=31) : Run ← 0, Halt the processor (stop)
); iF );
If the op-code is 31, Run is set to 0, that is, the processor is halted.
After one of these disjoint instructions is executed, iF, i.e. instruction fetch, is carried out once again, and so the fetch-execute cycle continues.

Flow diagram
The flow diagram is the symbolic representation of the fetch-execute cycle. Its top block indicates instruction fetch. The next block shows instruction decode, which looks at the first 5 bits of the fetched instruction; these represent the op-code, which may be any value from 0 to 31. Depending on the op-code, the appropriate processing takes place. After this processing, we move back to the top block and the next instruction is fetched. The same process is repeated until the instruction with op-code 31 is reached and halts the system.
[Figure: flow diagram with an Instruction Fetch block, an Instruction Decode block, and one branch per op-code (0 to 31) in which the appropriate processing takes place]
Note: For the SRC Assembler and Simulator, consult the Appendix.

Lecture No. 6
RTL Using Digital Logic Circuits

Reading Material
Handouts Slides

Summary
• Using Behavioral RTL to Describe the SRC (continued)
• Implementing Register Transfer using Digital Logic Circuits

Using behavioral RTL to Describe the SRC (continued)
Once the instruction is fetched and the PC is incremented, execution of the instruction starts. In the following discussion, we denote instruction fetch by “iF” and instruction execution by “iE”.
iE := (
(op<4..0>=1) : R[ra] ← M[disp],
(op<4..0>=2) : R[ra] ← M[rel],
...
(op<4..0>=31) : Run ← 0,); iF );
As shown above, instruction execution can be described by a long list of conditional operations, which are inherently “disjoint”. Only one of these statements is executed, depending on the condition met, and then the instruction fetch statement (iF) is invoked again at the end of the list of concurrent statements. Thus, the instruction fetch (iF) and instruction execution (iE) statements invoke each other in a loop. This is the fetch-execute cycle of the SRC.

Concurrent Statements
The long list of concurrent, disjoint statements of the instruction execution (iE) is basically the complete instruction set of the processor. A brief overview of these instructions is given below:

Load-Store Instructions
(op<4..0>=1) : R[ra] ← M[disp], load register (ld)

(op<4..0>=27) : R[ra]<31..0> ← (n @ R[rb]<31>) © R[rb]<31..n>,
For op-code 27, an arithmetic shift is carried out. The contents of the register rb are shifted right by n bit positions, and copies of the most significant bit of rb, i.e., bit 31, fill the vacated positions. The result is copied to the register ra.
(op<4..0>=28) : R[ra]<31..0> ← R[rb]<31-n..0> © (n @ 0),
For op-code 28, the contents of the register rb are shifted left by n bit positions, in the same manner as the shift right instruction. The result is copied to the register ra.
(op<4..0>=29) : R[ra]<31..0> ← R[rb]<31-n..0> © R[rb]<31..32-n>,
The instruction corresponding to op-code 29 is the shift circular instruction. The contents of the register rb are shifted left by n bit positions; however, the bits that move out of the register in the shift process are not discarded. Instead, they are shifted in from the other end (a circular shift). The result is stored in register ra.
where
n := ( (c3<4..0>=0) : R[rc],
(c3<4..0>!=0) : c3<4..0> ),
Notation:
@ means replication
© means concatenation

Miscellaneous instructions
(op<4..0>=0) , No operation (nop)
If the op-code is 0, no operation is carried out for that clock period. This instruction is used as a stall in pipelining.
(op<4..0>=31) : Run ← 0, Halt the processor (stop)
); iF );
If the op-code is 31, Run is set to 0, that is, the processor stops execution.
After one of these disjoint instructions is executed, iF, i.e. instruction fetch, is carried out once again, and so the fetch-execute cycle continues.

Implementing Register Transfers using Digital Logic Circuits
We have studied register transfers in the previous sections, and how they help in implementing assembly language. In this section we review how the basic digital logic circuits are used to implement register transfers. The topics we cover in this section include:
1. A brief (and necessary) review of logic circuits
2. Implementing simple register transfers
3. Register file implementation using a bus
4. Implementing register transfers with mathematical operations
5. The Barrel Shifter
6. Implementing shift operations

Review of logic circuits
Before we study the implementation of register transfers using logic circuits, a brief overview of some of the important logic circuits will prove helpful. The topics we review in this section include
1. The basic D flip flop
2. The n-bit register
3. The n-to-1 multiplexer
4. Tri-state buffers

The basic D flip flop
[Figure: D flip-flop with data input D, enable input EN, clock input, active-low clear input R and output Q]
A flip-flop is a bi-stable device, capable of storing one bit of information. Therefore, flip-flops are used as the building blocks of a computer’s memory as well as of CPU registers. There are various types of flip-flops; the most common type, the D flip-flop, is shown in the figure given. The given truth table for this positive-edge triggered D flip-flop shows that the flip-flop is set (i.e. stores a 1) when the data input is high on the leading (also called the positive) edge of the clock; it is reset (i.e., the flip-flop stores a 0) when the data input is 0 on the leading edge of the clock. The clear input will reset the flip-flop on a low input.
[Table: truth table of the D flip-flop]

The n-bit register
An n-bit register can be formed by grouping n flip-flops together. So a register is a device in which a group of flip-flops operate synchronously. A register is useful for storing binary data, as each flip-flop can store one bit. The clock inputs of the flip-flops are tied together, as are the enable inputs. As shown in the figure, a binary number can be stored in the register by applying the corresponding logic level to the input line of each flip-flop simultaneously at the positive edge of the clock.
[Figure: n-bit register]
The next figure shows the symbol of a 4-bit register used for an integrated circuit. In0 through In3 are the four input lines, Out0 through Out3 are the four output lines, Clk is the clock input, and En is the enable line. To get a better understanding of this register, consider the situation where we want to store the binary number 1000 in the register. We will apply the number to the input lines, as shown in the figure given.
[Figure: test circuit for the 4-bit register]
[Figure: 4-bit register symbol]
On the leading edge of the clock, the number will be stored in the register. The enable input has to be high if the number is to be stored into the register.

[Figure: tri-state buffer symbol]
We can see that when the enable input (or the control input) c is low (0), the output is high impedance Z. The symbol of a 4-bit tri-state buffer unit is shown in the figure. There are four input lines, an equal number of output lines, and an enable line in this unit. If we apply a high on inputs 3 and 2, and a low on inputs 1 and 0, we get the output 1100, but only when the enable input is high, as shown in the given figure.
[Figure: test circuit for the tri-state buffer]

Implementing simple register transfers
We now build on our knowledge of the primitive logic circuits to understand how register transfers are implemented. In this section we will study the implementation of the following:
• Simple conditional transfer
• Concept of control signals
• Two-way transfers
• Connecting multiple registers
• Buses
• Bus implementations

Simple conditional transfer
In a simple conditional transfer, a condition is checked, and if it is true, the register transfer takes place. Formally, a conditional transfer is represented as
Cond: RD ← RS
This means that if the condition ‘Cond’ is true, the contents of the register named RS (the source register) are copied to the register RD (the destination register). The following figure shows how the registers may be interconnected to achieve a conditional transfer. In this circuit, the output of the source register RS is connected to the input of the destination register RD.
However, notice that the transfer will not take place unless the enable input of the destination register is activated. We may say that the ‘transfer’ is being controlled by the enable line (or the control signal). We are therefore able to control the transfer by selectively enabling the control signal, through other combinational logic that implements our condition. The condition is, in general, a Boolean expression, and in this example, the condition is equivalent to LRD = 1.
[Figure: conditional transfer between the 4-bit registers RS and RD]

Two-way transfers
In the above example, only a one-way transfer was possible, i.e., we could only copy the contents of RS to RD if the condition was met. In order to achieve two-way transfers, we must also provide a path from the output of the register RD to the input of the register RS. This will enable us to implement
Cond1: RD ← RS
Cond2: RS ← RD

Connecting multiple registers
We have seen how two registers can be connected. However, in a computer we need to connect more than just two registers. To connect these registers, one may argue that a connection should be provided from the output of each register to the input of every other register. This solution is shown for a scenario where there are 5 registers that need to be interconnected. We can see that in this solution, each pair of m-bit registers requires two connections of m wires each. Hence five m-bit registers in a “point-to-point” scheme require 20 connections, each with m wires. In general, n registers in a point-to-point scheme require n(n-1) connections. It is quite obvious that this solution is not going to scale well for a large number of registers, as is the case in real machines. The solution to this problem is the use of a bus architecture, which is explained in the following sections.
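The growth of the point-to-point scheme is easy to tabulate. The sketch below only restates the n(n-1) formula from the paragraph above; the function names and the choice of register counts are illustrative assumptions.

```python
def point_to_point_connections(n):
    """Connections needed so that each of n registers can send to every other one."""
    return n * (n - 1)

def point_to_point_wires(n, m):
    """Total wires when every connection carries an m-bit value."""
    return point_to_point_connections(n) * m

# Register counts chosen for illustration; 5 is the case discussed in the text.
for n in (2, 5, 8, 32):
    print(n, point_to_point_connections(n), point_to_point_wires(n, 32))
```

For the five-register scenario this gives 20 connections, and the count grows roughly with the square of the number of registers, which is why a shared bus is preferred.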
We now take a look at the steps taken for the (conditional, mathematical) transfer
(opc=1) : R4 ← R3 + R2.
First of all, if the condition opc = 1 is met, the contents of the first operand register, R3, are transferred to the temporary register A through the bus. This is done by activating R3out, which places the contents of the register R3 on the bus. At the same time, applying a logical high input to LA enables the load for the register A. This lets the binary number on the bus (the contents of register R3) be loaded into the register A.
The next step is to enable R2out to place the contents of the register R2 on the bus. As can be observed from the figure, the output of the register A is one of the inputs to the 4-bit adder; the other input to the adder is the bus itself. Therefore, as the contents of register R2 are placed on the bus, both operands are available to the adder. The output can then be stored in the register C by enabling its write, so a high input is applied to LC to store the result in register C.
The third and final step is to store (transfer) the resultant number in the destination register R4. This is done by enabling Cout, which writes the number onto the bus, and then enabling the load of the register R4 by activating the control signal LR4. These steps are summarized in the given table.

Structural RTL: add operation
Time step   Operation to be performed (structural RTL)   Control signals to be activated
1           A ← R3                                       LA, R3out
2           C ← A + R2                                   LC, R2out
3           R4 ← C                                       LR4, Cout

The barrel shifter
Shift operations are frequently used, as shifts can be used to implement multiplication, division, etc. A bi-directional shift register with a parallel load capability can be used to perform shift operations.
However, the delays in such structures depend on the number of shifts to be performed, e.g., a 9-bit shift requires nine clock periods, as one shift is performed per clock cycle. This is not an optimal solution. The barrel shifter is an alternative, with any number of shifts accomplished during a single clock period. Barrel shifters are constructed using multiplexers. An n-bit barrel shifter is a combinational circuit implemented using n multiplexers. The barrel shifter provides a shifted copy of the input data at its output. Control inputs are provided to specify the number of positions by which the input data is to be shifted. The shift process can be a simple one with 0s used as fillers, or it can be a rotation of the input data. The corresponding figure shows a barrel shifter that shifts the input data right; the number of shifts depends on the bit pattern applied on the control inputs S0, S1.
[Figure: barrel shifter]
The function table for the barrel shifter is given. We see from the table that in order to apply a single shift to the input number, the control signal is 01 on (S1, S0), which is the binary equivalent of the decimal number 1. Similarly, to apply 2 shifts, the control signal 10 (on S1, S0) is applied; 10 is the binary equivalent of the decimal number 2. A control input of 11 shifts the number 3 places to the right.
Now we take a look at an example of the shift operation being implemented through the use of the barrel shifter:
R4 ← ror R3 (2 times);
The shift functionality can be incorporated into the register file circuit with the bus architecture we have been building, by introducing the barrel shifter, as shown in the given figure.
[Figure: barrel shifter symbol]
To perform the operation
R4 ← ror R3 (2 times),
the first step is to activate R3out, nb1 and LC. Activating R3out will place the contents of the register R3 on the bus. Since the bus is directly connected to the input of the barrel shifter, this number is applied to its input side. nb1 and nb0 are the barrel shifter’s control lines for specifying the number of shifts to be applied. Applying a high input to nb1 and a low input to nb0 will shift the number two places to the right. Activating LC will load the shifted output of the barrel shifter into the register C.

Lecture No. 7
Design Process for ISA of FALCON-A

Reading Material
Handouts Slides

Summary
1) Outline of the thinking process for ISA Design
2) Introduction to the ISA of FALCON-A

Instruction Set Architecture (ISA) Design: Outline of the thinking process
In this module we will learn to appreciate, understand and apply the approach adopted in designing an instruction set architecture. We do this by designing an ISA for a new processor. We have named our processor FALCON-A, which is an acronym for First Architecture for Learning Computer Organization and Networks (version A). The term Organization is intended to include Architecture and Design in this acronym.

Elements of the ISA
Before we go on to designing the instruction set architecture for our processor FALCON-A, we need to take a closer look at the defining components of an ISA. The following three key components define any instruction set architecture.
1. The operations the processor can execute
2. Data access mode for use as operands in the operations defined
3. Representation of the operations in memory
We take a look at all three of the components in more detail, and wherever appropriate, apply these steps to the design of our sample processor, the FALCON-A. This will help us better understand the approach to be adopted for the ISA design of a processor. A more detailed introduction to the FALCON-A will be presented later.
The operations the processor can execute
All processors need to support at least three categories (or functional groups) of instructions:
- Arithmetic, Logic, Shift
- Data Transfer
- Control

ISA Design Steps - Step 1
We need to think of all the instructions of each type that ought to be supported by our processor, the FALCON-A. The following are the instructions that we will include in the ISA for our processor.
Arithmetic: add, addi (add with an immediate operand), subtract, subtract-immediate, multiply, divide
Logic: and, and-immediate, or, or-immediate, not
Shift: shift left, shift right, arithmetic shift right
Data Transfer: data transfer between registers, moving constants to registers, loading operands from memory to registers, storing from registers to memory, and the movement of data between registers and input/output devices
Control: jump instructions with various conditions, call and return from subroutines, instructions for handling interrupts
Miscellaneous instructions: instructions to clear all registers, the capability to stop the processor, the ability to “do nothing”, etc.

ISA Design Steps - Step 2
Once we have decided on the instructions that we want to support in our processor, the second step of the ISA design process is to select suitable mnemonics for these instructions. The following mnemonics have been selected to represent these operations.
Arithmetic: add, addi, sub, subi, mul, div
Logic: and, andi, or, ori, not
Shift: shiftl, shiftr, asr
Data Transfer: load, store, in, out, mov, movi
Control: jpl, jmi, jnz, jz, jump, call, ret, int, iret
Miscellaneous instructions: nop, reset, halt

ISA Design Steps - Step 3
The next step of the ISA design is to decide upon the number of bits to be reserved for the op-code part of the instructions. Since we have 32 instructions in the instruction set, 5 bits will suffice (as 2^5 = 32) to encode these op-codes.
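The count behind Step 3 can be checked mechanically. The sketch below simply tallies the mnemonics chosen in Step 2 and computes the smallest op-code width that can distinguish them; the dictionary layout and the function name `opcode_bits` are illustrative assumptions.

```python
# Mnemonics selected in Step 2, grouped by category.
MNEMONICS = {
    "arithmetic": ["add", "addi", "sub", "subi", "mul", "div"],
    "logic": ["and", "andi", "or", "ori", "not"],
    "shift": ["shiftl", "shiftr", "asr"],
    "data transfer": ["load", "store", "in", "out", "mov", "movi"],
    "control": ["jpl", "jmi", "jnz", "jz", "jump", "call", "ret", "int", "iret"],
    "miscellaneous": ["nop", "reset", "halt"],
}

def opcode_bits(count):
    """Smallest number of bits that gives each of `count` instructions a unique code."""
    return max(1, (count - 1).bit_length())

total = sum(len(group) for group in MNEMONICS.values())
print(total, opcode_bits(total))
```

With these lists the total is 32 and the width is 5 bits. Note that a 5-bit op-code is then fully used: adding even one more instruction would require a sixth bit.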
ISA Design Steps - Step 4
The fourth step is to assign op-codes to these instructions. The assigned op-codes are shown below.
Arithmetic: add (0), addi (1), sub (2), subi (3), mul (4), div (5)
Logic: and (8), andi (9), or (10), ori (11), not (14)
Shift: shiftl (12), shiftr (13), asr (15)
Data Transfer: load (29), store (28), in (24), out (25), mov (6), movi (7)
Control: jpl (16), jmi (17), jnz (18), jz (19), jump (20), call (22), ret (23), int (26), iret (27)
Miscellaneous instructions: nop (21), reset (30), halt (31)
Now we list these instructions with their op-codes in binary form, as they would appear in the machine instructions of the FALCON-A.

00000 add     01000 and      10000 jpl     11000 in
00001 addi    01001 andi     10001 jmi     11001 out
00010 sub     01010 or       10010 jnz     11010 int
00011 subi    01011 ori      10011 jz      11011 iret
00100 mul     01100 shiftl   10100 jump    11100 store
00101 div     01101 shiftr   10101 nop     11101 load
00110 mov     01110 not      10110 call    11110 reset
00111 movi    01111 asr      10111 ret     11111 halt

Data access mode for operations
As mentioned earlier, the instruction set architecture of a processor defines a number of things besides the instructions implemented: the resources each instruction can access, the number of registers available to the processor, the number of registers each instruction can access, the instructions that are allowed to access memory, any special registers, constants, and any alternatives to the general-purpose registers. With this in mind, we go on to the next steps of our ISA design.

ISA Design Steps - Step 5
We now need to select the number and types of operands for the various instructions that we have selected for the FALCON-A ISA. ALU instructions may have 2 to 3 registers as operands. In the case of 2 register operands, a constant (an immediate operand) may be included in the instruction.
For the load/store type instructions, we require a register to hold the data that is to be loaded from the memory, or stored back to the memory. Another register is required to hold the base address for the memory access. In addition to these two registers, a

Lecture No. 8
ISA of the FALCON-A

Reading Material
Handouts Slides

Summary
• Introduction to the ISA of the FALCON-A
• Examples for the FALCON-A

Introduction to the ISA of the FALCON-A
We first take a look at the notation that we are going to employ when studying the FALCON-A. We refer to the contents of a register by enclosing the register number in square brackets; for instance, R[3] refers to the contents of register 3. Memory contents are referred to in a similar fashion; for instance, M[8] refers to the contents of memory at location 8 (or the 8th memory cell).
Since memory is organized into cells of 1 byte, whereas the memory word size is 2 bytes, two adjacent memory cells together make up a memory word. So, the memory word at memory address 8 is defined as 1 byte at address 8 and 1 byte at address 9. To refer to 16-bit memory words, we make use of a special notation, the concatenation of two memory locations. Therefore, to refer to the 16-bit memory word at location 8, we would write M[8]©M[9]. As we employ the big-endian format,
M[8]<15..0> := M[8]©M[9]
So in our notation © is used to represent concatenation.
[Figure: big-endian notation; one memory word M[8]<15..0> consists of the most significant byte at address 8 and the least significant byte at address 9]
Little endian puts the smallest numbered byte at the least-significant position in a word, whereas big endian puts the smallest numbered byte at the most-significant position. Note that in our case, we use the big-endian convention of ordering bytes. However, within each byte itself, the ordering of the bits is little endian.

FALCON-A Features
The FALCON-A processor has fixed-length instructions, each 16 bits (2 bytes) long.
Addressing modes supported are limited, and memory is accessed through the load/store instructions only.

FALCON-A Instruction Formats
Three categories of instructions are supported by the FALCON-A processor: arithmetic, control, and data transfer instructions. Arithmetic instructions enable mathematical computations. Control instructions help change the flow of the program as and when required. Data transfer operations move data between the processor and memory. The arithmetic category also includes the logical instructions. Four different types of instruction formats are used to specify these instructions. A brief overview of the various fields in these instruction formats follows.
The Type I instruction format is shown in the given figure. In it, 5 bits are reserved for the op-code (bits 11 through 15). The rest of the bits are unused in this instruction type, which means they are not considered.
Type I:    op-code <15..11> | unused <10..0>
The Type II instruction format, shown in the given figure, has a 5-bit op-code field, a 3-bit register field, and an 8-bit constant (or immediate operand) field.
Type II:   op-code <15..11> | ra <10..8> | c2 <7..0>
Type III instructions contain the 5-bit op-code field, two 3-bit register fields for the source and destination registers, and an immediate operand field of length 5 bits.
Type III:  op-code <15..11> | ra <10..8> | rb <7..5> | c1 <4..0>
Type IV instructions contain the op-code field, two 3-bit register fields, a further field of length 3 bits (labelled rc in the figure), as well as two unused bits. This format is shown in the given figure.
Type IV:   op-code <15..11> | ra <10..8> | rb <7..5> | rc <4..2> | unused <1..0>

Encoding of registers
We have a register file comprising eight general-purpose registers in the CPU. To encode these registers in binary, so that they can be referred to in the various instructions, we require log2(8) = 3 bits. Therefore, we have already allocated three bits per register in the instructions, as seen in the various instruction formats.
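The four formats amount to packing small fields into a 16-bit word, which can be sketched as follows. This is an illustration under stated assumptions rather than the FALCON-A assembler: the function names are hypothetical, the field positions are the ones read from the format figures (op-code in bits 15..11, ra in 10..8, rb in 7..5, and so on), constants are treated as unsigned, and the pairing of `add` with Type IV and `load` with Type III is assumed from their operand counts. The op-codes (nop = 21, movi = 7, add = 0, load = 29) come from Step 4 of the design process.

```python
def _field(value, width, name):
    """Check that value fits in an unsigned field of the given width."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{name}={value} does not fit in {width} bits")
    return value

def encode_type1(op):
    """Type I: op-code only, remaining 11 bits unused (left as zero)."""
    return _field(op, 5, "op") << 11

def encode_type2(op, ra, c2):
    """Type II: op-code, one register, 8-bit constant."""
    return (_field(op, 5, "op") << 11) | (_field(ra, 3, "ra") << 8) | _field(c2, 8, "c2")

def encode_type3(op, ra, rb, c1):
    """Type III: op-code, two registers, 5-bit constant."""
    return ((_field(op, 5, "op") << 11) | (_field(ra, 3, "ra") << 8)
            | (_field(rb, 3, "rb") << 5) | _field(c1, 5, "c1"))

def encode_type4(op, ra, rb, rc):
    """Type IV: op-code, three 3-bit fields, two unused low bits."""
    return ((_field(op, 5, "op") << 11) | (_field(ra, 3, "ra") << 8)
            | (_field(rb, 3, "rb") << 5) | (_field(rc, 3, "rc") << 2))

print(f"{encode_type1(21):04X}")            # nop
print(f"{encode_type2(7, 3, 56):04X}")      # movi R3, 56
print(f"{encode_type3(29, 1, 4, 15):04X}")  # load R1, [R4+15] (assumed Type III)
print(f"{encode_type4(0, 1, 2, 3):04X}")    # add R1, R2, R3 (assumed Type IV)
```

Under these assumptions the four words come out as A800, 3B38, E98F and 014C. Range checking each field up front makes it obvious when a constant is too large for the chosen format, for example a value above 31 in a Type III instruction.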
The encoding of registers in binary is shown in the given table.

Fig. Register Encodings

It is important to note here that the register R0 has special usage in some cases. For instance, in load/store operations, if register R0 is used as a second operand, its value is considered to be zero. R0 has special usage in the multiply and divide (mul and div) instructions as well.

Instructions and instruction formats
We return to our discussion of instruction formats in this section. We now classify which instructions belong to which instruction format type.

Type I
Five of the instructions in the FALCON-A instruction set belong to the Type I instruction format. These are:
1. nop (op-code = 21)
This instruction tells the processor to 'do nothing', or, in other words, to perform 'no operation'. It is generally useful in pipelining. We will study pipelining later in the course.
2. reset (op-code = 30)
3. halt (op-code = 31)
4. int (op-code = 26)
5. iret (op-code = 27)
All of these instructions take no operands; therefore, besides the 5 bits used for the op-code, the rest of the bits are unused.

Type II
There are nine FALCON-A instructions that belong to this type. These are listed below.
1. movi (op-code = 7)
The movi instruction loads a register with the constant (or immediate value) specified as the second operand. An example is
movi R3, 56
R[3] ← 56
This means that the register R3 will have the value 56 stored in it once this instruction is executed.
2. in (op-code = 24)
This instruction loads the specified register from an input device. An example and its interpretation in register transfer language are
in R3, 57
R[3] ← IO[57]
3. out (op-code = 25)
The 'out' instruction moves data from the register to the output device specified in the instruction, as the example demonstrates:
out R7, 34
IO[34] ← R[7]
4.
ret (op-code = 23)
This instruction returns control from a subroutine. This is done using a register in which the return address is stored. As shown in the example, to return control, the program counter is assigned the contents of the register.
ret R3
PC ← R[3]

[...]

5. shiftl (op-code = 12)
This instruction shifts the value stored in the source register (the second operand) to the left as many times as is specified by the third operand, the constant value. For instance, in the instruction
shiftl r4, r3, 7
the contents of the register r3 are shifted left 7 times, and the resulting number is assigned to the register r4.
6. shiftr (op-code = 13)
This instruction shifts to the right the value stored in a register. An example is:
shiftr r4, r3, 9
7. asr (op-code = 15)
An arithmetic shift right shifts a signed binary number stored in the source register (specified by the second operand) to the right, while leaving the sign bit unchanged. A single shift has the effect of dividing the number by 2. As the number is shifted as many times as is specified by the constant value in the instruction, the binary number in the source register is divided by 2 raised to the power of the constant value. An example is
asr r1, r2, 5
This instruction, when executed, will divide the value stored in r2 by 32 (that is, 2^5), and assign the result to the register r1.
8. load (op-code = 29)
This instruction loads a register from the memory. For instance, the instruction
load r1, [r4+15]
will add the constant 15 to the value stored in the register r4, access the memory location that corresponds to the resulting number, and assign the memory contents of this location to the register r1; this is denoted in RTL by:
R[1] ← M[R[4]+15]
9. store (op-code = 28)
This instruction stores the value of a register to a particular memory location.
In the example:
store r6, [r7+13]
the contents of the register r6 are stored to the memory location that corresponds to the sum of the constant 13 and the value stored in the register r7.
M[R[7]+13] ← R[6]

Type III Modified
There are 3 instructions in the modified form of the Type III instructions. In the modified Type III instructions, the field c1 is unused.
1. mov (op-code = 6)
This instruction moves (copies) the data of a source register to a destination register. For instance, in the following example, the contents of the register r3 are copied to the register r4.
mov r4, r3
In RTL, this can be represented as
R[4] ← R[3]
2. not (op-code = 14)
This instruction inverts the contents of the source register, and assigns the value thus obtained to the destination register. In the following example, the contents of register r2 are inverted and assigned to register r4.
not r4, r2
In RTL:
R[4] ← !R[2]
3. call (op-code = 22)
Procedure calls are often encountered in programming languages. To add support for procedure (or subroutine) calls, the instruction call is used. This instruction first stores the return address in a register and then assigns the Program Counter a new value (that specifies the address of the subroutine). Following is an example of the call instruction:
call r4, r3
This instruction saves the current contents of the Program Counter (the return address) into the register r4 and assigns the new value to the PC from register r3.
R[4] ← PC, PC ← R[3]

Type IV
Six instructions belong to the instruction format Type IV. These are
1. add (op-code = 0)
This instruction adds the contents of one register to those of another register, and assigns the result to the destination register. An example:
add r4, r3, r5
R[4] ← R[3] + R[5]
2. sub (op-code = 2)
This instruction subtracts the value of one register from the value stored in another register, and assigns the result to the destination register.
For example,
sub r4, r3, r5
In RTL, this is denoted by
R[4] ← R[3] − R[5]
3. mul (op-code = 4)
The multiply instruction computes the product of two register values and stores it in the destination register. An example is
mul r5, r7, r1
The RTL notation for this instruction will be
R[0]©R[5] ← R[7]*R[1]
4. div (op-code = 5)
This instruction divides the value of the register that is the second operand by the number in the register specified by the third operand, and assigns the result to the destination register. An example is
div r4, r7, r2
R[4] ← R[0]©R[7] / R[2], R[0] ← R[0]©R[7] % R[2]
5. and (op-code = 8)
The 'and' instruction obtains the bit-wise 'and' of the values of two registers and assigns it to a destination register. For instance, in the following example, the contents of registers r4 and r5 are bit-wise 'anded' and the result is assigned to the register r1.
and r1, r4, r5
In RTL we may write this as
R[1] ← R[4] & R[5]
6. or (op-code = 10)
This instruction is used to bit-wise 'or' the contents of two registers. For instance,
or r6, r7, r2
In RTL this is denoted as
R[6] ← R[7] ~ R[2]

FALCON-A: Instruction Set Summary
We have looked at the various types of instruction formats for the FALCON-A, as well as the instructions that belong to each of these instruction format types. In this section, we have simply listed the instructions on the basis of their functional groups; this means that

Data Transfer Instructions    Mnemonic    opcode
Move                          mov         00110 (6)
Move immediate                movi        00111 (7)
Input to register             in          11000 (24)
Output from register          out         11001 (25)
Load from memory              load        11101 (29)
Store into memory             store       11100 (28)

Fig. Data Transfer Instructions
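The RTL descriptions of the data transfer and shift instructions can be tried out with a very small register-and-memory model. The sketch below is illustrative only: the class name FalconASketch and its method names are hypothetical, registers are assumed to wrap around at 16 bits, constants are taken as unsigned values, and load/store are assumed to move a whole 16-bit word stored big-endian (M[a]©M[a+1]). The rule that R0 reads as zero when used as the base register of load/store is taken from the text.

```python
MASK16 = 0xFFFF


class FalconASketch:
    """Hypothetical model of a few FALCON-A instructions (not the real CPU)."""

    def __init__(self):
        self.R = [0] * 8               # eight general-purpose registers
        self.M = bytearray(1 << 16)    # byte-addressed memory (size assumed)

    def read_word(self, addr):
        # Big-endian: the byte at the lower address is the MS byte.
        a = addr & MASK16
        return (self.M[a] << 8) | self.M[(a + 1) & MASK16]

    def write_word(self, addr, value):
        a = addr & MASK16
        self.M[a] = (value >> 8) & 0xFF
        self.M[(a + 1) & MASK16] = value & 0xFF

    def _base(self, rb):
        # R0 used as the second operand of load/store counts as zero.
        return 0 if rb == 0 else self.R[rb]

    def movi(self, ra, c):             # R[ra] <- c
        self.R[ra] = c & MASK16

    def mov(self, ra, rb):             # R[ra] <- R[rb]
        self.R[ra] = self.R[rb]

    def load(self, ra, rb, c):         # R[ra] <- M[R[rb] + c]
        self.R[ra] = self.read_word(self._base(rb) + c)

    def store(self, ra, rb, c):        # M[R[rb] + c] <- R[ra]
        self.write_word(self._base(rb) + c, self.R[ra])

    def shiftl(self, ra, rb, c):       # logical shift left by c bits
        self.R[ra] = (self.R[rb] << c) & MASK16

    def shiftr(self, ra, rb, c):       # logical shift right by c bits
        self.R[ra] = self.R[rb] >> c

    def asr(self, ra, rb, c):          # arithmetic shift right by c bits
        v = self.R[rb]
        if v & 0x8000:                 # reinterpret as a signed 16-bit value
            v -= 0x10000
        self.R[ra] = (v >> c) & MASK16  # Python's >> keeps the sign


if __name__ == "__main__":
    cpu = FalconASketch()
    cpu.movi(3, 56)                    # movi r3, 56
    cpu.movi(7, 100)                   # movi r7, 100
    cpu.store(3, 7, 13)                # store r3, [r7+13]
    cpu.load(1, 7, 13)                 # load r1, [r7+13]
    cpu.movi(2, 160)
    cpu.asr(4, 2, 5)                   # 160 shifted right 5 times = 160 / 32
    print(cpu.R[1], cpu.R[4])
```

In this model a shift by n positions scales the value by 2^n, so asr with a constant of 5 divides by 32; how the real FALCON-A treats overflow and signed constants is not specified in this section and is left as an assumption here.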