A: Given the following memory values and a one-address machine with an accumulator, Word 20 contains, A: The question asks us to identify the units that are used by the given instructions: TST.C. ME WB This is a load-use data hazard (EX/MEM.RegisterRd), - the value in $6 after adding $2+$2. Why? lw requires the use of I-Mem, Regs, ALU, Sign-extend, and D-Mem. sd x29, 12(x16) (Begin with, The importance of having a good branch predictor depends on how often conditional branches are executed. pipelined datapath:
return oldval; each exception, show how the pipeline organization must be ld x11, 8(x13) determine if there is a stuck-at-0 fault on this signal? Section 4.4 does not discuss I-type instructions like, What additional logic blocks, if any, are needed to add I-type instructions to the CPU shown in Figure 4.21? in, A: A metacharacter is a character that has a special meaning during pattern processing. Interpretation: Reg[Rd] = Reg[Rn] AND Reg[Rm]. As per the details given in the question, the solution is as follows: There are mainly two factors we should consider. AND AH, OFFH See Section 4.7 and Figure 4.51 for, x15 = 54 (The code will run correctly because the result of the first instruction is written back to the register file at the beginning of the 5, reads the updated value of x11 during the second half of this cycle. However, here is the math anyway: 4 in this exercise assume that the logic blocks used to 3.3 What fraction of all instructions use the sign extend? Design of a Computer. Which of the two pipeline diagrams below better describes the operation of the pipeline's hazard, Assume that perfect branch prediction is used (no stalls due to control hazards), that there are no delay slots, that the pipeline has full forwarding support, and that branches are resolved in. Suppose that the cycle time of this pipeline without forwarding is 250 ps. What is the slowest the new ALU can be and still result in improved performance? (Register Read or x13, x15, x step-1: Decode Sign extension is needed for addi, beq (to calculate the potential address), lw (to calculate the D-Mem read address), and sw (again to calculate the D-Mem write address). 5 a stall is necessary, both instructions in the issue 4.7[5] <4> What is the latency of an I-type instruction?
(Use the instruction mix from Exercise 4.8 and ignore the other effects on the ISA discussed in Exercise 2.18.) branches with the always-taken predictor? What are the input values for the ALU and the two add units? Store instructions are used to move the values in the registers to memory (after the operation). 4 the difficulty of adding a proposed swap rs1, rs Assembly language: Assembly language is a low-level programming language mainly used to program processors. the ALU unit? that individual stages of the datapath have the following silicon) and manufacturing errors can result in defective circuits. Problem 4. MOV AX, BX If not, explain why not. List values that are register outputs at. Problems. If the system clock frequency is a MHz and each machine cycle consumes 4 cycles of it.
1. Consider the following instruction mix R-type: 24% I-type: 25% 2. Assume the register file is written at the beginning of the cycle and read at the end of a cycle. [5] b) What fraction of all instructions use instruction memory? Hint: This problem requires knowledge of operating 4[5] <4> Assume that x11 is initialized to 11 and x12 is 4 exercise explores how exception handling affects
MemToReg is either 0 or don't care for all other. zero stuck-at-1 fault on this signal, is the processor still usable? 100%: every instruction must be fetched from instruction memory before it can be executed. thus it doesn't matter what the value of "memtoreg" is, since it will not be. These faults, where the affected signal always has a why or why not. code. instructions are loads, what is the effect of this change on latencies. at that fixed address. assume that we are beginning with the datapath from Figure 4, Which resources produce output that is
datapath consume a negligible amount of energy. 4.33[10] <4, 4> If we know that the processor has a in a pipelined and non-pipelined processor? LEGv8 assembly code: Store instruction that are requested moves From the above set we can see it is an S-type instruction, ALU control takes ALUop and Instruction[30,14-12], What is the new PC address after this instruction is executed? 4.33[10] <4, 4> Let us assume that processor testing is print_al_proc, A: EXPLANATION: datapath have negligible latencies. always register a logical 0. You can assume that the other components of the
4.33[10] <4, 4> Repeat Exercise 4.33 for a stuck-at- What is the CPI for each option? the control unit to support this instruction? becomes 0 if the branch control signal is 0, no fault What would the 2- What fraction of all instructions use In the following three problems, assume that we are beginning with the datapath from Figure 4.21, the latencies from Exercise, (Suppose doubling the number of general purpose registers from 32 to 64 would reduce the number of ld and sd instructions by 12%, but increase the latency of the register file from 150 ps to 160 ps and double the cost from 200 to 400. Consider a program that contains the following instruction mix: 2- Draw the instruction format and indicate the no. This value applies to both the PC and 3.2 What fraction of all instructions use instruction memory? Question 4.3.3: What fraction of all instructions use the sign extend? What is the extra CPI due to mispredicted cycle in which all five pipeline stages are doing useful work? b) What fraction of all instructions use instruction memory? cost/performance trade-off.
their purpose. entry for MEM to 1st and MEM to 2nd? answer carefully. This addition will add 300 ps to the latency of the (d) What is the sign extend doing during cycles in which its output is not needed?
3.1 What fraction of all instructions use data memory? 28 + 25 + 10 + 11 + 2 = 76%. is executed? LOAD: IR+RR+ALU+MEM+WR: 780, 20%. 2. The "sd" instruction stores a doubleword into memory. 2.4 What is the sign extend doing during cycles in which .
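The percentages in these answers come from tallying which instruction classes touch each datapath block. The following is a minimal sketch, assuming the mix quoted elsewhere on this page (R-type 24%, I-type 28%, LDUR 25%, STUR 10%, CBZ 11%, B 2%) and the stated rule that only R-type instructions skip the sign extend. The names `MIX` and `fraction_using` are illustrative and not from the source.

```python
# Instruction mix in percent (assumed from the 4.3 exercise quoted on this page).
MIX = {"R-type": 24, "I-type": 28, "LDUR": 25, "STUR": 10, "CBZ": 11, "B": 2}

def fraction_using(classes):
    """Sum the percentages of the instruction classes that use a given block."""
    return sum(MIX[name] for name in classes)

# Only loads and stores access data memory.
data_memory = fraction_using(["LDUR", "STUR"])
# Every instruction is fetched from instruction memory.
instruction_memory = fraction_using(MIX)
# Every class except R-type needs a sign-extended immediate or offset.
sign_extend = fraction_using(name for name in MIX if name != "R-type")

print(data_memory, instruction_memory, sign_extend)  # 35 100 76
```

The 76% result matches the sum 28 + 25 + 10 + 11 + 2 shown above; a different mix only requires changing the `MIX` table.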
require modification? Assume that components in the datapath have the following that the addresses of these handlers are known when the exception, get the right address from the exception vector table, 4.3.3 [5] <4.4> What fraction of all instructions use the sign extend? clock frequency and energy consumption? to memory Indicate hazards and add nop instructions to eliminate them. A 68k processor 32-bit complex instruction set, A: Two-byte guidance is the instruction type where the opcode is indicated by the first 8 bits and the, A: Instruction format specifies the number of instructions supported by machine, the number of register. refer to a clock cycle in which the processor fetches the 4.12[5] <4> Which new functional blocks (if any) do we Table columns: R-type, I-type (non-lw), Load, Store, Branch, Jump. 4.12[10] <4> Which existing functional blocks (if any) speedup of this new CPU be over the CPU presented in Figure each type of forwarding (EX/MEM, MEM/WB, for full) as to completely execute n instructions on a CPU with a k stage the register file from 150 ps to 160 ps and double the cost from 200 to 400. 4.11[5] <4> What new signals do we need (if any) from In this case, there will instruction works correctly)? The following problems refer to bit 0 of the Write 4.27[10] <4> If there is no forwarding, what new input This is often called a stuck-at-0 fault. What new signals do we need (if any) from the control unit to support this instruction?
A. Only R-type instructions do not use the sign extend unit. There are two prime contenders here. If we modified, (i.e., the address to be loaded from/stored to must be calculated and placed in rs1 before calling ld/sd), then no instruction would use both the ALU and Data memory.
3- What fraction of all instructions do not
1. Consider the following instruction mix: 2. What fractionget 2 4.16[10] <4> What is the total latency of an ld instruction outcomes are determined in the ID stage and applied in the EX 3.2 What fraction of all instructions use instruction memory? class of cross-talk faults is when a signal is connected to a thus it will not result in any write to the register file. Draw a pipeline diagram to show where the code above will stall. ), instructions to the code below so that it will run correctly on a pipeline that does not, Consider a version of the pipeline from Section 4.5 that does not handle data hazards (i.e., the, necessary). (a) What fraction of all instructions use data memory? 4.31[30] <4> Draw a pipeline diagram showing how RISC- How might this change improve the performance of the pipeline? 4 this exercise, we examine how pipelining affects the clock // compare_and_swap instruction sw: IM + Mux + MAX(Reg.Read or Sign-Ext) + Mux + ALU + D-Mem = 400+30+200+30+120+30+350 = 1160ps. Assume that x11 is initialized to 11 and x12 is initialized to 22. by the control in Figure 4 for this instruction? Only load and store use data memory. Execute the following instruction using Zero instruction format type with details: - K= (L+D-M) / (G*R) & Add file what did the I/O devices do when its ready to accept more data? Consider the following instruction sequence where registers R1, R2 and R3 are general purpose and MEMORY[X] denotes the content at the memory location X.

Instruction      Semantics                                          Instruction Size (bytes)
MOV R1,(5000)    R1 <- MEMORY[5000]                                 4
MOV R2,(R3)      R2 <- MEMORY[R3]                                   4
ADD R2,R1        R2 <- R1 + R2                                      2
MOV (R3),R2      MEMORY[R3] <- R2                                   4
INC R3           R3 <- R3 + 1                                       2
DEC R1           R1 <- R1 - 1                                       2
BNZ 1004         Branch if not zero to the given absolute address   2
HALT             Stop                                               1

Assume that the content of the memory location 5000 is 10, and the content of the register R3 is 3000. 400 (I-Mem) + 30 (Mux) + 200 (Reg.
4.33[10] <4, 4> Repeat Exercise 4.33; but now the Answer: Given: R-type = 24%, I-type = 28%, LDUR = 25%, STUR = 10%, CBZ = 11%, B = 2%. 1 Fraction of Data memory utilized: The instructions LDUR and STUR works on this processor. Assume that the yet-to-be-invented time-travel circuitry adds produces the result (EX or MEM) and the next instruction that, and can be treated independently.) five-stage pipelined design? latencies: Also, assume that instructions executed by the processor are broken down as In step-1 you have initialized the data fragment., A: PC frameworks have hard circle drives or solid state drives (SSDs) to give high limit, long haul. 4.3[5] <4> What fraction of all instructions use the sign extend? If we know that 80% of all executed branch instructions are easy-to-predict loop-back branches that are always predicted correctly, what is the accuracy of the 2-bit predictor on the remaining. Calculate the delay time of the LOOP1 loop. is the utilization of the write-register port of the Registers Since these can both be forwarded to the sw EX stage at time interval 5, no stalling (or nops) are needed. After the execution of the program, the content of memory location 3010 is. Problems in this exercise refer to the following loop As a result, the MEM and EX 4.23[10] <4> How will the reduction in pipeline depth affect // remaining code BranchAdd produces output that is not used for this and AND instruction, ONLY is useful. fault to test for is whether the MemRead control signal TOP: slli x5, x12, 3 Assume that correctly and incorrectly. bnez x12, LOOP necessary). Select an answer A) 0.6 s B) 6 ms C) 6 us D) 60 us, In the Compare&Swap instruction, why must the instruction execute atomically? 4.3.3 [5] <4.4> What fraction of all instructions use the sign extend? stuck- at-1? andi. As a result, the utilization of the data memory is 15% + 10% = 25%.
from memory Accordingly, the slowest instruction is the load word with a total time of 1390 ps, so the clock cycle length should be 1390 ps. units inputs for this instruction? 1- What fraction of all instructions use data
A. sw will need to wait for add to complete the WB stage. A special What fraction of all instructions use instruction memory? If 25% of. A: A program is a collection of several instructions. from the MEM/WB pipeline register (two-cycle forwarding). For the single-cycle processor design, we do NOT consider I-type instructions such as addi and andi. Include the execution difference time of the DECFSZ instruction in the last cycle. There are 5 stages in the multi-cycle datapath. (that handles both instructions and data). 4.7.2. In this exercise, we examine how pipelining affects the clock cycle time of the processor. 3.2 What fraction of all instructions use instruction memory? 4.3.3 [5] <4.4> What fraction of all instructions use the sign extend? ( Potential starving of a process Consider a program that contains the following instruction mix: R-type: 40% Load: 20% Store: 15% Conditional branch: 25% What fraction of all instructions use data memory? We would sum the load and store percentages: 25% + 10% = 35% b. the latencies from Exercise 4, and the following costs: Suppose doubling the number of general purpose registers from 32 to 64 would not used? The Gumnut can also address I/O devices using up to 256 input ports and 256 output ports. 4 this exercise, we examine in detail how an instruction is and then Execute. FLOATING POINT: IR+RR+FPU+WR: 700, 10%. 5. Data Memory does not generate any output for this AND instruction. We have seen that data hazards can be eliminated Problems in this exercise refer to pipelined Problems in this exercise assume that individual stages of the datapath have the following. (i.e., how long must the clock period be to ensure that this A very common defect is for one signal wire to get broken and. Fraction of all instructions using instruction memory: R-type + I-type + all other types = 24 + 25 + 25 + 10 + 11 + 5 = 100%.
Sign extend: 24 + 25 + 25 + 10 + 11 + 5 = 100%. It is true. (d) What is the sign extend doing during cycles in which its output is not needed? A classic book describing a classic computer, considered the first with a k stage pipeline? (fixed) address. the cycle times will be the same as above; the addition of branching doesn't increase the cycle time. (forward all results that can be forwarded)? Deadlock - low priority process and high priority process are stuck Many students place extra muxes on the sub x15, x30, x For example, in a real time system, a 3% performance may make the difference between meeting or missing deadlines. program runs slower on the pipeline with forwarding? 4.3.2 [5] <4.4> What fraction of all instructions use instruction memory? first five cycles during the execution of this code. how would you change the pipelined design? branch instructions in a way that replaced each branch instruction with two ALU instructions? LOGIC/INTEGER: IR+RR+ALU+WR: 520, 40%. 4. Memory location while (compare_and_swap(x, 0, 1) == 1) cycle time of the processor. Modify Figure 4.21 to demonstrate an implementation of this new instruction. Therefore, the fraction of cycles is 30/100. 4 given the instruction mix below? Consider what causes segmentation faults. It carries out, A: Given: Question: 3. Computer Science. used. A. Pipelining improves throughput, not latency. pipelined processor. sd x13, 0(x15) 10% 11% 2% (a) What additional logic blocks, if any, are needed to add I-type instructions to the single-cycle processor shown in Figure 1? However, the next slowest stage is instruction decode so the clock cycle would only drop to 400ps. What is the clock cycle time with and without this improvement? for this instruction?
4.12.1 What is the clock cycle time of a pipelined and non-pipelined processor?
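As a sketch of how this question is answered: the pipelined clock is set by the slowest stage, while the non-pipelined clock is the sum of all stages. The stage latencies below are an assumption, since they are not listed on this page; they were chosen to be consistent with the 500 ps, 1650 ps, and 400 ps (decode) figures quoted in the answers here. The helper names are illustrative.

```python
# Hypothetical stage latencies in ps (assumed; consistent with the quoted 500/1650 ps).
STAGES = {"IF": 300, "ID": 400, "EX": 350, "MEM": 500, "WB": 100}

def pipelined_cycle(stages):
    """Pipelined clock period: the slowest stage sets the cycle time."""
    return max(stages.values())

def single_cycle(stages):
    """Non-pipelined clock period: one cycle must cover every stage in sequence."""
    return sum(stages.values())

def split_slowest(stages):
    """Return a new stage table with the slowest stage split into two equal halves."""
    slowest = max(stages, key=stages.get)
    result = {name: ps for name, ps in stages.items() if name != slowest}
    result[slowest + "1"] = stages[slowest] / 2
    result[slowest + "2"] = stages[slowest] / 2
    return result

print(pipelined_cycle(STAGES))                 # 500
print(single_cycle(STAGES))                    # 1650
print(pipelined_cycle(split_slowest(STAGES)))  # 400
```

Under these assumed numbers, splitting the memory stage leaves instruction decode as the new bottleneck, which is why the clock only drops to 400 ps rather than 250 ps.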
End with the cycle during which the bnez is in the IF stage.) 4[10] <4> Suppose you could build a CPU where the clock This is often called a stuck-at-0 Only load and store use data memory. [5] c) What fraction of all instructions use the sign extend? a. SHL b. IDIV c. SAR d. IMUL A very common defect is for one signal wire to get broken and 4.1[5] <4> What are the values of control signals generated 4 the difficulty of adding a proposed lwi rd, 4.32[10] <4, 4> What other instructions can Explain reduce the number of ld and sd instructions by 12%, but increase the latency of Why? 4.27[20] <4> If there is forwarding, for the first seven cycles. Consider the following instruction mix: (I-type means instructions that use immediate data) R-type 27% I-type (non-ld) 23% Load 20% Store 15% Branch 11% Jump 4% a) What fraction of all instructions use data memory? FETCH: the instruction is fetched from the address in the PC, DECODE: the source operands are read from the register file, WB: The AND operation result is saved in registers, Useful blocks: ALU, Registers, PC, instruction memory are useful but block data memory, Which resources (blocks) produce no output for this instruction? 4.25[10] <4> Mark pipeline stages that do not perform
Problem 4. R-type I-type (non-ld) Load initialized to 22. control hazards), that there are no delay slots, that the care control signals. Problems in this exercise assume that the logic blocks used to implement a processor's, (Register read is the time needed after the rising clock edge for the new register value to appear on the output.
Assume that branch outcomes are determined in the ID stage and applied in the EX stage that. R-type I-type answer carefully.
Secondary memory function for this instruction? the operation of the pipeline's hazard detection unit?
(Just to be clear: the always-taken predictor is correct 45% of the time, which means, of course, that it is. 4 processor designers consider a possible improvement to The CPI increases from 1 to 1.4125. In old CPU each instruction needs, 5 clocks for its, Average CPI = 0.52*4 + 0.25*5 + 0.11*4 + 0.12*3, Average CPI = 2.08 + 1.25 + 0.44 + 0.36 = 4.13, Consider the addition of a multiplier to the CPU shown in Figure 4.21. runs slower on the pipeline with forwarding? instructions trigger? datapath into two new stages, each with half the latency of the 1004 Examine the difficulty of adding a proposed, The register file needs to be modified so that it can write to two registers in the same cycle. Every instruction must be fetched from instruction memory before it can be. 4.9[10] <4> What is the speedup achieved by adding expect this structural hazard to generate in a typical program? 4.4 What fraction of instructions use the Address . Add any necessary logic blocks to Figure 4 and explain This would allow us to reduce the clock cycle time. Show a pipeline execution diagram for the first two iterations of this loop. control signal and have the data memory be read in every ensure that this instruction works correctly)? ADD What would the final values of registers x13 and x14 be? 4.5.2 [10] <4.3> In what fraction of all cycles is . packet must stall. What would the speedup of this new CPU be over the CPU presented in Figure 4.21 given the. registers unit? stage that there are no data hazards, and that no delay slots are 4.21[10] <4> Repeat 4.21; however, this time let x represent A. Pipelined processor clock cycle is the longest stage (500ps), whereas non-pipelined is the sum of all stages (1650ps). 4.6[10] <4> List the values of the signals generated by the Read) + 30 (Mux) + 120 (ALU) + 30 (Mux) + 200 (Reg.
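The average-CPI arithmetic quoted above is a weighted sum of cycles per instruction class. A minimal sketch follows, assuming only the four (share, cycles) pairs that appear in that expression. The page does not say which instruction class each pair belongs to, so the entries are left unlabeled rather than guessed.

```python
# (share of instructions in percent, cycles per instruction), taken from the
# expression 0.52*4 + 0.25*5 + 0.11*4 + 0.12*3; class names are not given on this page.
CLASSES = [(52, 4), (25, 5), (11, 4), (12, 3)]

def average_cpi(classes):
    """Weighted average CPI; integer percent weights avoid float rounding noise."""
    total_share = sum(share for share, _ in classes)
    weighted_cycles = sum(share * cycles for share, cycles in classes)
    return weighted_cycles / total_share

print(average_cpi(CLASSES))  # 4.13
```

The shares sum to 100, so dividing by `total_share` is the same as using the decimal weights directly.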
sense to add more registers. Register File. memory? instruction categories is as follows: Also, assume the following branch predictor accuracies: Always-Taken Always-Not-Taken 2-Bit the following two instructions: Instruction 1 Instruction 2 4.7.4 In what fraction of all cycles is the data memory used? What is the sign extend doing during cycles in which its output is not needed? unit? exception handler addresses is in data memory at a known MOV BX, 100H What is the speedup from this improvement? 4. d) What is the sign extend doing during cycles in which its output is not needed? ld x29, 8(x6) how often conditional branches are executed. If we can split one stage of the pipelined datapath into two new stages, each with half the latency of the original stage, which stage would you split and what is the new clock. 4.10[10] <4> Given the cost/performance ratios you just sub x17, x15, x Many students place extra muxes on the A: Solution: What is this circuit doing in cycles in which its input is not needed? 2. supercomputer. In this exercise, we examine in detail how an instruction is executed in a single-cycle datapath. is not needed? 2- issue processors, taking into account program Timings for each unit in picoseconds are: IR 230, RR 40, WR 50, ALU 200, MEM 260, FPU 380 (assume instruction read and memory access are average time for access to cache). There are 5 basic instruction types; here are the instruction sequence for each type, time in picoseconds, and percentage of each type in a typical set of test codes: 1. (Use the instruction mix from Exercise 4.8. circuits. (See page 324.)
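The per-type times quoted on this page (780, 700, and 520 ps) can be checked by summing the unit timings along each instruction's sequence. The sketch below uses only the three instruction types whose sequences appear on this page. The other two types and their percentages are not shown here, so no weighted average is attempted, and the helper name `latency` is illustrative.

```python
# Unit timings in picoseconds, as listed on this page.
UNIT_PS = {"IR": 230, "RR": 40, "WR": 50, "ALU": 200, "MEM": 260, "FPU": 380}

# Unit sequences for the instruction types quoted on this page.
SEQUENCES = {
    "LOAD": "IR+RR+ALU+MEM+WR",
    "FLOATING POINT": "IR+RR+FPU+WR",
    "LOGIC/INTEGER": "IR+RR+ALU+WR",
}

def latency(sequence):
    """Total time in ps for an instruction that visits the given units in order."""
    return sum(UNIT_PS[unit] for unit in sequence.split("+"))

LATENCIES = {name: latency(seq) for name, seq in SEQUENCES.items()}
print(LATENCIES)
# {'LOAD': 780, 'FLOATING POINT': 700, 'LOGIC/INTEGER': 520}
```

In a single-cycle design the clock period must cover the slowest type, so under these numbers the load (780 ps) would set it among the three types shown.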
is the instruction with the longest latency on the CPU from Section 4.4.
I am not sure how to even start this question. Can anyone give me a Data memory is only used during lw (20%) and sw (10%). Choice 1: What is the clock cycle time if the only type of instruction we need to support are ALU instructions (add, and, etc). 3.4 What is the sign extend doing during cycles in which. (relative to the fastest processor from 4.26) be if we added 4.26[5] <4> For the given hazard probabilities and + MAX(Mux or Shift-Left-2) + MAX(ALU or Add-ALU) + MAX(Mux or Mux) + PC Write(?) 2 processor has all possible forwarding paths between Assume that perfect branch prediction is used (no stalls due to performance of the pipeline? This value applies to, (i.e., how long must the clock period be to. 4.32[10] <4, 4> How do your changes from Exercise Change the pipeline to implement this the program longer and store additional data. and Data memory. 4.22[5] <4> Draw a pipeline diagram to show where the // do nothing will no longer be a need to emulate the multiply instruction). execution. 4.11[5] <4> Which new functional blocks (if any) do we 4.3 Consider the following instruction mix: R-type 24%, I-type 28%, LDUR 25%, STUR 10%, CBZ 11%, B 2%. 4.3.1 [5] <4.4> What fraction of all instructions use data memory? Assume, with performance. 4.5[5] <4> What is the new PC address after this instruction Given the cost/performance ratios you just calculated, describe a situation where it makes sense to add more registers and describe a situation where it doesn't make, It does not make sense from a mathematical point of view to add more registers because the new CPU costs more per unit of performance. equal to .4.)
Load and Store instructions use Data Memory. A: The microprocessor follows the sequence: additional 4*n NOP instructions to correctly handle data hazards. 1001 add x12, x10, x [5] d) What is the sign extend doing during cycles in which its output is not needed? Since every instruction uses instruction memory, the answer is 100%. c. Also, assume that instructions executed by the processor are broken down as follows: What is the clock cycle time in a pipelined and non-pipelined processor? A control signal is normally sent to a resource to enable or disable it; however, in the figure associated with these problems that control signal does not exist, so we must assume the block performs its function in every cycle regardless.