Chapter 11   Coprocessing and Multiprocessing

----------------------------------------------------------------------------

The 80386 has two levels of support for multiple parallel processing units:

  *  A highly specialized interface for very closely coupled processors of
     a type known as coprocessors.

  *  A more general interface for more loosely coupled processors of
     unspecified type.


11.1 Coprocessing

The components of the coprocessor interface include:

  *  The ET bit of control register zero (CR0)

  *  The EM and MP bits of CR0

  *  The ESC instructions

  *  The WAIT instruction

  *  The TS bit of CR0

  *  Exceptions

11.1.1 Coprocessor Identification

The 80386 is designed to operate with either an 80287 or an 80387 math
coprocessor. The ET bit of CR0 indicates which type of coprocessor is
present. The 80386 sets ET automatically after RESET according to the
level it detects on the ERROR# input. If desired, software may also set
or reset ET by loading CR0 with a MOV instruction. If ET is set, the
80386 uses the 32-bit protocol of the 80387; if ET is reset, it uses the
16-bit protocol of the 80287.
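
As a minimal sketch, the CR0 flags used throughout this chapter can be
written as C bit masks, and the protocol choice described above becomes a
one-line test on ET. The bit positions are those of the 80386 CR0
register; the function name is hypothetical, and actually reading CR0
requires a privileged MOV that is not shown.

```c
#include <stdint.h>

/* 80386 CR0 bit positions referred to in this chapter. */
#define CR0_PE (1u << 0)  /* protection enable */
#define CR0_MP (1u << 1)  /* monitor coprocessor */
#define CR0_EM (1u << 2)  /* emulation */
#define CR0_TS (1u << 3)  /* task switched */
#define CR0_ET (1u << 4)  /* extension type */

/* Hypothetical helper: width in bits of the coprocessor protocol that
   ET selects (32 = 80387 protocol, 16 = 80287 protocol). */
static int coprocessor_protocol_bits(uint32_t cr0)
{
    return (cr0 & CR0_ET) ? 32 : 16;
}
```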

11.1.2 ESC and WAIT Instructions

The 80386 interprets the pattern 11011B in the first five bits of an
instruction as an opcode intended for a coprocessor. Instructions thus
marked are called ESCAPE or ESC instructions. Before sending an ESC
instruction to the coprocessor, the CPU performs the following functions:

  *  Tests the emulation mode (EM) flag to determine whether coprocessor
     functions are being emulated by software.

  *  Tests the TS flag to determine whether there has been a context
     change since the last ESC instruction.

  *  For some ESC instructions, tests the ERROR# pin to determine whether
     the coprocessor detected an error in the previous ESC instruction.

The WAIT instruction is not an ESC instruction, but WAIT causes the CPU
to perform some of the same tests that it performs upon encountering an
ESC instruction. The processor performs the following actions for a WAIT
instruction:

  *  Waits until the coprocessor no longer asserts the BUSY# pin.

  *  Tests the ERROR# pin (after BUSY# goes inactive). If ERROR# is
     active, the 80386 signals exception 16, which indicates that the
     coprocessor encountered an error in the previous ESC instruction.

WAIT can therefore be used to cause exception 16 if an error is pending
from a previous ESC instruction. Note that, if no coprocessor is present,
the ERROR# and BUSY# pins should be tied inactive to prevent WAIT from
waiting forever or causing spurious exceptions.
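
The ESC pattern translates into a simple test on the first opcode byte:
with 11011B in its five high-order bits, an ESC opcode falls in the range
0D8H through 0DFH. The sketch below (hypothetical function names) assumes
any prefix bytes have already been skipped; the WAIT opcode 9BH is
included for contrast.

```c
#include <stdbool.h>
#include <stdint.h>

/* ESC: the five high-order bits of the first opcode byte are 11011B,
   so the byte lies in 0xD8..0xDF. Prefix bytes are assumed skipped. */
static bool is_esc_opcode(uint8_t first_byte)
{
    return (first_byte & 0xF8u) == 0xD8u;
}

/* WAIT is the one-byte opcode 9BH and is not an ESC instruction. */
static bool is_wait_opcode(uint8_t first_byte)
{
    return first_byte == 0x9Bu;
}
```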

11.1.3 EM and MP Flags

The EM and MP flags of CR0 control how the processor reacts to
coprocessor instructions.

The EM bit indicates whether coprocessor functions are to be emulated. If
the processor finds EM set when executing an ESC instruction, it signals
exception 7, giving the exception handler an opportunity to emulate the
ESC instruction.

The MP (monitor coprocessor) bit indicates whether a coprocessor is
actually attached. The MP flag controls the function of the WAIT
instruction. If, when executing a WAIT instruction, the CPU finds MP set,
it tests the TS flag; it does not otherwise test TS during a WAIT
instruction. If it finds TS set under these conditions, the CPU signals
exception 7.

The EM and MP flags can be changed with a MOV instruction that uses CR0
as the destination operand and read with a MOV instruction that uses CR0
as the source operand. These forms of the MOV instruction can be executed
only at privilege level zero.
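
Because EM requests emulation and MP reports an attached coprocessor, a
system would typically set exactly one of them. The hypothetical helper
below only computes the new CR0 image under that assumption; loading it
still requires a MOV to CR0 at privilege level zero.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_MP (1u << 1)  /* monitor coprocessor */
#define CR0_EM (1u << 2)  /* emulation */

/* Hypothetical helper. Coprocessor present: MP = 1, EM = 0, so ESC
   instructions go to the chip. No coprocessor: EM = 1, MP = 0, so ESC
   instructions raise exception 7 for software emulation.
   All other CR0 bits are left unchanged. */
static uint32_t cr0_with_coprocessor_config(uint32_t cr0, bool present)
{
    cr0 &= ~(CR0_MP | CR0_EM);
    cr0 |= present ? CR0_MP : CR0_EM;
    return cr0;
}
```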

11.1.4 The Task-Switched Flag

The TS bit of CR0 helps to determine when the context of the coprocessor
does not match that of the task being executed by the 80386 CPU. The
80386 sets TS each time it performs a task switch (whether triggered by
software or by a hardware interrupt). If, when interpreting one of the
ESC instructions, the CPU finds TS already set, it causes exception 7.
The WAIT instruction also causes exception 7 if both TS and MP are set.
Operating systems can use this exception to switch the context of the
coprocessor to correspond to the current task. Refer to the 80386 System
Software Writer's Guide for an example. The CLTS instruction (legal only
at privilege level zero) resets the TS flag.
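
The conditions under which ESC and WAIT trap can be collected into one
decision function. This is a sketch of the rules stated in 11.1.3 and
11.1.4, not a description of the processor's internal logic; the enum and
function names are hypothetical.

```c
#include <stdbool.h>

enum insn_kind { INSN_ESC, INSN_WAIT };

/* Returns true if the instruction would raise exception 7
   (coprocessor not available) for the given CR0 flag values.
     ESC:  EM set -> exception 7; otherwise TS set -> exception 7.
     WAIT: exception 7 only when MP and TS are both set; EM is ignored. */
static bool raises_exception_7(enum insn_kind kind, bool em, bool mp, bool ts)
{
    if (kind == INSN_ESC)
        return em || ts;
    return mp && ts;
}
```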

11.1.5 Coprocessor Exceptions

Three exceptions aid in interfacing to a coprocessor: interrupt 7
(coprocessor not available), interrupt 9 (coprocessor segment overrun),
and interrupt 16 (coprocessor error).

11.1.5.1 Interrupt 7 -- Coprocessor Not Available

This exception occurs in either of two conditions:

  1. The CPU encounters an ESC instruction and EM is set. In this case,
     the exception handler should emulate the instruction that caused the
     exception. TS may also be set.

  2. The CPU encounters an ESC instruction when TS is set (and EM is
     reset), or a WAIT instruction when both MP and TS are set. In this
     case, the exception handler should update the state of the
     coprocessor, if necessary.
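
One way an operating system might use case 2 is lazy context switching:
leave the coprocessor state in place across task switches and save or
restore it only when a task actually issues a coprocessor instruction.
The model below is a hypothetical simulation in plain C; struct copies
stand in for the FSAVE and FRSTOR instructions, and the 108-byte area
size is an assumption matching the largest coprocessor operand.

```c
#include <stdbool.h>

/* Stand-in for the coprocessor save area used by FSAVE/FRSTOR. */
struct fpu_state { unsigned char bytes[108]; };

struct task { struct fpu_state saved; };

struct machine {
    bool ts;                  /* models CR0.TS */
    struct task *current;     /* task the CPU is running */
    struct task *fpu_owner;   /* task whose context is in the coprocessor */
    struct fpu_state fpu;     /* live coprocessor registers */
    int saves, restores;      /* counters for illustration */
};

/* The 80386 sets TS on every task switch. */
static void task_switch(struct machine *m, struct task *next)
{
    m->current = next;
    m->ts = true;
}

/* Exception 7 handler for case 2: clear TS (as CLTS would) and swap
   coprocessor contexts only if they belong to a different task. */
static void handle_exception_7(struct machine *m)
{
    m->ts = false;
    if (m->fpu_owner != m->current) {
        if (m->fpu_owner != 0) {
            m->fpu_owner->saved = m->fpu;      /* like FSAVE */
            m->saves++;
        }
        m->fpu = m->current->saved;            /* like FRSTOR */
        m->restores++;
        m->fpu_owner = m->current;
    }
}
```

A real handler would also need to initialize the coprocessor (for
example with FNINIT) for a task that has never used it; that step is
omitted here.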

11.1.5.2 Interrupt 9 -- Coprocessor Segment Overrun

This exception occurs in protected mode under the following conditions:

  *  An operand of a coprocessor instruction wraps around an addressing
     limit (0FFFFH for small segments, 0FFFFFFFFH for big segments, zero
     for expand-down segments). An operand may wrap around an addressing
     limit when the segment limit is near an addressing limit and the
     operand is near the largest valid address in the segment. Because of
     the wrap-around, the beginning and ending addresses of such an
     operand will be near opposite ends of the segment.

  *  Both the first byte and the last byte of the operand (considering
     wrap-around) are at addresses located in the segment and in present
     and accessible pages.

  *  The operand spans inaccessible addresses. There are two ways that
     such an operand may also span inaccessible addresses:

     1. The segment limit is not equal to the addressing limit (e.g., the
        addressing limit is FFFFH and the segment limit is FFFDH);
        therefore, the operand will span addresses that are not within
        the segment (e.g., an 8-byte operand that starts at valid offset
        FFFCH will span addresses FFFCH-FFFFH and 0000H-0003H; however,
        addresses FFFEH and FFFFH are not valid, because they exceed the
        limit).

     2. The operand begins and ends in present and accessible pages, but
        intermediate bytes of the operand fall either in a not-present
        page or in a page to which the current procedure does not have
        access rights.

The address of the failing numerics instruction and data operand may be
lost; an FSTENV does not return reliable addresses. As with the
80286/80287, the segment overrun exception should be handled by executing
an FNINIT instruction (i.e., an FINIT without a preceding WAIT). The
return address on the stack does not necessarily point either to the
failing instruction or to the following instruction. The failing
numerics instruction is not restartable.

Case 2 can be avoided either by aligning all segments on page boundaries
or by not starting them within 108 bytes of the start or end of a page.
(The maximum size of a coprocessor operand is 108 bytes.) Case 1 can be
avoided by making sure that the gap between the last valid offset and the
first valid offset of a segment is either no less than 108 bytes or zero
(i.e., the segment is of full size). If neither software system design
constraint is acceptable, the exception handler should execute FNINIT and
should probably terminate the task.
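
The case 1 condition and both avoidance rules can be checked
mechanically. The sketch below handles expand-up segments only and
assumes the 80386's 4-Kbyte page size; the function names are
hypothetical. With the example from the text, an 8-byte operand at FFFCH
in a segment whose limit is FFFDH is flagged, while the same operand in a
full-size segment (gap of zero) is not.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_COPROC_OPERAND 108u
#define PAGE_BYTES 4096u

/* Case 1, expand-up segments: an operand of `size` bytes starting at a
   valid `offset` wraps past `addr_limit` (0xFFFF or 0xFFFFFFFF) while
   `seg_limit` is below `addr_limit`, so it spans the invalid offsets
   seg_limit+1 .. addr_limit. */
static bool operand_wraps_across_gap(uint32_t offset, uint32_t size,
                                     uint32_t seg_limit, uint32_t addr_limit)
{
    uint64_t last = (uint64_t)offset + size - 1;
    return last > addr_limit && seg_limit < addr_limit;
}

/* Avoidance rule for case 1: the gap is zero or at least 108 bytes. */
static bool segment_gap_safe(uint32_t seg_limit, uint32_t addr_limit)
{
    uint32_t gap = addr_limit - seg_limit;   /* number of invalid offsets */
    return gap == 0 || gap >= MAX_COPROC_OPERAND;
}

/* Avoidance rule for case 2: the segment starts on a page boundary, or
   not within 108 bytes of the start or end of a page. */
static bool segment_base_safe(uint32_t base)
{
    uint32_t off = base % PAGE_BYTES;
    return off == 0 ||
           (off >= MAX_COPROC_OPERAND && PAGE_BYTES - off >= MAX_COPROC_OPERAND);
}
```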

11.1.5.3 Interrupt 16 -- Coprocessor Error

The numerics coprocessors can detect six different exception conditions
during instruction execution. If the detected exception is not masked by
a bit in the control word, the coprocessor communicates the fact that an
error occurred to the CPU by a signal at the ERROR# pin. The CPU causes
interrupt 16 the next time it checks the ERROR# pin, which is only at the
beginning of a subsequent WAIT or certain ESC instructions. If the
exception is masked, the numerics coprocessor handles the exception
according to on-board logic; it does not assert the ERROR# pin in this
case.
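
The deferred reporting described above can be modeled as a flag that the
coprocessor sets and the CPU samples later. In this sketch the six mask
bits follow the 80287/80387 control-word layout (bits 0 through 5, where
a set bit masks the condition); the struct and function names are
hypothetical. In hardware, ERROR# stays asserted until software clears
the exception flags in the coprocessor, so the model keeps the error
pending.

```c
#include <stdbool.h>
#include <stdint.h>

/* The six exception-condition mask bits of the control word. */
enum {
    FPU_IM = 1u << 0,  /* invalid operation */
    FPU_DM = 1u << 1,  /* denormalized operand */
    FPU_ZM = 1u << 2,  /* zero divide */
    FPU_OM = 1u << 3,  /* overflow */
    FPU_UM = 1u << 4,  /* underflow */
    FPU_PM = 1u << 5   /* precision */
};

struct coproc {
    uint16_t control;  /* mask bits as above */
    bool error_pin;    /* models ERROR# being asserted */
};

/* The coprocessor detects `cond`; it asserts ERROR# only if unmasked. */
static void coproc_detect(struct coproc *c, unsigned cond)
{
    if (!(c->control & cond))
        c->error_pin = true;
}

/* The CPU samples ERROR# at a WAIT or a checking ESC instruction:
   returns the interrupt to raise, or 0 for none. */
static int cpu_check_error(const struct coproc *c)
{
    return c->error_pin ? 16 : 0;
}
```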

11.2 General Multiprocessing

The components of the general multiprocessing interface include:

  *  The LOCK# signal

  *  The LOCK instruction prefix, which gives programmed control of the
     LOCK# signal

  *  Automatic assertion of the LOCK# signal with implicit memory updates
     by the processor

11.2.1 LOCK and the LOCK# Signal

The LOCK instruction prefix and its corresponding output signal LOCK# can
be used to prevent other bus masters from interrupting a data movement
operation. LOCK may be used only with the following 80386 instructions,
and only when they modify memory. An undefined-opcode exception results
from using LOCK before any instruction other than:

  *  Bit test and change: BTS, BTR, BTC.

  *  Exchange: XCHG.

  *  Two-operand arithmetic and logical: ADD, ADC, SUB, SBB, AND, OR, XOR.

  *  One-operand arithmetic and logical: INC, DEC, NOT, and NEG.

A locked instruction is guaranteed to lock only the area of memory
defined by the destination operand, but it may lock a larger memory area.
For example, typical 8086 and 80286 configurations lock the entire
physical memory space. The area of memory defined by the destination
operand is guaranteed to be locked against access by a processor
executing a locked instruction on exactly the same memory area, i.e., an
operand with an identical starting address and identical length.

The integrity of the lock is not affected by the alignment of the memory
field. The LOCK# signal is asserted for as many bus cycles as necessary
to update the entire operand.
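
The rule for the LOCK prefix amounts to a membership test plus a
memory-destination check. The table below is copied from the list above;
the function name is hypothetical, and treating a register destination
as illegal follows the "only when they modify memory" wording rather
than a separately documented fault.

```c
#include <stdbool.h>
#include <string.h>

/* Instructions that may carry the LOCK prefix on the 80386. */
static const char *const lockable[] = {
    "BTS", "BTR", "BTC",
    "XCHG",
    "ADD", "ADC", "SUB", "SBB", "AND", "OR", "XOR",
    "INC", "DEC", "NOT", "NEG",
};

/* Returns true if LOCK before `mnemonic` is legal; false means the
   undefined-opcode exception would result. */
static bool lock_prefix_legal(const char *mnemonic, bool dest_is_memory)
{
    if (!dest_is_memory)
        return false;
    for (size_t i = 0; i < sizeof lockable / sizeof lockable[0]; i++)
        if (strcmp(mnemonic, lockable[i]) == 0)
            return true;
    return false;
}
```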

11.2.2 Automatic Locking

In several instances, the processor itself initiates activity on the
data bus. To help ensure that such activities function correctly in
multiprocessor configurations, the processor automatically asserts the
LOCK# signal. These instances include:

  *  Acknowledging interrupts. After an interrupt request, the interrupt
     controller uses the data bus to send the interrupt ID of the
     interrupt source to the CPU. The CPU asserts LOCK# to ensure that no
     other data appears on the data bus during this time.

  *  Setting the busy bit of a TSS descriptor. The processor tests and
     sets the busy bit in the type field of the TSS descriptor when
     switching to a task. To ensure that two different processors cannot
     simultaneously switch to the same task, the processor asserts LOCK#
     while testing and setting this bit.

  *  Loading descriptors. While copying the contents of a descriptor from
     a descriptor table into a segment register, the processor asserts
     LOCK# so that the descriptor cannot be modified by another processor
     while it is being loaded. For this action to be effective,
     operating-system procedures that update descriptors should adhere to
     the following steps:

     -- Use a locked update to the access-rights byte to mark the
        descriptor not-present.

     -- Update the fields of the descriptor. (This may require several
        memory accesses; therefore, LOCK cannot be used.)

     -- Use a locked update to the access-rights byte to mark the
        descriptor present again.

  *  Updating page-table A and D bits. The processor asserts LOCK# while
     updating the A (accessed) and D (dirty) bits of page-table entries.
     The processor also bypasses the page-table cache and updates these
     bits directly in memory.

  *  Executing the XCHG instruction. The 80386 always asserts LOCK#
     during an XCHG instruction that references memory (even if the LOCK
     prefix is not used).
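
The three-step descriptor update can be sketched with C11 atomics,
assuming a C11 compiler targeting x86, where atomic_fetch_and and
atomic_fetch_or on a byte typically compile to LOCK-prefixed AND and OR
instructions. The struct mirrors the 8-byte 80386 segment descriptor
layout (assuming _Atomic uint8_t occupies one byte, as on common
compilers); the function name is hypothetical, and only the base and
limit fields are rewritten.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DESC_PRESENT 0x80u  /* P bit: bit 7 of the access-rights byte */

/* 80386 segment descriptor, in memory order. */
struct descriptor {
    uint16_t limit_low;         /* limit 15..0 */
    uint16_t base_low;          /* base 15..0 */
    uint8_t  base_mid;          /* base 23..16 */
    _Atomic uint8_t access;     /* P, DPL, S, type */
    uint8_t  limit_high_flags;  /* limit 19..16 plus G, D, AVL bits */
    uint8_t  base_high;         /* base 31..24 */
};

static void update_descriptor(struct descriptor *d, uint32_t base, uint32_t limit)
{
    /* Step 1: locked update marks the descriptor not-present. */
    atomic_fetch_and(&d->access, (uint8_t)~DESC_PRESENT);

    /* Step 2: ordinary stores to the remaining fields. */
    d->limit_low = (uint16_t)(limit & 0xFFFFu);
    d->base_low  = (uint16_t)(base & 0xFFFFu);
    d->base_mid  = (uint8_t)((base >> 16) & 0xFFu);
    d->base_high = (uint8_t)(base >> 24);
    d->limit_high_flags = (uint8_t)((d->limit_high_flags & 0xF0u) |
                                    ((limit >> 16) & 0x0Fu));

    /* Step 3: locked update marks the descriptor present again. */
    atomic_fetch_or(&d->access, (uint8_t)DESC_PRESENT);
}
```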

11.2.3 Cache Considerations

Systems programmers must take care when updating shared data that may
also be stored in on-chip registers and caches. With the 80386, such
shared data includes:

  *  Descriptors, which may be held in segment registers. A change to a
     descriptor that is shared among processors should be broadcast to
     all processors. Segment registers are effectively "descriptor
     caches". A change to a descriptor will not be utilized by another
     processor if that processor already has a copy of the old version of
     the descriptor in a segment register.

  *  Page tables, which may be held in the page-table cache. A change to
     a page table that is shared among processors should be broadcast to
     all processors, so that others can flush their page-table caches and
     reload them with up-to-date page tables from memory.

Systems designers can employ an interprocessor interrupt to handle the
above cases. When one processor changes data that may be cached by other
processors, it can send an interrupt signal to all other processors that
may be affected by the change. If the interrupt is serviced by an
interrupt task, the task switch automatically flushes the segment
registers. The task switch also flushes the page-table cache if the PDBR
(the contents of CR3) of the interrupt task is different from the PDBR of
every other task.

In multiprocessor systems that need a cacheability signal from the CPU,
it is recommended that physical address pin A31 be used to indicate
cacheability. Such a system can then possess up to 2 Gbytes of physical
memory. The virtual address range available to the programmer is not
affected by this convention.
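
The A31 convention can be sketched as splitting each 32-bit physical
address into a cacheability flag and a 31-bit memory address, which is
where the 2-Gbyte figure comes from (2^31 bytes). The text does not state
the polarity of A31; this sketch assumes, hypothetically, that A31 = 1
marks a non-cacheable access, and the function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define A31 (1u << 31)

/* Assumed board convention (not from the manual): A31 = 1 means the
   access must not be cached. */
static bool access_is_cacheable(uint32_t phys)
{
    return (phys & A31) == 0;
}

/* The remaining 31 address bits select the memory location, so at most
   2^31 bytes (2 Gbytes) of distinct physical memory are addressable. */
static uint32_t memory_address(uint32_t phys)
{
    return phys & ~A31;
}
```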