Chapter 11 Coprocessing and Multiprocessing
Chapter 11 Coprocessing and Multiprocessing
Chapter 11 Coprocessing and Multiprocessing
----------------------------------------------------------------------------
The 80386 has two levels of support for multiple parallel processing units:
* A highly specialized interface for very closely coupled processors of
a type known as coprocessors.
* A more general interface for more loosely coupled processors of
unspecified type.
11.1 Coprocessing
11.1 Coprocessing
The components of the coprocessor interface include:
* ET bit of control register zero (CR0)
* The EM, and MP bits of CR0
* The ESC instructions
* The WAIT instruction
* The TS bit of CR0
* Exceptions
11.1.1 Coprocessor Identification
11.1.1 Coprocessor Identification
The 80386 is designed to operate with either an 80287 or 80387 math
coprocessor. The ET bit of CR0 indicates which type of coprocessor is
present. ET is set automatically by the 80386 after RESET according to the
level detected on the ERROR# input. If desired, ET may also be set or reset
by loading CR0 with a MOV instruction. If ET is set, the 80386 uses the
32-bit protocol of the 80387; if reset, the 80386 uses the 16-bit protocol
of the 80287.
11.1.2 ESC and WAIT Instructions
11.1.2 ESC and WAIT Instructions
The 80386 interprets the pattern 11011B in the first five bits of an
instruction as an opcode intended for a coprocessor. Instructions thus
marked are called ESCAPE or ESC instructions. The CPU performs the following
functions upon encountering an ESC instruction before sending the
instruction to the coprocessor:
* Tests the emulation mode (EM) flag to determine whether coprocessor
functions are being emulated by software.
* Tests the TS flag to determine whether there has been a context change
since the last ESC instruction.
* For some ESC instructions, tests the ERROR# pin to determine whether
the coprocessor detected an error in the previous ESC instruction.
The WAIT instruction is not an ESC instruction, but WAIT causes the CPU to
perform some of the same tests that it performs upon encountering an ESC
instruction. The processor performs the following actions for a WAIT
instruction:
* Waits until the coprocessor no longer asserts the BUSY# pin.
* Tests the ERROR# pin (after BUSY# goes inactive). If ERROR# is active,
the 80386 signals exception 16, which indicates that the coprocessor
encountered an error in the previous ESC instruction.
* WAIT can therefore be used to cause exception 16 if an error is
pending from a previous ESC instruction. Note that, if no coprocessor
is present, the ERROR# and BUSY# pins should be tied inactive to
prevent WAIT from waiting forever or causing spurious exceptions.
11.1.3 EM and MP Flags
11.1.3 EM and MP Flags
The EM and MP flags of CR0 control how the processor reacts to coprocessor
instructions.
The EM bit indicates whether coprocessor functions are to be emulated. If
the processor finds EM set when executing an ESC instruction, it signals
exception 7, giving the exception handler an opportunity to emulate the ESC
instruction.
The MP (monitor coprocessor) bit indicates whether a coprocessor is
actually attached. The MP flag controls the function of the WAIT
instruction. If, when executing a WAIT instruction, the CPU finds MP set,
then it tests the TS flag; it does not otherwise test TS during a WAIT
instruction. If it finds TS set under these conditions, the CPU signals
exception 7.
The EM and MP flags can be changed with the aid of a MOV instruction using
CR0 as the destination operand and read with the aid of a MOV instruction
with CR0 as the source operand. These forms of the MOV instruction can be
executed only at privilege level zero.
11.1.4 The Task-Switched Flag
11.1.4 The Task-Switched Flag
The TS bit of CR0 helps to determine when the context of the coprocessor
does not match that of the task being executed by the 80386 CPU. The 80386
sets TS each time it performs a task switch (whether triggered by software
or by hardware interrupt). If, when interpreting one of the ESC
instructions, the CPU finds TS already set, it causes exception 7. The WAIT
instruction also causes exception 7 if both TS and MP are set. Operating
systems can use this exception to switch the context of the coprocessor to
correspond to the current task. Refer to the 80386 System Software Writer's
Guide for an example.
The CLTS instruction (legal only at privilege level zero) resets the TS
flag.
11.1.5 Coprocessor Exceptions
11.1.5 Coprocessor Exceptions
Three exceptions aid in interfacing to a coprocessor: interrupt 7
(coprocessor not available), interrupt 9 (coprocessor segment overrun), and
interrupt 16 (coprocessor error).
11.1.5.1 Interrupt 7 -- Coprocessor Not Available
11.1.5.1 Interrupt 7 -- Coprocessor Not Available
This exception occurs in either of two conditions:
1. The CPU encounters an ESC instruction and EM is set. In this case,
the exception handler should emulate the instruction that caused the
exception. TS may also be set.
2. The CPU encounters either the WAIT instruction or an ESC instruction
when both MP and TS are set. In this case, the exception handler
should update the state of the coprocessor, if necessary.
11.1.5.2 Interrupt 9 -- Coprocessor Segment Overrun
11.1.5.2 Interrupt 9 -- Coprocessor Segment Overrun
This exception occurs in protected mode under the following conditions:
* An operand of a coprocessor instruction wraps around an addressing
limit (0FFFFH for small segments, 0FFFFFFFFH for big segments, zero for
expand-down segments). An operand may wrap around an addressing limit
when the segment limit is near an addressing limit and the operand is
near the largest valid address in the segment. Because of the
wrap-around, the beginning and ending addresses of such an operand
will be near opposite ends of the segment.
* Both the first byte and the last byte of the operand (considering
wrap-around) are at addresses located in the segment and in present and
accessible pages.
* The operand spans inaccessible addresses. There are two ways that such
an operand may also span inaccessible addresses:
1. The segment limit is not equal to the addressing limit (e.g.,
addressing limit is FFFFH and segment limit is FFFDH); therefore,
the operand will span addresses that are not within the segment
(e.g., an 8-byte operand that starts at valid offset FFFC will span
addresses FFFC-FFFF and 0000-0003; however, addresses FFFE and FFFF
are not valid, because they exceed the limit);
2. The operand begins and ends in present and accessible pages but
intermediate bytes of the operand fall either in a not-present page
or in a page to which the current procedure does not have access
rights.
The address of the failing numerics instruction and data operand may be
lost; an FSTENV does not return reliable addresses. As with the 80286/80287,
the segment overrun exception should be handled by executing an FNINIT
instruction (i.e., an FINIT without a preceding WAIT). The return address on
the stack does not necessarily point to the failing instruction nor to the
following instruction. The failing numerics instruction is not restartable.
Case 2 can be avoided by either aligning all segments on page boundaries or
by not starting them within 108 bytes of the start or end of a page. (The
maximum size of a coprocessor operand is 108 bytes.) Case 1 can be avoided
by making sure that the gap between the last valid offset and the first
valid offset of a segment is either no less than 108 bytes or is zero (i.e.,
the segment is of full size). If neither software system design constraint
is acceptable, the exception handler should execute FNINIT and should
probably terminate the task.
11.1.5.3 Interrupt 16 -- Coprocessor Error
11.1.5.3 Interrupt 16 -- Coprocessor Error
The numerics coprocessors can detect six different exception conditions
during instruction execution. If the detected exception is not masked by a
bit in the control word, the coprocessor communicates the fact that an error
occurred to the CPU by a signal at the ERROR# pin. The CPU causes interrupt
16 the next time it checks the ERROR# pin, which is only at the beginning of
a subsequent WAIT or certain ESC instructions. If the exception is masked,
the numerics coprocessor handles the exception according to on-board logic;
it does not assert the ERROR# pin in this case.
11.2 General Multiprocessing
11.2 General Multiprocessing
The components of the general multiprocessing interface include:
* The LOCK# signal
* The LOCK instruction prefix, which gives programmed control of the
LOCK# signal.
* Automatic assertion of the LOCK# signal with implicit memory updates
by the processor
11.2.1 LOCK and the LOCK# Signal
11.2.1 LOCK and the LOCK# Signal
The LOCK instruction prefix and its corresponding output signal LOCK# can
be used to prevent other bus masters from interrupting a data movement
operation. LOCK may only be used with the following 80386 instructions when
they modify memory. An undefined-opcode exception results from using LOCK
before any instruction other than:
* Bit test and change: BTS, BTR, BTC.
* Exchange: XCHG.
* Two-operand arithmetic and logical: ADD, ADC, SUB, SBB, AND, OR, XOR.
* One-operand arithmetic and logical: INC, DEC, NOT, and NEG.
A locked instruction is only guaranteed to lock the area of memory defined
by the destination operand, but it may lock a larger memory area. For
example, typical 8086 and 80286 configurations lock the entire physical
memory space. The area of memory defined by the destination operand is
guaranteed to be locked against access by a processor executing a locked
instruction on exactly the same memory area, i.e., an operand with
identical starting address and identical length.
The integrity of the lock is not affected by the alignment of the memory
field. The LOCK signal is asserted for as many bus cycles as necessary to
update the entire operand.
11.2.2 Automatic Locking
11.2.2 Automatic Locking
In several instances, the processor itself initiates activity on the data
bus. To help ensure that such activities function correctly in
multiprocessor configurations, the processor automatically asserts the LOCK#
signal. These instances include:
* Acknowledging interrupts.
After an interrupt request, the interrupt controller uses the data bus
to send the interrupt ID of the interrupt source to the CPU. The CPU
asserts LOCK# to ensure that no other data appears on the data bus
during this time.
* Setting busy bit of TSS descriptor.
The processor tests and sets the busy-bit in the type field of the TSS
descriptor when switching to a task. To ensure that two different
processors cannot simultaneously switch to the same task, the processor
asserts LOCK# while testing and setting this bit.
* Loading of descriptors.
While copying the contents of a descriptor from a descriptor table into
a segment register, the processor asserts LOCK# so that the descriptor
cannot be modified by another processor while it is being loaded. For
this action to be effective, operating-system procedures that update
descriptors should adhere to the following steps:
-- Use a locked update to the access-rights byte to mark the
descriptor not-present.
-- Update the fields of the descriptor. (This may require several
memory accesses; therefore, LOCK cannot be used.)
-- Use a locked update to the access-rights byte to mark the
descriptor present again.
* Updating page-table A and D bits.
The processor exerts LOCK# while updating the A (accessed) and D
(dirty) bits of page-table entries. Also the processor bypasses the
page-table cache and directly updates these bits in memory.
* Executing XCHG instruction.
The 80386 always asserts LOCK during an XCHG instruction that
references memory (even if the LOCK prefix is not used).
11.2.3 Cache Considerations
11.2.3 Cache Considerations
Systems programmers must take care when updating shared data that may also
be stored in on-chip registers and caches. With the 80386, such shared
data includes:
* Descriptors, which may be held in segment registers.
A change to a descriptor that is shared among processors should be
broadcast to all processors. Segment registers are effectively
"descriptor caches". A change to a descriptor will not be utilized by
another processor if that processor already has a copy of the old
version of the descriptor in a segment register.
* Page tables, which may be held in the page-table cache.
A change to a page table that is shared among processors should be
broadcast to all processors, so that others can flush their page-table
caches and reload them with up-to-date page tables from memory.
Systems designers can employ an interprocessor interrupt to handle the
above cases. When one processor changes data that may be cached by other
processors, it can send an interrupt signal to all other processors that may
be affected by the change. If the interrupt is serviced by an interrupt
task, the task switch automatically flushes the segment registers. The task
switch also flushes the page-table cache if the PDBR (the contents of CR3)
of the interrupt task is different from the PDBR of every other task.
In multiprocessor systems that need a cacheability signal from the CPU, it
is recommended that physical address pin A31 be used to indicate
cacheability. Such a system can then possess up to 2 Gbytes of physical
memory. The virtual address range available to the programmer is not
affected by this convention.