Intel originally included LOADALL in the CPU mask for testing purposes and for In-Circuit Emulator (ICE) support. As its name implies, LOADALL loads all of the CPU registers. That includes the software-visible registers such as AX, and the "hidden," software-invisible registers such as the segment descriptor caches. When a LOADALL instruction completes, the entire CPU state is defined by the LOADALL data table. By manipulating the descriptor cache base registers, you can access the entire address space without switching to protected mode. In other words, LOADALL lets you access memory above 1MB from real mode. LOADALL matters most to 286 programmers, because the only alternative on the 286 carries a significant performance penalty: switch to protected mode, access the desired memory, then reset the CPU, which is the only way to get the 286 back to real mode. (The 386, by contrast, can return to real mode simply by clearing the PE bit in CR0.) For 286 programmers, LOADALL provides a capability that is not available by any other means.
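The arithmetic behind the 1MB barrier is easy to show. The following C sketch is for illustration only, and both function names are hypothetical. It contrasts real-mode address formation, where the base is always segment × 16, with a base taken directly from a 286 descriptor cache that LOADALL has loaded.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode address formation: physical = segment * 16 + offset.
   FFFF:FFFF gives 0x10FFEF, the highest real-mode address; reaching it
   assumes the A20 line is enabled (an 8086 would wrap to 0x0FFEF). */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* With LOADALL the base comes straight from the descriptor cache.
   The 286 cache holds a 24-bit base and the 286 drives 24 address
   lines, so the sum is truncated to 24 bits (16MB address space). */
uint32_t cached_linear_286(uint32_t cache_base, uint16_t offset)
{
    assert(cache_base <= 0xFFFFFFu);
    return (cache_base + offset) & 0xFFFFFFu;
}
```

For example, `real_mode_linear(0xFFFF, 0xFFFF)` returns 0x10FFEF. A cache base of 0x100000, which no real-mode segment value can produce, places offset 0 exactly at the 1MB mark.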
LOADALL Details
LOADALL is closely coupled with the CPU hardware. Because the 286 and 386 have different internal hardware, Intel implemented LOADALL with a different opcode on each. The 80286 LOADALL (opcode 0F05) produces an invalid opcode exception when executed on the 386, and the 80386 LOADALL (opcode 0F07) produces an invalid opcode exception when executed on the 286.
LOADALL loads all CPU registers
(including MSW, GDTR, CSBASE, ESACCESS) from
a memory image. You can execute LOADALL in
real or protected mode, but only at privilege
level 0 (CPL=0). If you execute LOADALL at
any other privilege level, the CPU generates an
exception.
By directly loading the descriptor cache
registers with LOADALL, a program has
explicit control over the base address, segment
limit, and access rights associated with each
memory segment. Normally, the CPU loads these
values each time it loads a segment register, but
LOADALL allows you to load these hidden
registers independently of their segment register
counterparts.
In real mode, LOADALL makes it possible
to access a memory segment that is not associated
with any segment register. Likewise, in protected
mode, you can access memory that has no
descriptor table entry.
LOADALL performs no protection checks against any of the loaded register values. When executed at CPL 0, LOADALL itself generates no exceptions. The segment access rights and limit fields may hold values that would otherwise be illegal in real mode or protected mode, and LOADALL loads them without any checks. Once they are loaded, however, the CPU performs full access checks whenever it uses a segment. For example, you can load a segment whose access rights mark it "not present." Normally this condition would generate exception 11 ("segment not present"), but LOADALL does not generate exception 11. Instead, any attempt to access the segment generates exception 13.
LOADALL does not check coherency
between the software-visible segment registers
and the software-invisible segment descriptor
cache registers. Any segment descriptor base
register may point to any area in the CPU address
space, while the software-visible segment
register may contain any other arbitrary value.
The CPU makes all memory references according to
the descriptor cache registers, not the
software-visible segment registers. All subsequent segment register loads reload the descriptor cache register. Beware of using values in CS that do not exactly match a code segment descriptor table entry or a real mode code segment. If they do not match, an interrupt return (IRET) may either cause an exception or resume execution at an unexpected location. Likewise, pushing and then popping any segment register forces its descriptor cache register to reload according to the CPU's conventional protocol. That ends any further real mode extended memory references through that segment.
80286 LOADALL
You encode the 80286 LOADALL as a
two-byte opcode, 0F05h. LOADALL reads its
table from a fixed memory location at 800h (80:0
in real-mode addressing). LOADALL performs
51 bus cycles (WORD cycles), and takes 195
clocks with no wait states. Table
1 shows the format you must prepare at
location 800h before executing the 286 LOADALL
instruction. All CPU register entries in the LOADALL
table conform to the standard Intel format,
where the least significant byte is at the lowest
memory address. Table 2
shows the 286 format of the descriptor cache
entries.
Table 1 -- 80286 LOADALL Table
Physical Address | Description | Data Size | Data Value
[800] | None | DW | 0
[802] | None | DW | 0
[804] | None | DW | 0
[806] | MSW | DW | ?
[808] | None | DW | 0
[80A] | None | DW | 0
[80C] | None | DW | 0
[80E] | None | DW | 0
[810] | None | DW | 0
[812] | None | DW | 0
[814] | None | DW | 0
[816] | TR_REG | DW | ?
[818] | FLAGS | DW | ?
[81A] | IP | DW | ?
[81C] | LDT_REG | DW | ?
[81E] | DS_REG | DW | ?
[820] | SS_REG | DW | ?
[822] | CS_REG | DW | ?
[824] | ES_REG | DW | ?
[826] | DI | DW | ?
[828] | SI | DW | ?
[82A] | BP | DW | ?
[82C] | SP | DW | ?
[82E] | BX | DW | ?
[830] | DX | DW | ?
[832] | CX | DW | ?
[834] | AX | DW | ?
[836] | ES_DESC | DESC_CACHE286 | <?,?,?>
[83C] | CS_DESC | DESC_CACHE286 | <?,?,?>
[842] | SS_DESC | DESC_CACHE286 | <?,?,?>
[848] | DS_DESC | DESC_CACHE286 | <?,?,?>
[84E] | GDT_DESC | DESC_CACHE286 | <?,?,?>
[854] | LDT_DESC | DESC_CACHE286 | <?,?,?>
[85A] | IDT_DESC | DESC_CACHE286 | <?,?,?>
[860] | TSS_DESC | DESC_CACHE286 | <?,?,?>
[866] | END OF TABLE | |

DESC_CACHE286 STRUC
    Addr_A15_A00 DW ?
    Addr_A23_A16 DB ?
    Access       DB ?
    Limit        DW ?
DESC_CACHE286 ENDS
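To cross-check the offsets in Table 1, the layout can be mirrored as a C structure. This is a hypothetical sketch, not part of the original listings, and the type, field, and macro names are assumptions. The static assertions confirm that each field lands at the offset from 800h that Table 1 gives. They also confirm that the whole table is the 102 bytes DOS 3.3 reserves at 80:0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DESC_CACHE286: 6 bytes, little-endian (Table 2 (a)). */
typedef struct {
    uint16_t addr_a15_a00;   /* base bits 15..0   */
    uint8_t  addr_a23_a16;   /* base bits 23..16  */
    uint8_t  access;         /* access rights byte */
    uint16_t limit;          /* 16-bit segment limit */
} DescCache286;

/* Hypothetical mirror of Table 1; offsets are relative to 800h. */
typedef struct {
    uint16_t unused0[3];                      /* 800h-805h */
    uint16_t msw;                             /* 806h */
    uint16_t unused1[7];                      /* 808h-815h */
    uint16_t tr, flags, ip, ldt;              /* 816h-81Dh */
    uint16_t ds, ss, cs, es;                  /* 81Eh-825h */
    uint16_t di, si, bp, sp, bx, dx, cx, ax;  /* 826h-835h */
    DescCache286 es_desc, cs_desc, ss_desc, ds_desc;     /* 836h-84Dh */
    DescCache286 gdt_desc, ldt_desc, idt_desc, tss_desc; /* 84Eh-865h */
} Loadall286Table;

/* Layout checks: compilation fails if the compiler inserts padding. */
static_assert(sizeof(DescCache286) == 6, "descriptor cache entry is 6 bytes");
static_assert(offsetof(Loadall286Table, msw) == 0x06, "MSW at 806h");
static_assert(offsetof(Loadall286Table, tr) == 0x16, "TR at 816h");
static_assert(offsetof(Loadall286Table, ax) == 0x34, "AX at 834h");
static_assert(offsetof(Loadall286Table, es_desc) == 0x36, "ES_DESC at 836h");
static_assert(offsetof(Loadall286Table, tss_desc) == 0x60, "TSS_DESC at 860h");
static_assert(sizeof(Loadall286Table) == 102, "table ends at 866h");

/* Physical address of a table field: the 286 table is fixed at 800h. */
#define LOADALL286_PHYS(field) (0x800u + (unsigned)offsetof(Loadall286Table, field))
```

For instance, `LOADALL286_PHYS(gdt_desc)` evaluates to 0x84E, matching the GDT_DESC row of Table 1.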
Intel recommends some guidelines for proper execution following LOADALL. The stack segment should be a read/write data segment. The code segment can be execute-only (access=99h), read/execute (access=9Bh), or read/write/execute (access=93h). Proper protected mode operation also requires that the DPL of CS equal the DPL of SS, since these attributes determine the CPL of the processor. Also, the DPL fields of ES and DS should be 3, to prevent RETF or IRET instructions from zeroing these registers.
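The guidelines above can be expressed as a pre-flight check on the access bytes you plan to place in the LOADALL table. This is a hypothetical sketch, and the function names are assumptions. It decodes the standard access byte fields rather than matching specific values. It also adds a present-bit check, because a cached segment marked not present faults on any access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Access byte fields (same format as a descriptor table entry). */
static unsigned dpl(uint8_t a)        { return (a >> 5) & 3u; }
static bool is_present(uint8_t a)     { return (a & 0x80u) != 0; }
static bool is_segment(uint8_t a)     { return (a & 0x10u) != 0; } /* S bit */
static bool is_code(uint8_t a)        { return (a & 0x08u) != 0; }
static bool is_writable_data(uint8_t a) { return !is_code(a) && (a & 0x02u); }

/* Hypothetical check following Intel's guidelines:
   - SS must be a read/write data segment;
   - CS must be a code segment, or read/write data as in real mode;
   - in protected mode, DPL(CS) == DPL(SS) and DS/ES have DPL 3. */
bool loadall_guidelines_ok(uint8_t cs_ar, uint8_t ss_ar,
                           uint8_t ds_ar, uint8_t es_ar, bool protected_mode)
{
    if (!is_present(cs_ar) || !is_present(ss_ar)) return false;
    if (!is_segment(cs_ar) || !is_segment(ss_ar)) return false;
    if (!is_writable_data(ss_ar)) return false;
    if (!is_code(cs_ar) && !is_writable_data(cs_ar)) return false;
    if (protected_mode) {
        if (dpl(cs_ar) != dpl(ss_ar)) return false;
        if (dpl(ds_ar) != 3 || dpl(es_ar) != 3) return false;
    }
    return true;
}
```

With CS=9Bh, SS=93h, and DS=ES=F3h (DPL 3 read/write data), the check passes in protected mode. With DS left at 93h (DPL 0), it fails, which flags the RETF/IRET zeroing hazard.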
The code in Listing 1 demonstrates how to explore the various operating modes with 286 LOADALL and how to access extended memory while in real mode. The LOADALL test performs various functions that would be impossible to duplicate without LOADALL.
80386 LOADALL
The 386 LOADALL is encoded as a two-byte opcode (0F07). Unlike the 286 LOADALL, it reads its data from a table pointed to by ES:EDI. Segment overrides are allowed but apparently ignored. The 386 LOADALL performs 51 bus cycles (DWORD cycles) to read the table and takes 122 clocks with no wait states. Table 3 shows the 386 LOADALL format. However, Table 3 does not show one detail. Before reading the LOADALL table, LOADALL reads 10 DWORDs exactly 100h bytes beyond the beginning of the table (ES:EDI+100h). This data is not used to load any of the registers that LOADALL leaves untouched (CR2, CR3, DR0-DR3, TR6, TR7), nor the Numeric Processor eXtension (NPX). At this time, the purpose of these reads and the destination of the data remain a mystery. Figure 1 shows an ICE trace of all the bus cycles associated with LOADALL's execution.
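The addresses in the Figure 1 trace follow directly from Table 3. In this sketch the constant and function names are hypothetical. The 0CCh-byte table accounts for 51 DWORD reads, and the mystery block adds 10 more. With the table at 0000D7F0, as in the trace, the mystery reads run from 0000D8F0 to 0000D914.

```c
#include <assert.h>
#include <stdint.h>

#define LOADALL386_TABLE_BYTES 0xCCu   /* Table 3: LENGTH OF TABLE */
#define MYSTERY_OFFSET         0x100u  /* mystery reads start at ES:EDI+100h */
#define MYSTERY_DWORDS         10u

/* DWORD read cycles needed for the table itself: 0CCh / 4 = 51. */
unsigned loadall386_table_reads(void) { return LOADALL386_TABLE_BYTES / 4u; }

/* All data reads seen in the trace: 10 mystery reads + 51 table reads. */
unsigned loadall386_total_reads(void) { return MYSTERY_DWORDS + loadall386_table_reads(); }

/* Physical address of the n-th mystery read (0-based), assuming the
   table's physical address is 'table' (ES base + EDI). */
uint32_t mystery_read_addr(uint32_t table, unsigned n)
{
    assert(n < MYSTERY_DWORDS);
    return table + MYSTERY_OFFSET + 4u * n;
}
```

These counts match the trace: frames 011 through 131 are exactly 61 read cycles.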
As with the 286 LOADALL, all CPU register
entries in the LOADALL table are in the standard
Intel format where the least significant byte is
at the lowest memory address. The 386 descriptor
cache entries have the format shown in Table 4.
Listing
2 shows how to test 386 LOADALL. This test is
more comprehensive than the 286 LOADALL test
because of the expanded capabilities of the 386
microprocessor. This test puts the CPU into
various states that are illegal and are
impossible to duplicate through any other
software means.
LOADALL Emulation
Because so many systems programs use 286 LOADALL, all 386 and 486 BIOSes must emulate the 286 LOADALL instruction (opcode 0F05). On the 386 and 486, the 286 LOADALL instruction generates an invalid opcode exception. The BIOS traps this exception and does its best to emulate the functionality of LOADALL. However, perfect emulation is impossible without using LOADALL itself.
Using 386 LOADALL to emulate 286 LOADALL can be done, but it has its risks. First, the 486 does not have a LOADALL instruction. Second, Intel has threatened to remove LOADALL from the 386 mask.
Nevertheless, on the 386, using 386 LOADALL to emulate 286 LOADALL makes perfect emulation possible. Listing 3 shows a TSR program that uses 386 LOADALL to emulate 286 LOADALL. The program first verifies that it is running on a 386 before installing itself. On a 386, this emulation program guarantees perfect 286 LOADALL emulation.
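Any 286-on-386 emulation must convert the 286 table (Table 1) into the 386 format (Table 3). The sketch below shows only one piece of that job: widening a single descriptor cache entry. The names are hypothetical, and the code is not taken from EMULOAD. A complete emulator must also map MSW into CR0. It must also supply values for registers the 286 table lacks, such as DR6, DR7, FS, and GS.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {           /* DESC_CACHE286 (Table 2 (a)) */
    uint16_t addr_a15_a00;
    uint8_t  addr_a23_a16;
    uint8_t  access;
    uint16_t limit;
} DescCache286;

typedef struct {           /* DESC_CACHE (Table 3 / Table 4 (a)) */
    uint8_t  pad0;         /* DB 0 */
    uint8_t  type;         /* _Type: access rights byte */
    uint8_t  pad1[2];      /* DB 0, DB 0 */
    uint32_t addr;         /* _Addr */
    uint32_t limit;        /* _Limit */
} DescCache386;

/* Hypothetical helper: the 24-bit base and 16-bit limit are
   zero-extended, the access byte is copied unchanged, and every other
   bit of the access DWORD is left 0. */
DescCache386 desc286_to_386(DescCache286 d)
{
    DescCache386 out = {0};
    out.type  = d.access;
    out.addr  = ((uint32_t)d.addr_a23_a16 << 16) | d.addr_a15_a00;
    out.limit = d.limit;
    return out;
}
```

A 286 entry with base 100000h, access 93h, and limit FFFFh becomes a 386 entry with `addr` 0x100000, `type` 0x93, and `limit` 0xFFFF.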
Conclusion
LOADALL is a very powerful instruction,
but the features that make it so powerful also
make it risky. For example, LOADALL can
put the processor in states that are otherwise
impossible to duplicate through any other
software means. Using LOADALL requires a
thorough understanding of how the CPU processes
register loads, the ramifications of those
register loads, and careful planning. The
illegally induced processor states can easily
cause system crashes if not properly planned for.
The best way to avoid system crashes is to avoid
using LOADALL unless you are totally
confident in your understanding of the CPU and in
your programming skills.
The 286 LOADALL is described in a 15-page Intel-confidential document. The document describes in detail how to use the instruction, along with many of its possible uses. LOADALL
can be used to access extended memory while
in real mode, and to emulate real mode while in
protected mode. Programs such as RAMDRIVE,
ABOVEDISC, and OS/2 use LOADALL. DOS 3.3
has provisions for using LOADALL by
leaving a 102-byte 'hole' at 80:0. If you are a
systems programmer and have a need to know this
information, Intel will provide it, along with
source code to emulate 286 LOADALL on the
386 (without using 386 LOADALL).
Unlike the 286 LOADALL, the 386 LOADALL is still an Intel top secret. I do not know of any document that describes its use or format, or even acknowledges its existence. Very few people at Intel will acknowledge that LOADALL even exists in the 80386 mask. The official Intel line is that, due to U.S. military pressure, LOADALL was removed from the 80386 mask over a year ago. However, running the program in Listing 2 demonstrates that LOADALL is alive, well, and still available on the latest stepping of the 80386.
View source code for 286 LOADALL:
ftp://ftp.x86.org/source/286load/286load.asm
ftp://ftp.x86.org/source/286load/loadfns.286
ftp://ftp.x86.org/source/286load/macros.286
ftp://ftp.x86.org/source/include/cpu_type.asm
View source code for 386 LOADALL:
ftp://ftp.x86.org/source/386load/386load.asm
ftp://ftp.x86.org/source/386load/loadfns.386
ftp://ftp.x86.org/source/386load/macros.386
ftp://ftp.x86.org/source/include/cpu_type.asm
View source code for EMULOAD (286
LOADALL emulation using 386 LOADALL):
ftp://ftp.x86.org/source/emuload/emuload.asm
ftp://ftp.x86.org/source/include/cpu_type.asm
Download entire source code
archive for 286LOAD, 386LOAD, and EMULOAD:
ftp://ftp.x86.org/dloads/LOADALL.ZIP
DESCRIPTOR CACHE REGISTERS
Whether in real or protected mode, the CPU stores the base address of each segment in hidden registers called descriptor cache registers. Each time the CPU loads a segment register, the segment base address, segment size limit, and access attributes (access rights) are loaded, or "cached," into these hidden registers. To enhance performance, the CPU makes all subsequent memory references via the descriptor cache registers. It does not recalculate the physical address or look up the base address in the descriptor table on each reference. Understanding the role of these hidden registers is essential for exploiting highly advanced programming techniques, including the undocumented LOADALL instruction. Figure 2(a) shows the descriptor cache layout for the 80286, and Figure 2(b) shows the layout for the 80386 and 80486.
Figure 2 (a) 80286 Descriptor Cache Register
Bits | [47..32] | 31 | [30..29] | 28 | [27..25] | 24 | [23..00]
Field | 16-bit Limit | P | DPL | S | Type | A | 24-bit base address
Figure 2 (b) 80386/80486 Descriptor Cache Register
Bits | [31..24] | 23 | [22..21] | 20 | [19..17] | 16 | 15 | 14 | [13..00]
Field | 0 | P | DPL | S | Type | A | 0 | D | 0
Bits | [63..32]
Field | 32-bit Physical Address
At power-up, the descriptor cache registers are loaded with fixed default values, and the CPU is in real mode. All segments, including the code segment (CS), are marked as read/write data segments. According to Intel, each time the CPU loads a segment register in real mode, the base address is 16 times the segment value, while the access rights and size limit attributes are given fixed, "real-mode compatible" values. This is not true. In fact, only the CS descriptor cache access rights get loaded with fixed values each time the segment register is loaded, and even then only when a far jump is encountered. Loading any other segment register in real mode does not change the access rights or the segment size limit attributes stored in the descriptor cache registers. For these segments, the access rights and segment size limit attributes keep whatever values were previously set (see Figure 3). Thus it is possible to have a four-gigabyte, read-only data segment in real mode on the 80386, but Intel will not acknowledge or support this mode of operation.
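The real-mode behavior described above can be modeled in a few lines. This is a hypothetical model of the article's observation, not Intel's documented behavior, and the type and function names are assumptions. Loading a data segment register rewrites only the cached base. A 4GB limit planted earlier, for example by LOADALL, therefore survives the load.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t base;
    uint32_t limit;
    uint8_t  access;
} CacheEntry;

/* Hypothetical model of a real-mode load of any segment register other
   than CS, as observed in the article: only the base is recomputed
   (selector * 16); limit and access rights are carried over unchanged. */
CacheEntry real_mode_load_data_seg(CacheEntry cache, uint16_t selector)
{
    cache.base = (uint32_t)selector << 4;
    return cache;
}
```

Starting from a cache entry with limit FFFFFFFFh and access 93h, loading selector 1234h yields base 12340h. The limit is still FFFFFFFFh, which is how the four-gigabyte real-mode segment arises.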
Protected mode differs from real mode in this respect. Each time the CPU loads a segment register in protected mode, it fully loads the descriptor cache register, and no previous values are retained. The CPU loads the descriptor cache directly from the descriptor table. The CPU checks the validity of the segment by testing the access rights in the descriptor table, and illegal values generate exceptions. Any attempt to load CS with a read/write data segment generates a protection error. Likewise, any attempt to load a data segment register with an execute-only code segment also generates an exception. The CPU enforces these protection rules very strictly: it loads the descriptor cache register only if the descriptor table entry passes all the tests.
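The type checks described above can be summarized as a simplified model. The names are hypothetical. Privilege checks (CPL, RPL, DPL), null selectors, and not-present faults are deliberately left out. Note that a readable code segment is acceptable in a data segment register; only an execute-only one is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum SegReg { SEG_CS, SEG_SS, SEG_DATA /* DS, ES (and FS, GS on the 386) */ };

/* Simplified model of the descriptor-type checks made when a segment
   register is loaded in protected mode. */
bool type_ok_for(enum SegReg reg, uint8_t access)
{
    bool s      = access & 0x10;   /* 1 = code/data, 0 = system descriptor */
    bool code   = access & 0x08;
    bool rw_bit = access & 0x02;   /* code: readable; data: writable */
    if (!s)
        return false;              /* system descriptors never qualify */
    switch (reg) {
    case SEG_CS:   return code;              /* must be executable */
    case SEG_SS:   return !code && rw_bit;   /* writable data only */
    case SEG_DATA: return !code || rw_bit;   /* data, or readable code */
    }
    return false;
}
```

Under this model, `type_ok_for(SEG_CS, 0x93)` fails (read/write data in CS) and `type_ok_for(SEG_DATA, 0x99)` fails (execute-only code in a data register). `type_ok_for(SEG_DATA, 0x9B)` succeeds.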
Figure 3 -- Descriptor Cache Contents (Real Mode)
Table 2 (a) -- 80286 Descriptor Cache Entry Formats
Offset | Description
0-2 | 24-bit physical address of the segment in memory. These bytes are stored in standard Intel format with the least significant byte at the lowest memory address.
3 | Access rights. The format of this byte is the same as that in the descriptor table. This access byte is loaded into the descriptor cache register regardless of its validity. Therefore the "present" bit in the access rights field becomes a "descriptor valid" bit. When this bit is cleared, the descriptor is considered invalid, and any memory reference using this descriptor generates exception 13, with error code 0. The Descriptor Privilege Level (DPL) of the SS and CS descriptor caches determines the Current Privilege Level (CPL). The CS descriptor cache may be loaded as a read/write data segment.
4-5 | Segment limit. The standard 16-bit segment limit, stored in standard Intel format.
Table 2 (b) -- 80286 GDT and IDT Descriptor Cache Entry Formats
Offset | Description
0-2 | 24-bit physical address of the segment in memory.
3 | Should be 0.
4-5 | Segment limit.
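Table 2 (a) and Figure 2 (a) describe the same 48 bits. Read least significant byte first, the 6-byte table entry is exactly the cache register. The following hypothetical helpers build that value and extract the fields the text mentions. For example, a segment based at 1MB (100000h) with access 93h and limit FFFFh is the register value FFFF93100000h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for the 48-bit 286 descriptor cache of
   Figure 2 (a): [47..32] limit, [31..24] access byte, [23..00] base. */
uint64_t desc286_register(uint32_t base, uint8_t access, uint16_t limit)
{
    assert(base <= 0xFFFFFFu);                 /* 24-bit base on the 286 */
    return ((uint64_t)limit << 32) | ((uint64_t)access << 24) | base;
}

/* Byte i (0-5) of the LOADALL table entry in Table 2 (a): storing the
   register least significant byte first gives base, access, limit. */
uint8_t desc286_byte(uint64_t reg, unsigned i)
{
    assert(i < 6);
    return (uint8_t)(reg >> (8 * i));
}

unsigned desc286_present(uint64_t reg) { return (unsigned)(reg >> 31) & 1u; } /* P    */
unsigned desc286_dpl(uint64_t reg)     { return (unsigned)(reg >> 29) & 3u; } /* DPL  */
unsigned desc286_type(uint64_t reg)    { return (unsigned)(reg >> 25) & 7u; } /* Type */
```

The DPL extracted from the CS and SS caches is what the CPU uses as the CPL. A cleared P bit makes the entry "descriptor invalid."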
Table 3 -- 80386 LOADALL Table
Offset | Description | Data Size | Data Value
[00] | CR0 | DD | ?
[04] | EFLAGS | DD | ?
[08] | EIP | DD | ?
[0C] | EDI | DD | ?
[10] | ESI | DD | ?
[14] | EBP | DD | ?
[18] | ESP | DD | ?
[1C] | EBX | DD | ?
[20] | EDX | DD | ?
[24] | ECX | DD | ?
[28] | EAX | DD | ?
[2C] | DR6 | DD | ?
[30] | DR7 | DD | ?
[34] | TR_REG | REG_STRUC | <?>
[38] | LDT_REG | REG_STRUC | <?>
[3C] | GS_REG | REG_STRUC | <?>
[40] | FS_REG | REG_STRUC | <?>
[44] | DS_REG | REG_STRUC | <?>
[48] | SS_REG | REG_STRUC | <?>
[4C] | CS_REG | REG_STRUC | <?>
[50] | ES_REG | REG_STRUC | <?>
[54] | TSS_DESC | DESC_CACHE | <?,?,?>
[60] | IDT_DESC | DESC_CACHE | <?,?,?>
[6C] | GDT_DESC | DESC_CACHE | <?,?,?>
[78] | LDT_DESC | DESC_CACHE | <?,?,?>
[84] | GS_DESC | DESC_CACHE | <?,?,?>
[90] | FS_DESC | DESC_CACHE | <?,?,?>
[9C] | DS_DESC | DESC_CACHE | <?,?,?>
[A8] | SS_DESC | DESC_CACHE | <?,?,?>
[B4] | CS_DESC | DESC_CACHE | <?,?,?>
[C0] | ES_DESC | DESC_CACHE | <?,?,?>
[CC] | LENGTH OF TABLE | |

REG_STRUC STRUC
    REG_VAL DW ?
            DW 0
REG_STRUC ENDS

DESC_CACHE STRUC
            DB 0
    _Type   DB ?
            DB 0
            DB 0
    _Addr   DD ?
    _Limit  DD ?
DESC_CACHE ENDS
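As with the 286 table, the offsets in Table 3 can be cross-checked with a C mirror of REG_STRUC, DESC_CACHE, and the table itself. This is a hypothetical sketch, and the type and field names are assumptions. The static assertions confirm the 0CCh-byte total and the key offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {            /* REG_STRUC: selector padded to a DWORD */
    uint16_t reg_val;
    uint16_t pad;           /* DW 0 */
} RegStruc;

typedef struct {            /* DESC_CACHE: 12 bytes */
    uint8_t  pad0;          /* DB 0 */
    uint8_t  type;          /* _Type: access rights byte */
    uint8_t  pad1[2];       /* DB 0, DB 0 */
    uint32_t addr;          /* _Addr: 32-bit base */
    uint32_t limit;         /* _Limit: 32-bit limit */
} DescCache;

/* Hypothetical C mirror of Table 3 (offsets relative to ES:EDI). */
typedef struct {
    uint32_t cr0, eflags, eip, edi, esi, ebp, esp, ebx, edx, ecx, eax;
    uint32_t dr6, dr7;
    RegStruc tr, ldt, gs, fs, ds, ss, cs, es;
    DescCache tss_desc, idt_desc, gdt_desc, ldt_desc;
    DescCache gs_desc, fs_desc, ds_desc, ss_desc, cs_desc, es_desc;
} Loadall386Table;

static_assert(sizeof(RegStruc) == 4, "REG_STRUC is one DWORD");
static_assert(sizeof(DescCache) == 12, "DESC_CACHE is 12 bytes");
static_assert(offsetof(Loadall386Table, dr7) == 0x30, "DR7 at 30h");
static_assert(offsetof(Loadall386Table, tr) == 0x34, "TR at 34h");
static_assert(offsetof(Loadall386Table, tss_desc) == 0x54, "TSS_DESC at 54h");
static_assert(offsetof(Loadall386Table, cs_desc) == 0xB4, "CS_DESC at B4h");
static_assert(sizeof(Loadall386Table) == 0xCC, "table is CCh bytes");
```

Because every field is naturally aligned, no packing directive is needed. The assertions would catch a compiler that padded the structures anyway.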
Table 4 (a) -- 80386 Descriptor Cache Entries
Offset | Description
0-3 | Access rights. The access rights occupy 11 bits of this 32-bit field. See Figure 2 for a complete description of this field.
4-7 | 32-bit base address of the segment in memory.
8-11 | 32-bit segment limit.
Table 4 (b) -- 80386 GDT and IDT Descriptor Cache Entry Formats
Offset | Description
0-3 | Should be 0.
4-7 | 32-bit base address of GDTR or IDTR.
8-11 | 32-bit limit of GDTR or IDTR.
Figure 1 -- In-Circuit-Emulator Trace of 80386 LOADALL Instruction
Column | Description
Frame | The FRAME number is like a clock count for the CPU. At every CPU clock, the ICE takes a picture. When a valid cycle occurs, the ICE records its occurrence. Therefore, by reading this information you can determine how many CPU clocks a sequence of instructions takes to execute.
Type | Cycle type. Shown here are F=Fetch, R=Read, and X=eXecute.
Address | The 32-bit physical address asserted on the CPU address bus during each cycle.
Data | The data asserted on the CPU data bus during each cycle.
BE3# BE2# BE1# BE0# | Byte enable pins on the CPU. These pins determine which bytes of the 32 bits of data are valid. The pins are active low, so each '0' marks 8 bits of valid data.
W/R# | Write/Read. Write = 1, Read = 0.
D/C# | Data/Code. Data = 1, Code = 0.
M/IO# | Memory/IO. Memory = 1, IO = 0.
Frame (Dec) | Type | Address (Hex) | Data (Hex) | BE3# BE2# BE1# BE0# | W/R# D/C# M/IO# | Comments
5 | F | 0000DE40 | B490070F | 0000 | 001 | LOADALL fetched
8 | X | (executed 2 bytes at DE40) | | | | LOADALL begins execution
011 | R | 0000D8F0 | 01010101 | 0000 | 011 | "Mystery" read 1 of 10, exactly 100h bytes beyond the beginning of the LOADALL table
013 | R | 0000D8F4 | 02020202 | 0000 | 011 | "Mystery" read 2 of 10
015 | R | 0000D8F8 | 03030303 | 0000 | 011 | "Mystery" read 3 of 10
017 | R | 0000D8FC | 04040404 | 0000 | 011 | "Mystery" read 4 of 10
019 | R | 0000D900 | 05050505 | 0000 | 011 | "Mystery" read 5 of 10
021 | R | 0000D904 | 06060606 | 0000 | 011 | "Mystery" read 6 of 10
023 | R | 0000D908 | 07070707 | 0000 | 011 | "Mystery" read 7 of 10
025 | R | 0000D90C | 08080808 | 0000 | 011 | "Mystery" read 8 of 10
027 | R | 0000D910 | 09090909 | 0000 | 011 | "Mystery" read 9 of 10
029 | R | 0000D914 | 0A0A0A0A | 0000 | 011 | "Mystery" read 10 of 10
031 | R | 0000D7F0 | 7FFFFFE0 | 0000 | 011 | CR0
033 | R | 0000D7F4 | 00000002 | 0000 | 011 | EFLAGS
035 | R | 0000D7F8 | 00000133 | 0000 | 011 | EIP
037 | R | 0000D7FC | 66666666 | 0000 | 011 | EDI
039 | R | 0000D800 | 77777777 | 0000 | 011 | ESI
041 | R | 0000D804 | 55555555 | 0000 | 011 | EBP
043 | R | 0000D808 | 88888888 | 0000 | 011 | ESP
045 | R | 0000D80C | 22222222 | 0000 | 011 | EBX
047 | R | 0000D810 | 44444444 | 0000 | 011 | EDX
049 | R | 0000D814 | 33333333 | 0000 | 011 | ECX
051 | R | 0000D818 | 11111111 | 0000 | 011 | EAX
053 | R | 0000D81C | FFFF0FF0 | 0000 | 011 | DR6
055 | R | 0000D820 | 0000D402 | 0000 | 011 | DR7
057 | R | 0000D824 | xxxx0000 | 1100 | 011 | TR Register
059 | R | 0000D828 | xxxx0000 | 1100 | 011 | LDT Register
061 | R | 0000D82C | xxxx5555 | 1100 | 011 | GS Register
063 | R | 0000D830 | xxxx4444 | 1100 | 011 | FS Register
065 | R | 0000D834 | xxxx2222 | 1100 | 011 | DS Register
067 | R | 0000D838 | xxxx6666 | 1100 | 011 | SS Register
069 | R | 0000D83C | xxxx1111 | 1100 | 011 | CS Register
071 | R | 0000D840 | xxxx3333 | 1100 | 011 | ES Register
073 | R | 0000D844 | 00008900 | 0000 | 011 | TSS Descriptor Cache (access rights)
075 | R | 0000D848 | 00070000 | 0000 | 011 | (base)
077 | R | 0000D84C | 00000800 | 0000 | 011 | (limit)
079 | R | 0000D850 | 00000000 | 0000 | 011 | IDT Descriptor Cache (access rights)
081 | R | 0000D854 | 00000000 | 0000 | 011 | (base)
083 | R | 0000D858 | 000003FF | 0000 | 011 | (limit)
085 | R | 0000D85C | 00000000 | 0000 | 011 | GDT Descriptor Cache (access rights)
087 | R | 0000D860 | 00000000 | 0000 | 011 | (base)
089 | R | 0000D864 | 00000000 | 0000 | 011 | (limit)
091 | R | 0000D868 | 00008200 | 0000 | 011 | LDT Descriptor Cache (access rights)
093 | R | 0000D86C | 00090000 | 0000 | 011 | (base)
095 | R | 0000D870 | 00000088 | 0000 | 011 | (limit)
097 | R | 0000D874 | 00008300 | 0000 | 011 | GS Descriptor Cache (access rights)
099 | R | 0000D878 | 00050000 | 0000 | 011 | (base)
101 | R | 0000D87C | 0000FFFF | 0000 | 011 | (limit)
103 | R | 0000D880 | 00009300 | 0000 | 011 | FS Descriptor Cache (access rights)
105 | R | 0000D884 | 00040000 | 0000 | 011 | (base)
107 | R | 0000D888 | 0000FFFF | 0000 | 011 | (limit)
109 | R | 0000D88C | 00009300 | 0000 | 011 | DS Descriptor Cache (access rights)
111 | R | 0000D890 | 00020000 | 0000 | 011 | (base)
113 | R | 0000D894 | 0000FFFF | 0000 | 011 | (limit)
115 | R | 0000D898 | 00009300 | 0000 | 011 | SS Descriptor Cache (access rights)
117 | R | 0000D89C | 00060000 | 0000 | 011 | (base)
119 | R | 0000D8A0 | 0000FFFF | 0000 | 011 | (limit)
121 | R | 0000D8A4 | 00009B00 | 0000 | 011 | CS Descriptor Cache (access rights)
123 | R | 0000D8A8 | 0000DD30 | 0000 | 011 | (base)
125 | R | 0000D8AC | 0000FFFF | 0000 | 011 | (limit)
127 | R | 0000D8B0 | 00009300 | 0000 | 011 | ES Descriptor Cache (access rights)
129 | R | 0000D8B4 | 00030000 | 0000 | 011 | (base)
131 | R | 0000D8B8 | 00FFFFFF | 0000 | 011 | (limit)
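The descriptor cache rows of the trace can be decoded with the DESC_CACHE layout from Table 3. The access rights byte is byte 1 of the first DWORD, which is why the CS entry reads 00009B00. This hypothetical decoder (names are assumptions) also counts the valid bytes from the active-low byte enables. A value of 1100 therefore corresponds to the 2-byte selector reads.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t  access;
    uint32_t base;
    uint32_t limit;
} Decoded386Cache;

/* Decode the three DWORDs of one DESC_CACHE entry as they appear on the
   data bus: access rights in byte 1 (_Type) of the first DWORD, then
   the base (_Addr) and limit (_Limit) DWORDs. */
Decoded386Cache decode_desc386(uint32_t ar_dword, uint32_t base, uint32_t limit)
{
    Decoded386Cache d = { (uint8_t)(ar_dword >> 8), base, limit };
    return d;
}

/* Number of valid data bytes for an active-low BE3#..BE0# nibble
   (bit 3 = BE3#, bit 0 = BE0#). */
unsigned valid_bytes(uint8_t be_n)
{
    unsigned n = 0;
    for (int i = 0; i < 4; i++)
        if (!(be_n & (1u << i)))
            n++;
    return n;
}
```

Applied to frames 121 through 125, the decoder yields a CS cache with access 9Bh (present, DPL 0, readable code), base 0000DD30, and limit FFFF.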