SYSDOC HPPA                                    Robert Duncan, April 1993

Porting Poplog to the HP PA-RISC 1.1


         CONTENTS - (Use <ENTER> g to access required sections)

  1   Architectural Background

  2   Assembling and Linking

  3   Register Usage

  4   Procedure Call and Return

  5   The Callstack

  6   External Calls

  7   Signal Handling

  8   Documentation, Utilities etc.


------------------------------------------------------------------------
1  Architectural Background
------------------------------------------------------------------------

The HP Precision Architecture has many similarities with other RISC
processors which already support Poplog, such as SPARC and MIPS. These
can be summarised as:

    o   load/store architecture

    o   byte addressable

    o   32-bit word length

    o   32, 32-bit general registers (GR[0..31])

    o   32, 64-bit IEEE floating-point registers (FR[0..31])

    o   word-length instructions

    o   branch delay slots

Data must be ``naturally aligned'' for size -- words on a 4-byte
boundary, doubles on an 8-byte boundary etc. Addresses are big-endian,
pointing to the most significant byte of a datum. This principle extends
to bit numbering, so that bit 0 is always the most significant bit, e.g.
the sign bit in an integer. The Poplog tag bits are thus bit numbers 30
and 31.
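
The bit-numbering convention can be checked with a few lines of Python.
This is only an illustrative sketch: the helper name hp_bit_mask is made
up here and is not part of Poplog or of any HP tool.

```python
WORD_BITS = 32

def hp_bit_mask(bit):
    """Mask for PA-RISC bit number `bit`, where bit 0 is the MSB."""
    if not 0 <= bit < WORD_BITS:
        raise ValueError("bit number out of range")
    return 1 << (WORD_BITS - 1 - bit)

SIGN_BIT = hp_bit_mask(0)                      # most significant bit
TAG_MASK = hp_bit_mask(30) | hp_bit_mask(31)   # the two Poplog tag bits
```

So the tag bits are the two least-significant bits of the word in the
more usual little-endian bit numbering.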

The instruction set is a bit weird, and probably unreadable to the
uninitiated, but practice shows it to be quite well thought out.
Features which are to Poplog's advantage include:

    o   post-increment and pre-decrement addressing modes which
        directly support stack operations

    o   comprehensive range of condition codes for branches etc.,
        including bit tests

    o   the nullify bit on arithmetic and branch instructions causes
        the following instruction to be executed as a no-op if a
        condition is satisfied: this can reduce the number of no-ops
        occurring in branch delay slots, and can sometimes eliminate the
        need for a branch altogether

Disadvantages are the lack of a division instruction (although there is
a divide step instruction which speeds up an assembly-code division
algorithm) and the fact that the single (unsigned!) multiply
instruction is implemented by the floating-point unit, which makes it
awkward to use.

As on the MIPS, there are separate caches for instructions and data, and
code written to the data space must be flushed from both (see cacheflush
in "amain.s").

By far and away the biggest distinguishing feature of the Precision --
one which haunts the whole Poplog implementation -- is that the address
space is segmented. Despite the 32-bit word length, virtual addresses
are actually 64 bits in size, composed from two 32-bit parts: a space
identifier and an offset. The general registers, when used for memory
addressing, hold just the offset part; the space identifiers are held in
8 dedicated space registers (SR[0..7]). Memory spaces have their access
rights policed by hardware, and this is presumably the primary point of
it (it's certainly of no use to programmers). HP-UX allocates four
distinct spaces to each process:

    o   read-only, shared text

    o   private data (including the call stack)

    o   shared memory (including shared libraries)

    o   privileged system code

Every memory-referencing instruction -- load, store or branch -- must
specify a space register, either implicitly or explicitly. Implicit mode
is most relevant to load/store instructions: in this mode, the top two
bits of the offset part of the address are used to identify the space
register, based on the mapping:

    00  -->  SR[4]
    01  -->  SR[5]
    10  -->  SR[6]
    11  -->  SR[7]

This does, of course, restrict the range of addresses to a 30-bit
quadrant within the space. Fortunately, HP-UX memory mapping is based
around this addressing mode, using these four space registers to hold
the identifiers of the four process spaces:

    SR[4] = shared text
    SR[5] = private data
    SR[6] = shared memory
    SR[7] = system code

and the offsets within the spaces are set by the linker to lie within
their associated quadrants. This means that for most purposes, a process
sees a standard 32-bit address space, as follows:

    ---------------------
    |   Text            | 16:00000000
    .                   .
    .                   .
    |                   | 16:3FFFFFFC
    |-------------------|
    |   Data            | 16:40000000
    .                   .
    .                   .
    |                   | 16:7FFFFFFC
    |-------------------|
    |   Shared          | 16:80000000
    .   Memory          .
    .                   .
    |                   | 16:BFFFFFFC
    |-------------------|
    |   System          | 16:C0000000
    .   Code            .
    .                   .
    |                   | 16:FFFFFFFC
    ---------------------
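
The implicit addressing rule and the resulting quadrant layout can be
modelled as follows. This is a sketch under the assumptions stated in
the text; the function names are ours and purely illustrative.

```python
# Which process space HP-UX keeps in each implicit space register.
QUADRANTS = {4: "text", 5: "data", 6: "shared memory", 7: "system code"}

def implicit_space_register(offset):
    """Number of the space register selected by a 32-bit offset.

    The top two bits of the offset (bits 0 and 1 in HP numbering)
    choose one of SR[4]..SR[7].
    """
    return 4 + ((offset & 0xFFFFFFFF) >> 30)

def quadrant_name(offset):
    """Name of the process space that an offset falls in under HP-UX."""
    return QUADRANTS[implicit_space_register(offset)]
```

For example, any offset from 16:40000000 to 16:7FFFFFFC selects SR[5]
and so refers to the private data space.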

Branches and calls are different, because the branch instructions fall
into two distinct groups: local (intra-space) branches, which compute
their targets relative to the space of the instruction itself, and
external (inter-space) branches which require an explicit space register
to be specified for the target. You cannot make an external branch using
an implicit space register, and it's this which causes the most
difficulties for Poplog.

One other curious feature of the processor's execution model is that
whenever control is transferred to an absolute address (i.e. by
branching through a register) then the two least-significant bits of the
target offset are interpreted as encoding the privilege level at which
the code should be executed. These two bits would otherwise be unused of
course, because code is always word-aligned. There are four privilege
levels, from 0 the highest to 3 the lowest (standard) level. With normal
branches the privilege level can only decrease, and any attempt to raise
the level is ignored; only the special gate instruction can raise the
privilege level. The matter is relevant to Poplog because the return
address offset deposited by a branch-and-link instruction always has
these two low bits set, to ensure that the current privilege level is
restored on return. Since Poplog code always executes at level 3, return
addresses look like pop integers!
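
The coincidence between return addresses and pop integers is easy to
see in a small model. The sketch below assumes only what is said above:
code offsets are word-aligned, the privilege level occupies the two low
bits, and a pop integer has both tag bits set. The function names are
illustrative.

```python
LOW_BITS = 0x3      # bits 30 and 31: privilege level / Poplog tag bits

def link_value(return_offset, privilege):
    """Offset deposited by a branch-and-link executed at `privilege`."""
    if return_offset & LOW_BITS:
        raise ValueError("code offsets must be word-aligned")
    if not 0 <= privilege <= 3:
        raise ValueError("privilege level must be 0..3")
    return return_offset | privilege

def looks_like_popint(word):
    """True if both tag bits are set, as they are in a pop integer."""
    return (word & LOW_BITS) == LOW_BITS
```

At privilege level 3 every link value satisfies looks_like_popint; at
any other level none does.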


------------------------------------------------------------------------
2  Assembling and Linking
------------------------------------------------------------------------

The assembler uses the symbols %r0-%r31, %sr0-%sr7 and %fr0-%fr31 to
denote the general registers, space registers and floating point
registers respectively. The floating point registers can have their
upper and lower 32-bit halves addressed separately -- for single float
or fixpoint values -- by suffixing the register name with L or R as
appropriate. There are also more mnemonic names defined for most of the
general registers which relate to their function in the standard
procedure calling convention (see below) such as %sp for the stack
pointer and %arg0 for the first subroutine argument register.

The file "asm_macros.h" defines some additional register names for
Poplog's own use, such as %usp for the user stack pointer and %pb for
the procedure base register. This file is included in all the hand-coded
assembler files. It also defines several assembler macros for common
operations, such as STV32 for storing a value to a 32-bit symbolic
address; these are always written in upper case to distinguish them from
real instructions (although instruction names are not case sensitive).
Assembly code files generated by POPC don't use the "asm_macros" header
file, but will define and use the Poplog register names if the flag
M_DEBUG is set <true> in "sysdefs.p".

As with other RISC processors, the fixed instruction length makes it
impossible to load a 32-bit constant with a single instruction. As on
the SPARC, the assembler provides special operators -- L' and R' (or L%
and R%) -- to extract the upper (21 bits) and lower (11 bits) parts of a
32-bit value. Curiously, the 11-bit R value is still too big to be the
operand of an arithmetic immediate instruction, but is acceptable as the
displacement part of a load/store, so to load a 32-bit value to a
register we use:

        ldil        L'value, %reg
        ldo         R'value(%reg), %reg

This is the same as the LDA32 macro defined in "asm_macros.h".
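
The arithmetic behind the two-instruction load can be sketched in
Python. The 21/11 split is as described above; the immediate ranges
used in the two range checks (11-bit signed for arithmetic immediates,
14-bit signed for load/store displacements) are our reading of the
instruction formats and should be checked against the architecture
manual. All function names are illustrative.

```python
R_BITS = 11
R_MASK = (1 << R_BITS) - 1          # low 11 bits

def left_part(value):
    """Model of L'value: the upper 21 bits, with the low 11 cleared."""
    return value & 0xFFFFFFFF & ~R_MASK

def right_part(value):
    """Model of R'value: the lower 11 bits, as an unsigned number."""
    return value & R_MASK

def load_32(value):
    """Model of the ldil/ldo pair."""
    reg = left_part(value)                         # ldil L'value, %reg
    return (reg + right_part(value)) & 0xFFFFFFFF  # ldo R'value(%reg), %reg

def fits_arith_immediate(n):
    """Assumed range of an 11-bit signed arithmetic immediate."""
    return -1024 <= n <= 1023

def fits_displacement(n):
    """Assumed range of a 14-bit signed load/store displacement."""
    return -8192 <= n <= 8191
```

The largest R' value, 2047, fails fits_arith_immediate but passes
fits_displacement, which is why ldo rather than an add immediate is used
for the second step.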

The assembler's .export directive marks a symbol as being visible from
outside the current file (like .globl on other Unix systems).
Unfortunately, there is also a matching .import directive which must be
used to declare symbols which are referenced in the file but defined
elsewhere. Failure to do this generates "undefined label" errors. In the
hand-written assembly code files, these import directives can be
inserted by hand; for POPC output, they have to be done automatically.
This is accomplished by having two properties defined in "asmout.p"
which record all symbol definitions and references within the current
file; at the end of the file, all symbols used but not defined are
imported.
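
The bookkeeping amounts to a set difference. The sketch below models it
in Python rather than in the Pop-11 of "asmout.p"; the class and method
names are hypothetical and the real code uses properties, not sets.

```python
class SymbolTracker:
    """Records symbols defined and referenced in one assembler file."""

    def __init__(self):
        self.defined = set()
        self.referenced = set()

    def define(self, symbol):
        self.defined.add(symbol)

    def reference(self, symbol):
        self.referenced.add(symbol)

    def imports(self):
        """Symbols needing an import: used here but not defined here."""
        return sorted(self.referenced - self.defined)
```

The order in which definitions and references are seen doesn't matter,
because the import list is only computed at the end of the file.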

Worse still, symbols are exported and imported with "types" which are
meaningful to the linker. There are several legal type keywords, but the
common ones are code and data; by default, a symbol gets the type of the
space in which the export/import directive was placed. The documentation
is very unclear as to what these types really mean; however, the type at
which a symbol is imported into a file must match the type with which it
was exported or it remains as an undefined symbol at link time. The
linker reports such symbols as:

    /bin/ld: Unsatisfied symbols:
        foo (data)
        baz (code)

This is a problem for POPC, because it is impossible to deduce from a
declaration such as

    constant foo;

whether the structure foo is writeable (in data space) or non-writeable
(in code space). The adopted solution is to export and import all Poplog
symbols as data regardless of the space in which they are actually
defined. This instantly makes everything consistent, and it does appear
to work. However, because the documentation on these types is so poor,
it's not *guaranteed* to work. It's quite possible that the linker
normally attaches extra information to code symbols, needed for
executable addresses, and that we lose it by declaring them as data.

This solution doesn't work for external symbols referenced with _extern.
Such symbols already have types defined in the libraries from which
they're extracted, and it's impossible to deduce those from the manner
of the symbols' use within Poplog. This is insoluble in general without
some extra syntax to declare external symbols. Our partial solution is
to assume that all externs are code symbols (the usual case, for system
calls etc.) and then make exceptions for a fixed number of data symbols
which are listed individually in "asmout.p". This scheme will break as
soon as somebody adds a reference to a global variable and forgets to
update "asmout.p" accordingly. This is a manageable problem within
Poplog development work, but makes _extern unusable as a general user
feature (e.g. with POPC).
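
The typing rules just described reduce to a small decision function.
This is a Python sketch, not the code in "asmout.p"; in particular the
two names in EXTERN_DATA_SYMBOLS are placeholders chosen for
illustration, and the real list may well differ.

```python
# Hypothetical contents: the real list is kept in "asmout.p".
EXTERN_DATA_SYMBOLS = {"errno", "environ"}

def link_type(symbol, is_extern):
    """Type keyword with which a symbol is exported or imported."""
    if not is_extern:
        # Poplog symbols are always typed as data, whatever space
        # they are actually defined in.
        return "data"
    if symbol in EXTERN_DATA_SYMBOLS:
        return "data"
    # The usual case for externs: system calls etc.
    return "code"
```

An external variable missing from the list is imported as code and so
shows up as an unsatisfied symbol at link time.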

Poplog executables are linked to use shared libraries. This is the
default for ld(1) anyway, but the a.out file format for the 9000/700 is
so complicated that we've made no attempt to produce a version of
external load which works in the traditional way, but only one based on
the dynamic linking facilities described in shl_load(3X) (this is
currently enabled with the SHARED_LIBRARIES flag in "sysdefs.p" --
trying to build a system without that won't work). Use of these
facilities means that the executable has to be dynamically linked,
because there's no static archive version of the required library (dld).


------------------------------------------------------------------------
3  Register Usage
------------------------------------------------------------------------

The following general registers are constrained by hardware:

    %r0         permanent zero: reads always return 0 and writes are
                ignored

    %r1         implicit destination operand of the addil instruction

    %r31        implicit return-address operand of the ble instruction

Registers %r1 and %r31 are available for use as temporaries when not
required for their associated instructions.

Two further registers are reserved globally by software convention:

    %r27        global data pointer

    %r30        stack pointer

and the remainder have the following functions assigned by the procedure
calling conventions:

    %r2         local return link

    %r3-%r18    callee-saves partition

    %r19-%r22   caller-saves partition

    %r23-%r26   subroutine arguments

    %r28-%r29   subroutine results

The main Poplog registers are allocated from the callee-saves partition,
to prevent them being modified by external code. These include the usual
user stack pointer, procedure base register and false register; we also
dedicate one register to the special var block (as on the SPARC) and one
to popint 0 (= 3). The remainder are divided between pop and non-pop
register lvars, with a fairly arbitrary division of 6 pop to 5 non-pop.

Of the other registers, Poplog's procedure calling convention uses %r31
rather than %r2 as the return link (see below) and %r1 is used as the
chain register. This is summarised in the following table:

          -----------------------------------------------------
          |  Reg.  |  Name      |  Usage                      |
          |--------+------------+-----------------------------|
          |  %r0   |  0         |  Permanent 0                |
          |  %r1   |  %chain    |  Chain reg.                 |
          |  %r2   |  %rp       |  Local return link          |
          |  %r3   |  %npop4    |  Non-pop lvar               |
          |  %r4   |  %npop3    |  Non-pop lvar               |
          |  %r5   |  %npop2    |  Non-pop lvar               |
          |  %r6   |  %npop1    |  Non-pop lvar               |
          |  %r7   |  %npop0    |  Non-pop lvar               |
          |  %r8   |  %pop5     |  Pop lvar                   |
          |  %r9   |  %pop4     |  Pop lvar                   |
          |  %r10  |  %pop3     |  Pop lvar                   |
          |  %r11  |  %pop2     |  Pop lvar                   |
          |  %r12  |  %pop1     |  Pop lvar                   |
          |  %r13  |  %pop0     |  Pop lvar                   |
          |  %r14  |  %pzero    |  Permanent pop 0 (3)        |
          |  %r15  |  %false    |  Permanent false            |
          |  %r16  |  %svb      |  Special var block pointer  |
          |  %r17  |  %pb       |  Procedure base register    |
          |  %r18  |  %usp      |  User stack pointer         |
          |  %r19  |  %t4       |  Temporary                  |
          |  %r20  |  %t3       |  Temporary                  |
          |  %r21  |  %t2       |  Temporary                  |
          |  %r22  |  %t1       |  Temporary                  |
          |  %r23  |  %arg3     |  Subroutine argument        |
          |  %r24  |  %arg2     |  Subroutine argument        |
          |  %r25  |  %arg1     |  Subroutine argument        |
          |  %r26  |  %arg0     |  Subroutine argument        |
          |  %r27  |  %dp       |  Global data pointer        |
          |  %r28  |  %ret0     |  Subroutine result          |
          |  %r29  |  %ret1     |  Subroutine result          |
          |  %r30  |  %sp       |  Stack pointer              |
          |  %r31  |  %r31      |  Poplog return link         |
          -----------------------------------------------------

The more descriptive register names are either defined by the assembler
or by Poplog in the "asm_macros.h" file.

Of the space registers: %sr0 is used by Poplog as a temporary when
making inter-space calls; %sr4 and %sr5 are assumed always to hold the
space identifiers of the process' code and data spaces, and are used in
calls whenever the target space is known at compile time (see below).

Floating point registers are not generally used outside of "afloat.s",
except for operands to the instruction xmpyu (unsigned multiply) which
is implemented by the floating-point hardware.


------------------------------------------------------------------------
4  Procedure Call and Return
------------------------------------------------------------------------

A Poplog procedure may reside either in code space or in data space, so
the most general procedure call form has to use an external branch
instruction. For a procedure in a register (%arg0 say) this has the
form:

    ldw         _PD_EXECUTE(%arg0), %t1     ; execute address
    ldsid       (%arg0), %t2                ; space ID for procedure
    mtsp        %t2, %sr0                   ; copied to space register 0
    ble         (%sr0, %t1)
    nop

Knowing the procedure name (e.g. from a constant procedure declaration)
doesn't necessarily help very much, because it still doesn't tell us
which space the procedure's in:

    ldil        L'xc$setpop, %t1
    ldsid       (%t1), %t2
    mtsp        %t2, %sr0
    ble         R'xc$setpop(%sr0, %t1)
    nop

There are three cases where we can know the target space at
code-generation time:

    (1) in system code, when the target is an assembly-code subroutine
        always defined in code space;

    (2) in system code, when the target is a procedure previously
        defined within the current file: a property defined in
        "asmout.p" records the space in which each procedure is
        generated;

    (3) in user code, when the procedure address is absolute: it could
        be a system procedure or a user procedure in a locked portion of
        the heap, but in either case the correct space can be determined
        by examining bits 0 and 1 of the address, relying on the
        implicit addressing conventions discussed earlier.

In these cases we still use an external branch, but with the appropriate
space register specified explicitly, relying on the HP-UX convention of
registers %sr4 and %sr5 corresponding to code and data spaces:

    ldil        L'x$L23, %t1            ; call lconstant procedure L23
    ble         R'x$L23(%sr4, %t1)      ; known to be in code space
    nop

In principle, for instances like this in system code, we could use a
local branch and save one instruction, but we can't guarantee that the
target will be in range and we can't afford to let the linker start
adding code stubs which could corrupt Poplog's stack layout.
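
The choice between the long and short calling sequences can be
summarised by a small generator. This is a Python sketch of the
decision only, reproducing the instruction sequences shown above; the
function names are hypothetical and the real code generator is written
in Pop-11.

```python
def space_reg_for_address(address):
    """Space register for an absolute procedure address (case 3)."""
    quadrant = (address & 0xFFFFFFFF) >> 30    # bits 0 and 1
    if quadrant == 0:
        return "%sr4"                          # code space
    if quadrant == 1:
        return "%sr5"                          # data space
    raise ValueError("procedure address outside code and data spaces")

def ins(opcode, operands=""):
    """Format one instruction line."""
    return "{:<12}{}".format(opcode, operands).rstrip()

def call_sequence(symbol, space_reg=None):
    """Instruction lines for a call to the procedure named `symbol`.

    space_reg is "%sr4" or "%sr5" when the target space is known at
    code-generation time, or None when it must be found at run time.
    """
    lines = [ins("ldil", "L'{}, %t1".format(symbol))]
    if space_reg is None:
        lines.append(ins("ldsid", "(%t1), %t2"))
        lines.append(ins("mtsp", "%t2, %sr0"))
        space_reg = "%sr0"
    lines.append(ins("ble", "R'{}({}, %t1)".format(symbol, space_reg)))
    lines.append(ins("nop"))
    return lines
```

Knowing the space saves the ldsid and mtsp instructions, so the call
shrinks from five instructions to three.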

The ble instruction deposits its return address offset part in %r31, so
we use this as the Poplog return address register. This is against the
standard HP calling conventions which specify %rp (= %r2) as the return
link. In general, a return must also be prepared to cross a space
boundary as follows:

    ldsid       (%r31), %t1             ; space ID of return address
    mtsp        %t1, %sr0               ; copied to space register 0
    be          (%sr0, %r31)
    nop

Note that a ble also deposits the return address space identifier in
%sr0, but saving and restoring this in the procedure's entry and exit
code costs more than recomputing it dynamically at each return point.

The return address offset is the only information communicated between
caller and callee. In particular, the caller does not promise to
pre-compute the target procedure address: the calling sequence is
complicated enough by the need to use an external branch without adding
this extra complexity. So any procedure which needs to know its own
address must compute it on entry. For a system procedure in code space
this is easy because the address is known and fixed:

    ldil        L'xc$setpop, %pb
    ldo         R'xc$setpop(%pb), %pb

For a relocatable procedure -- i.e. a user procedure or a copyable
system procedure -- we use the technique of doing a very local
branch-and-link to get a pointer to the executing code, and then adjust
that backwards to point at the procedure start:

    bl          L$1, %pb                ; %pb = L$1 + privilege bits
    ldo         -_SIZE(%pb), %pb
L$1

The _SIZE offset is roughly the size of the procedure header, plus 8 for
the first two instruction words. A minor wrinkle is caused by the fact
(discussed above) that the return address deposited by bl is not a pure
pointer, but includes the current privilege level encoded in the two
low-order bits. To avoid the cost of an extra instruction to clear these
two bits, we just assume that the privilege level is always 3 and adjust
the size accordingly. This will break if Poplog code is ever run at
anything other than the standard privilege level.
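
The adjustment can be worked through numerically. The sketch below
assumes that the bl is the first instruction after the procedure
header, so that the procedure record starts header_bytes before it; the
header size used in any example is arbitrary and the function names are
ours.

```python
PRIVILEGE_LEVEL = 3     # assumed fixed, as explained above
BL_TO_LABEL = 8         # the bl plus the ldo in its delay slot

def size_adjustment(header_bytes):
    """Value of _SIZE for a procedure with a header_bytes-long header."""
    return header_bytes + BL_TO_LABEL + PRIVILEGE_LEVEL

def procedure_base(bl_address, header_bytes):
    """Model of the two-instruction sequence placed at bl_address."""
    # bl L$1, %pb: address of L$1 with the privilege level or-ed in
    link = (bl_address + BL_TO_LABEL) | PRIVILEGE_LEVEL
    # ldo -_SIZE(%pb), %pb
    return link - size_adjustment(header_bytes)
```

With a 32-byte header and the bl at 16:40001020, the link value is
16:4000102B and subtracting 43 gives 16:40001000, the start of the
procedure.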


------------------------------------------------------------------------
5  The Callstack
------------------------------------------------------------------------

The system stack is located somewhere in the data space. The actual
start address is obtainable from the macro USRSTACK defined in
<sys/param.h> and copied into "sysdefs.p" as UNIX_USRSTACK. The value
changed between HP-UX 8 and 9, so beware. The size of the stack area
can't be
dynamically determined. An absolute upper limit is given by the macro
USRSTACKMAX, but this is misleading: the kernel has a soft limit maxssiz
which is typically less than this. The value of this limit is obtainable
for a particular machine by running sam(1M) but may vary between systems
(and can be changed by reconfiguring the kernel). The value in
"sysdefs.p" for UNIX_USRSTACK_SIZE is the value of maxssiz for the
machine we ported to.

A curiosity of the HP is that the stack grows up rather than down, and
the stack pointer points to the next free word rather than the last word
allocated. Assembly code which references the stack pointer should
beware of this. A new stack frame can be allocated and the first word
(typically the return address) stored with a single stwm instruction:

    stwm        %r31, _FRAME_SIZE(%sp)

Remaining words in the frame are then stored to fixed (negative)
offsets. A matching ldwm will deallocate the frame and restore the first
word.
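
The effect of stwm and ldwm on an upward-growing stack can be modelled
as below. This is a Python sketch covering only the two cases used
here (positive displacement on store, negative on load); the class name
and the base address in the usage note are arbitrary.

```python
class UpwardStack:
    """Stack that grows up, with sp pointing at the next free word."""

    def __init__(self, base):
        self.sp = base
        self.mem = {}

    def stwm(self, value, disp):
        """stwm value, disp(%sp) with disp > 0: store, then advance."""
        if disp <= 0 or disp % 4:
            raise ValueError("expected a positive word-aligned size")
        self.mem[self.sp] = value
        self.sp += disp

    def stw(self, value, disp):
        """stw value, disp(%sp): plain store at a fixed offset."""
        self.mem[self.sp + disp] = value

    def ldwm(self, disp):
        """ldwm disp(%sp), reg with disp < 0: retreat, then load."""
        if disp >= 0 or disp % 4:
            raise ValueError("expected a negative word-aligned size")
        self.sp += disp
        return self.mem[self.sp]
```

Allocating a 16-byte frame with stwm leaves the first word at SP-16 and
the remaining words at SP-12, SP-8 and SP-4; ldwm with -16 then returns
the first word and restores the old stack pointer.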

The stack frame layout described by the procedure calling conventions is
not suitable for Poplog. It requires an amount of fixed space in each
frame for use by linker-generated stub code and for other
system-specific purposes. This might be coped with by defining an
HPPA-specific stack frame layout in "symdefs.p" with knock-on effects
elsewhere, but this doesn't seem worth it: the only real cost of using
non-standard frames is that the debugger can't provide a backtrace.
Interfacing to external routines is not a problem, because this is
restricted to well-defined points in the hand-coded assembler files, and
these ensure that a dummy stack frame satisfying the conventions is
constructed before any external call is made.

So HPPA Poplog stack frames have the standard form (shared by all other
systems except SPARC) except, of course, that the offsets are negative
rather than positive:

    SP ---> |                              |
            |------------------------------|
    SP-4 -> |  Owner address               |
            |------------------------------|
            |  Non-pop stack lvars         |
            |------------------------------|
            |  Pop stack lvars             |
            |------------------------------|
            |  Saved non-pop dlocals       |
            |------------------------------|
            |  Saved pop dlocals           |
            |------------------------------|
            |  Saved pop registers         |
            |------------------------------|
            |  Saved non-pop registers     |
            |------------------------------|
            |  Return address into caller  |
            |------------------------------|


Most of the differences resulting from this are handled by declaring
STACK_GROWS_UP in the "sysdefs.p" file which "inverts" the csword type,
so that stack offsets are automatically negated and have an extra 4
subtracted to account for the pointer position. The only case in the
HP-specific code not covered by this is in Get_opnd in "ass.p" which has
to interpret the encoding for on-stack lvars differently from normal.
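
The offset inversion is a one-line calculation. The sketch below is our
reading of the rule just described, with word 0 being the one nearest
the stack pointer (the owner address); the function name is
illustrative and is not the name used in the Poplog sources.

```python
WORD = 4    # bytes per stack word

def inverted_stack_offset(index):
    """Byte offset from %sp of word `index` of the current frame.

    On a downward-growing stack word `index` would be at +4*index;
    here the offset is negated and a further 4 is subtracted because
    %sp points at the next free word.
    """
    return -(index * WORD) - WORD
```

Word 0 thus lies at SP-4, matching the position of the owner address in
the diagram above.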


------------------------------------------------------------------------
6  External Calls
------------------------------------------------------------------------

All calls to external routines are effected using the millicode routine
$$dyncall which is recommended for indirect calls. This expects the
procedure label (or plabel) of the target routine in register %t1. The
function of a plabel is not well documented, but it appears to denote a
structure containing the executable address of the routine plus an
optional linkage table pointer for shared library use. External routines
referenced by user code (through external load) are automatically
obtained as plabels via the dynamic linking mechanism. Those referenced
in system code (through _extern) need to be denoted specially in the
assembly code output from POPC using the plabel field selectors P', LP'
and RP'. These are generated by the appropriate routines from "asmout.p"
whenever a symbol is detected of type external code.

Use of $$dyncall simplifies the general external call interface
(_call_external defined in "aextern.s") with regard to the placing of
arguments into registers. An external function called directly may
expect its arguments in general registers or floating-point registers
depending on their type, and working out that allocation dynamically is
not easy. Using $$dyncall we assume that the executable part of the
plabel argument is actually a linker-generated stub which will relocate
arguments as necessary, so we simply ignore the floating-point registers
completely. The strategy used is to place all arguments into the stack
frame at their proper locations (taking care to align double-word
arguments correctly) and then, immediately prior to the call, to copy
the first four stack words -- whether or not they make sense -- into the
argument registers %arg0-%arg3. Similarly, any result from the call is
assumed to be returned in the general register pair (%ret0,%ret1) and
these values are copied into the Poplog result structure.

A stack frame allocated for an external call must satisfy the procedure
calling conventions which require a 48-byte fixed area, with the first
argument starting at offset -36; space has to be allocated for a minimum
of 4 argument words regardless of how many arguments the procedure
actually expects. The manual also suggests aligning stack frames on
64-byte boundaries, so we do this even though it doesn't appear to be
strictly necessary.
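
The frame arithmetic can be sketched as follows. The figures 48, -36, 4
and 64 come from the text; splitting the 48-byte fixed area into 32
bytes of marker plus 16 bytes for the four mandatory argument words is
our inference from those figures, and nargs counts argument words
rather than arguments (a double takes two). Function names are
illustrative.

```python
FRAME_MARKER = 32      # fixed area less the four argument words
MIN_ARG_WORDS = 4
FRAME_ALIGN = 64
FIRST_ARG_OFFSET = -36

def arg_offset(i):
    """Offset from the new %sp of argument word i (counting from 0)."""
    return FIRST_ARG_OFFSET - 4 * i

def frame_size(nargs):
    """Bytes to allocate for an external call with nargs words."""
    words = max(nargs, MIN_ARG_WORDS)
    size = FRAME_MARKER + 4 * words
    # round up to the next multiple of the frame alignment
    return (size + FRAME_ALIGN - 1) // FRAME_ALIGN * FRAME_ALIGN
```

A call with up to 8 argument words therefore fits in a single 64-byte
frame, and the four words copied into %arg0-%arg3 are those at offsets
-36, -40, -44 and -48.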


------------------------------------------------------------------------
7  Signal Handling
------------------------------------------------------------------------

The handling of non-trappable error signals -- segmentation violation
and the like -- is subtly different on the Precision to that on other
Poplog systems. The normal strategy is that the signal handler
(_pop_errsig_handler defined in "c_core.c") updates the
instruction-pointer field of the signal context to return to the
assembly code routine __pop_errsig which in turn transfers control to
the Poplog error handler. This was tried on the Precision: it was found
necessary to assign also to the next-instruction pointer field of the
context (as on the SPARC), and to clear the nullify bit in the saved
processor status word to ensure that the next instruction was genuinely
executed. The result worked most of the time, but not when the interrupt
arrived during a system call -- e.g. a QUIT signal generated from the
keyboard during a read -- when the call would appear to be restarted
rather than aborted. This is presumably because the interrupted system
call was returning to a previous state in which the changes made to the
signal context were lost.

The solution is to have _pop_errsig_handler call __pop_errsig directly
to abort the signal handling as well as everything else. In order to
prevent Poplog's stack unwinding mechanism being confused by non-pop
stack frames allocated by the signal handler, __pop_errsig takes an
argument which is the value of the stack pointer extracted from the
signal context: its first action is to reset the stack pointer to this
value, discarding the signal handler's stack frames.


------------------------------------------------------------------------
8  Documentation, Utilities etc.
------------------------------------------------------------------------

We have the following hard-copy documentation available about the
PA-RISC processor:

    PA-RISC 1.1 Architecture and Instruction Set Reference Manual
    Second Edition, Sept. 1992
    HP Part Number: 09740-90039

    Assembly Language Reference Manual
    Fourth Edition, Jan. 1991
    HP Part Number: 92432-90001

    PA-RISC Procedure Calling Conventions Reference Manual
    Second Edition, Jan. 1991
    HP Part Number: 09740-90015

The files

    /rsuna/pop/hpport/assemble.p
    /rsuna/pop/hpport/disassemble.p

contain a simple assembler and disassembler for PA-RISC code.


--- sysdoc/hppa
--- Copyright University of Sussex 1993. All rights reserved.