anuraagw.me

rdna documentation

This was taken from the RDNA3.5 ISA pdf.

states we need to keep track of

per program

  • Global memory (allocate 512mb box to start)
  • Program counter (PC) points to first instruction when wave is created
State  Description          Width / Range
TBA    trap base address    48-bit
TMA    trap memory address  48-bit

a note about branch jumps

  • branches jump to pc_of_the_instruction_after_the_branch + offset*4
  • get_pc and swap_pc are relative to the next instruction, not the current one.
  • all prior instructions have been issued but may or may not have completed execution
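The branch rule above can be sketched as a small helper. This is only an illustration: the function names are made up for these notes, the 16-bit immediate is assumed to be signed (so backward branches work), and the branch instruction itself is assumed to be 4 bytes long.

```python
def sign_extend_16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(branch_pc: int, simm16: int, instr_bytes: int = 4) -> int:
    """Byte address a taken branch jumps to.

    branch_pc   : byte address of the branch instruction itself
    simm16      : raw 16-bit immediate from the encoding (offset in dwords)
    instr_bytes : size of the branch instruction (assumed 4 bytes)
    """
    next_pc = branch_pc + instr_bytes          # pc of the instruction after the branch
    return next_pc + sign_extend_16(simm16) * 4


# A branch at 0x100 with offset 3 lands three dwords past the next instruction.
assert branch_target(0x100, 3) == 0x110
# Offset 0xFFFF is -1, which lands back on the branch itself (an infinite loop).
assert branch_target(0x100, 0xFFFF) == 0x100
```

Note that an offset of 0 is just a fall-through to the next instruction, which is an easy off-by-one to get wrong in an emulator.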

state per wave

State         Description                                                          Width / Range
SGPRs         scalar general purpose registers                                     s0–s105
VGPRs         vector general purpose registers                                     v0–v255 (32-bit)
LDS           do we need to emulate cache? scratch ram
EXEC          top half not used in wave32                                          64-bit
EXECZ         exec is zero                                                         1-bit
VCC           vector condition code                                                64-bit
VCCZ          vcc is zero                                                          1-bit
SCC           scalar condition code                                                1-bit
Flat_scratch  base address for scratch memory used this wave (overflow registers)  48-bit
M0            misc reg                                                             32-bit
TRAPSTS       trap status                                                          32-bit
TTMP0-TTMP15  trap temporary SGPRs                                                 32-bit
VMcnt         vmem load and sample instructions issued but not yet completed       6-bit
VScnt         vmem store instructions…                                             6-bit
EXPcnt        export/gds instructions (do we need this)                            3-bit
LGKMcnt       lds, gds, constant and message count                                 6-bit
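For an emulator, the per-wave state above could be held in one container. A minimal sketch, assuming wave32: the class name, field names, and initial values (for example, all lanes enabled at start) are choices made for illustration, not something the ISA document prescribes. LDS is left out because the notes have not settled how to model it.

```python
from dataclasses import dataclass, field

NUM_SGPRS = 106   # s0..s105
NUM_VGPRS = 256   # v0..v255
WAVE_SIZE = 32    # lanes per wave (wave32 assumed)


@dataclass
class WaveState:
    pc: int = 0
    # One 32-bit value per SGPR.
    sgprs: list = field(default_factory=lambda: [0] * NUM_SGPRS)
    # One 32-bit value per lane per VGPR: vgprs[reg][lane].
    vgprs: list = field(
        default_factory=lambda: [[0] * WAVE_SIZE for _ in range(NUM_VGPRS)]
    )
    exec_mask: int = (1 << WAVE_SIZE) - 1   # assumption: all lanes start active
    vcc: int = 0
    scc: int = 0
    flat_scratch: int = 0
    m0: int = 0
    trapsts: int = 0
    ttmp: list = field(default_factory=lambda: [0] * 16)
    # Outstanding-operation counters.
    vmcnt: int = 0
    vscnt: int = 0
    expcnt: int = 0
    lgkmcnt: int = 0

    @property
    def execz(self) -> bool:
        """EXECZ is derived state: true when no lane is enabled."""
        return self.exec_mask == 0

    @property
    def vccz(self) -> bool:
        """VCCZ is derived state: true when VCC is all zeros."""
        return self.vcc == 0
```

Making EXECZ and VCCZ computed properties rather than stored bits avoids ever having them drift out of sync with EXEC and VCC.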

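PC

Program counter: the next shader instruction to execute. Read/write only via scalar control flow instructions and indirectly using branch. The 2 LSBs are forced to zero, i.e. the PC is always 4-byte (dword) aligned, which works because every instruction is a multiple of 4 bytes long.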
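One way to read "2 LSBs are forced to zero" in an emulator is that any value written to the PC gets its low two bits cleared, so the PC is always a multiple of 4 bytes. A tiny sketch, with `write_pc` being a made-up helper name:

```python
PC_ALIGN_MASK = ~0x3  # clears bits 1:0


def write_pc(value: int) -> int:
    """Return the value the PC would actually hold after a write."""
    return value & PC_ALIGN_MASK


assert write_pc(0x1003) == 0x1000   # low bits dropped
assert write_pc(0x1004) == 0x1004   # already aligned, unchanged
```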

EXECute Mask

Controls which threads in the vector are executed. 1=execute, 0=do not execute. Exec can be read/written via scalar instructions. Can be written as a result of vector-alu compare.

Exec affects: vector-alu, vector-memory, LDS, GDS, and export instructions. No effect on scalar execution / branches.

Wave64 uses all 64 bits, wave32 only uses 31:0.
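As a rough sketch of how EXEC gates a vector-alu op: only lanes whose bit is set get their destination updated, and in wave32 the upper 32 bits of the mask are ignored. The helper names here are hypothetical, and the add is a generic 32-bit wrapping add rather than a specific ISA opcode.

```python
def active_lanes(exec_mask: int, wave_size: int = 32) -> list:
    """Lane indices whose EXEC bit is 1, ignoring bits above wave_size."""
    exec_mask &= (1 << wave_size) - 1
    return [lane for lane in range(wave_size) if (exec_mask >> lane) & 1]


def masked_add_u32(dst, src0, src1, exec_mask, wave_size=32):
    """dst[lane] = src0[lane] + src1[lane] (mod 2^32) for active lanes only."""
    for lane in active_lanes(exec_mask, wave_size):
        dst[lane] = (src0[lane] + src1[lane]) & 0xFFFFFFFF


dst = [0] * 32
masked_add_u32(dst, [1] * 32, [2] * 32, exec_mask=0b101)
assert dst[0] == 3 and dst[1] == 0 and dst[2] == 3   # lane 1 was masked off
```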

Instruction skipping (exec=0): todo: this makes no sense right now

SGPRs

106 normal SGPRs (s0–s105). vcc_lo and vcc_hi are technically stored in SGPR 106 and 107, respectively.

Alignment for SGPRs (an even-aligned SGPR pair is required):

  • any time 64-bit data is used
  • scalar memory reads when the address-base comes from an SGPR pair (loading in arguments, i guess)

Other notes:

  • Writes to an out-of-range SGPR are ignored
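The SGPR rules above could be modeled roughly like this. It is a sketch under assumptions: `SgprFile` and its methods are invented names, "out of range" is taken to mean any index past vcc_hi (107), and out-of-range reads are not modeled at all.

```python
class SgprFile:
    NUM_SGPRS = 106          # s0..s105
    VCC_LO, VCC_HI = 106, 107

    def __init__(self):
        # 106 normal SGPRs plus the two slots that back VCC.
        self.regs = [0] * (self.VCC_HI + 1)

    def read(self, index: int) -> int:
        return self.regs[index]

    def write(self, index: int, value: int) -> None:
        if 0 <= index <= self.VCC_HI:
            self.regs[index] = value & 0xFFFFFFFF
        # Otherwise: out-of-range write, silently ignored.

    def read64(self, index: int) -> int:
        """Read an SGPR pair as one 64-bit value; the pair must be even-aligned."""
        if index % 2 != 0:
            raise ValueError(f"64-bit SGPR access at s{index} is not even-aligned")
        return self.regs[index] | (self.regs[index + 1] << 32)


sgprs = SgprFile()
sgprs.write(SgprFile.VCC_LO, 0xAAAA)
sgprs.write(SgprFile.VCC_HI, 0x1)
assert sgprs.read64(SgprFile.VCC_LO) == 0x1_0000_AAAA   # VCC as a 64-bit pair
sgprs.write(200, 5)                                     # ignored, no error
```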

VCC

Vector condition code, written by V_CMP and integer vector add/sub instructions. VCC is read by many instructions. It is a named SGPR pair, subject to the same dependency checks (?) as other SGPRs.
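To make the V_CMP to VCC relationship concrete, here is a sketch of a per-lane signed less-than compare that produces a VCC bitmask. The function name is hypothetical, and writing 0 for lanes that are disabled in EXEC is an assumption worth checking against the ISA document.

```python
def cmp_lt_to_vcc(src0, src1, exec_mask: int, wave_size: int = 32) -> int:
    """Return a VCC mask with bit N set when lane N is active and src0[N] < src1[N]."""
    vcc = 0
    for lane in range(wave_size):
        lane_active = (exec_mask >> lane) & 1
        if lane_active and src0[lane] < src1[lane]:
            vcc |= 1 << lane
    return vcc


# 4 lanes for readability; lane 3 is disabled by EXEC.
vcc = cmp_lt_to_vcc([1, 5, 2, 0], [3, 3, 3, 3], exec_mask=0b0111, wave_size=4)
assert vcc == 0b0101
vccz = (vcc == 0)   # VCCZ follows directly from the new VCC value
assert vccz is False
```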

VGPRs

In wave32, each VGPR can be modeled as a 32-long array of 32-bit values (one per lane).

data types

  • b32 (binary untyped 32-bit), this is not really used
  • b64
  • f16
  • f32
  • f64
  • bf16
  • i8
  • i16
  • i32
  • i64
  • u16
  • u32
  • u64
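Since registers only hold raw bits, an emulator has to reinterpret the same 32-bit (or 16-bit) pattern according to the type suffix of each instruction. A minimal sketch using the standard `struct` module; the helper names are made up for these notes.

```python
import struct


def bits_to_f32(bits: int) -> float:
    """Reinterpret a raw 32-bit pattern (b32) as an IEEE-754 float (f32)."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def f32_to_bits(value: float) -> int:
    """Inverse of bits_to_f32: the raw 32-bit pattern of a float."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def bits_to_i32(bits: int) -> int:
    """Reinterpret a raw 32-bit pattern as a signed two's-complement integer (i32)."""
    bits &= 0xFFFFFFFF
    return bits - (1 << 32) if bits & 0x80000000 else bits


def bits_to_f16(bits: int) -> float:
    """Reinterpret the low 16 bits as an IEEE-754 half-precision float (f16)."""
    return struct.unpack("<e", struct.pack("<H", bits & 0xFFFF))[0]


assert bits_to_f32(0x3F800000) == 1.0
assert f32_to_bits(-2.0) == 0xC0000000
assert bits_to_i32(0xFFFFFFFF) == -1     # the same bits read as u32 are 4294967295
assert bits_to_f16(0x3C00) == 1.0
```

bf16 has no `struct` format code; one common approach is to shift the 16 bits into the top half of a 32-bit word and decode it as f32.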

what changes in wave64 vs wave32

  • Every thread gets u
  • exec uses the entire 64 bits
  • vcc uses the entire 64 bits