diff options
Diffstat (limited to 'docs/devel/decodetree.rst')
-rw-r--r-- | docs/devel/decodetree.rst | 237 |
1 files changed, 237 insertions, 0 deletions
diff --git a/docs/devel/decodetree.rst b/docs/devel/decodetree.rst new file mode 100644 index 000000000..49ea50c2a --- /dev/null +++ b/docs/devel/decodetree.rst @@ -0,0 +1,237 @@ +======================== +Decodetree Specification +======================== + +A *decodetree* is built from instruction *patterns*. A pattern may +represent a single architectural instruction or a group of same, depending +on what is convenient for further processing. + +Each pattern has both *fixedbits* and *fixedmask*, the combination of which +describes the condition under which the pattern is matched:: + + (insn & fixedmask) == fixedbits + +Each pattern may have *fields*, which are extracted from the insn and +passed along to the translator. Examples of such are registers, +immediates, and sub-opcodes. + +In support of patterns, one may declare *fields*, *argument sets*, and +*formats*, each of which may be re-used to simplify further definitions. + +Fields +====== + +Syntax:: + + field_def := '%' identifier ( unnamed_field )* ( !function=identifier )? + unnamed_field := number ':' ( 's' ) number + +For *unnamed_field*, the first number is the least-significant bit position +of the field and the second number is the length of the field. If the 's' is +present, the field is considered signed. If multiple ``unnamed_fields`` are +present, they are concatenated. In this way one can define disjoint fields. + +If ``!function`` is specified, the concatenated result is passed through the +named function, taking and returning an integral value. + +One may use ``!function`` with zero ``unnamed_fields``. This case is called +a *parameter*, and the named function is only passed the ``DisasContext`` +and returns an integral value extracted from there. + +A field with no ``unnamed_fields`` and no ``!function`` is in error. + +Field examples: + ++---------------------------+---------------------------------------------+ +| Input | Generated code | ++===========================+=============================================+ +| %disp 0:s16 | sextract(i, 0, 16) | ++---------------------------+---------------------------------------------+ +| %imm9 16:6 10:3 | extract(i, 16, 6) << 3 | extract(i, 10, 3) | ++---------------------------+---------------------------------------------+ +| %disp12 0:s1 1:1 2:10 | sextract(i, 0, 1) << 11 | | +| | extract(i, 1, 1) << 10 | | +| | extract(i, 2, 10) | ++---------------------------+---------------------------------------------+ +| %shimm8 5:s8 13:1 | expand_shimm8(sextract(i, 5, 8) << 1 | | +| !function=expand_shimm8 | extract(i, 13, 1)) | ++---------------------------+---------------------------------------------+ + +Argument Sets +============= + +Syntax:: + + args_def := '&' identifier ( args_elt )+ ( !extern )? + args_elt := identifier (':' identifier)? + +Each *args_elt* defines an argument within the argument set. +If the form of the *args_elt* contains a colon, the first +identifier is the argument name and the second identifier is +the argument type. If the colon is missing, the argument +type will be ``int``. + +Each argument set will be rendered as a C structure "arg_$name" +with each of the fields being one of the member arguments. + +If ``!extern`` is specified, the backing structure is assumed +to have been already declared, typically via a second decoder. + +Argument sets are useful when one wants to define helper functions +for the translator functions that can perform operations on a common +set of arguments. This can ensure, for instance, that the ``AND`` +pattern and the ``OR`` pattern put their operands into the same named +structure, so that a common ``gen_logic_insn`` may be able to handle +the operations common between the two. + +Argument set examples:: + + ®3 ra rb rc + &loadstore reg base offset + &longldst reg base offset:int64_t + + +Formats +======= + +Syntax:: + + fmt_def := '@' identifier ( fmt_elt )+ + fmt_elt := fixedbit_elt | field_elt | field_ref | args_ref + fixedbit_elt := [01.-]+ + field_elt := identifier ':' 's'? number + field_ref := '%' identifier | identifier '=' '%' identifier + args_ref := '&' identifier + +Defining a format is a handy way to avoid replicating groups of fields +across many instruction patterns. + +A *fixedbit_elt* describes a contiguous sequence of bits that must +be 1, 0, or don't care. The difference between '.' and '-' +is that '.' means that the bit will be covered with a field or a +final 0 or 1 from the pattern, and '-' means that the bit is really +ignored by the cpu and will not be specified. + +A *field_elt* describes a simple field only given a width; the position of +the field is implied by its position with respect to other *fixedbit_elt* +and *field_elt*. + +If any *fixedbit_elt* or *field_elt* appear, then all bits must be defined. +Padding with a *fixedbit_elt* of all '.' is an easy way to accomplish that. + +A *field_ref* incorporates a field by reference. This is the only way to +add a complex field to a format. A field may be renamed in the process +via assignment to another identifier. This is intended to allow the +same argument set be used with disjoint named fields. + +A single *args_ref* may specify an argument set to use for the format. +The set of fields in the format must be a subset of the arguments in +the argument set. If an argument set is not specified, one will be +inferred from the set of fields. + +It is recommended, but not required, that all *field_ref* and *args_ref* +appear at the end of the line, not interleaving with *fixedbit_elf* or +*field_elt*. + +Format examples:: + + @opr ...... ra:5 rb:5 ... 0 ....... rc:5 + @opi ...... ra:5 lit:8 1 ....... rc:5 + +Patterns +======== + +Syntax:: + + pat_def := identifier ( pat_elt )+ + pat_elt := fixedbit_elt | field_elt | field_ref | args_ref | fmt_ref | const_elt + fmt_ref := '@' identifier + const_elt := identifier '=' number + +The *fixedbit_elt* and *field_elt* specifiers are unchanged from formats. +A pattern that does not specify a named format will have one inferred +from a referenced argument set (if present) and the set of fields. + +A *const_elt* allows a argument to be set to a constant value. This may +come in handy when fields overlap between patterns and one has to +include the values in the *fixedbit_elt* instead. + +The decoder will call a translator function for each pattern matched. + +Pattern examples:: + + addl_r 010000 ..... ..... .... 0000000 ..... @opr + addl_i 010000 ..... ..... .... 0000000 ..... @opi + +which will, in part, invoke:: + + trans_addl_r(ctx, &arg_opr, insn) + +and:: + + trans_addl_i(ctx, &arg_opi, insn) + +Pattern Groups +============== + +Syntax:: + + group := overlap_group | no_overlap_group + overlap_group := '{' ( pat_def | group )+ '}' + no_overlap_group := '[' ( pat_def | group )+ ']' + +A *group* begins with a lone open-brace or open-bracket, with all +subsequent lines indented two spaces, and ending with a lone +close-brace or close-bracket. Groups may be nested, increasing the +required indentation of the lines within the nested group to two +spaces per nesting level. + +Patterns within overlap groups are allowed to overlap. Conflicts are +resolved by selecting the patterns in order. If all of the fixedbits +for a pattern match, its translate function will be called. If the +translate function returns false, then subsequent patterns within the +group will be matched. + +Patterns within no-overlap groups are not allowed to overlap, just +the same as ungrouped patterns. Thus no-overlap groups are intended +to be nested inside overlap groups. + +The following example from PA-RISC shows specialization of the *or* +instruction:: + + { + { + nop 000010 ----- ----- 0000 001001 0 00000 + copy 000010 00000 r1:5 0000 001001 0 rt:5 + } + or 000010 rt2:5 r1:5 cf:4 001001 0 rt:5 + } + +When the *cf* field is zero, the instruction has no side effects, +and may be specialized. When the *rt* field is zero, the output +is discarded and so the instruction has no effect. When the *rt2* +field is zero, the operation is ``reg[r1] | 0`` and so encodes +the canonical register copy operation. + +The output from the generator might look like:: + + switch (insn & 0xfc000fe0) { + case 0x08000240: + /* 000010.. ........ ....0010 010..... */ + if ((insn & 0x0000f000) == 0x00000000) { + /* 000010.. ........ 00000010 010..... */ + if ((insn & 0x0000001f) == 0x00000000) { + /* 000010.. ........ 00000010 01000000 */ + extract_decode_Fmt_0(&u.f_decode0, insn); + if (trans_nop(ctx, &u.f_decode0)) return true; + } + if ((insn & 0x03e00000) == 0x00000000) { + /* 00001000 000..... 00000010 010..... */ + extract_decode_Fmt_1(&u.f_decode1, insn); + if (trans_copy(ctx, &u.f_decode1)) return true; + } + } + extract_decode_Fmt_2(&u.f_decode2, insn); + if (trans_or(ctx, &u.f_decode2)) return true; + return false; + } |