The 8088 has a hardware multiplier, but it’s quite slow:
MUL with byte arguments: 69 cycles plus 1 for each set bit in AL plus 1 if the high byte of the result is 0
MUL with word arguments: 123 cycles plus 1 for each set bit in AX plus 1 if the high word of the result is 0
Signed multiplies are even slower (taking at least 80 cycles for a byte, 134 cycles for a word), and depend on the number of set bits in the absolute value of the accumulator, the signs of the operands and whether or not the explicit operand is -0x80 (-0x8000 for word multiplies). I also measured some word IMULs apparently taking a half-integer number of cycles to run, suggesting that there’s either some very weird behavior going on with the 8088’s multiplier or that there’s a bug in my timing program (possibly both).
Can we beat the 8088’s hardware multiplier with a software routine?