Posted: 2009-04-23 17:27
Author: admin
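How much performance can the MAXQ2000's MAC deliver? This application note uses an audio filter as an example to answer that question and to quantify the performance the MAXQ2000 can support.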


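$$\sum_{k} b_k\, y(k) = \sum_{j} a_j\, x(j)$$

For the zeroes-only filter used in this example, only the right-hand sum $\sum_{j} a_j\, x(j)$ has to be evaluated.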
Zeroes:
dc16 12, 11, 0x1000, 0x26d3, 0x1e42, 0xf9a3, 0xecde, 0xff31, 0xa94,
0x2ae, 0xfd0c, 0xff42, 0xde
Shift amount: 12
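As a reference for what the assembly listings compute, here is a minimal Python sketch of the zeroes-only filter. It assumes that the leading `12` and `11` on the `dc16` line are the shift amount and the coefficient count, and that the remaining words are signed 16-bit coefficients (the `MCNT` setup below is commented as signed). The names `fir_sample` and `to_signed16` are illustrative and do not appear in the original.

```python
# Coefficients copied from the dc16 line above (after the leading 12 and 11).
RAW_COEFFS = [0x1000, 0x26D3, 0x1E42, 0xF9A3, 0xECDE, 0xFF31,
              0x0A94, 0x02AE, 0xFD0C, 0xFF42, 0x00DE]
SHIFT = 12  # "Shift amount: 12" from the text


def to_signed16(word):
    """Interpret a 16-bit word as a two's-complement signed value (assumption)."""
    return word - 0x10000 if word & 0x8000 else word


COEFFS = [to_signed16(w) for w in RAW_COEFFS]


def fir_sample(history, coeffs=COEFFS, shift=SHIFT):
    """Compute one output sample.

    history[0] is the newest input sample, history[1] the one before it,
    and so on. Samples older than the supplied history count as zero.
    """
    acc = 0  # plays the role of the MAC accumulator
    for coeff, sample in zip(coeffs, history):
        acc += coeff * sample
    # Undo the fixed-point scaling. Python's >> floors negative values.
    return acc >> shift
```

With this scaling the first coefficient, `0x1000`, equals 1.0, so `fir_sample([1000])` returns `1000`.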
move MCNT, #22h ; signed, mult-accum, clear regs first
zeroes_filterloop:
move A[0], DP[0] ; let's see if we are out of data
cmp #W:rawaudiodata ; compare to the start of the audio data
lcall UROM_MOVEDP1INC ; get next filter coefficient
move MA, GR ; multiply filter coefficient...
lcall UROM_MOVEDP0DEC ; get next filter data
move MB, GR ; multiply audio sample...
jump e, zeroes_outofdata ; stop if at the start of the audio data
djnz LC[0], zeroes_filterloop
zeroes_outofdata:
move A[2], MC2 ; get MAC result HIGH
move A[1], MC1 ; get MAC result MID
move A[0], MC0 ; get MAC result LOW
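Before this code runs, LC[0] is loaded with the number of filter taps, DP[0] with the address of the current filter input, and DP[1] with the start address of the filter coefficients. DP[1] steps upward through the coefficients, while DP[0] steps downward through the input data, so the most recent input is processed first. In the version below, the coefficients and the input data are read directly through the data pointers, and each instruction is annotated with its cycle count, for a total of 11 cycles per iteration.
zeroes_filterloop: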
move A[0], DP[0] ; 1, let's see if we are out of data
cmp #W:rawaudiodata ; 2, compare to the start of the audio data
move DP[1], DP[1] ; 1, select DP[1] as our active pointer
move GR, @DP[1]++ ; 1, get next filter coefficient
move MA, GR ; 1, multiply filter coefficient...
move BP, BP ; 1, select BP[Offs] as our active pointer
move GR, @BP[Offs--] ; 1, get next filter data
move MB, GR ; 1, multiply audio sample...
jump e, zeroes_outofdata ; 1, stop if at the start of the audio data
djnz LC[0], zeroes_filterloop ; 1
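Once the filter and the input data are in RAM, another feature of the MAXQ architecture can be exploited. The MAXQ instruction set is highly orthogonal: there are almost no restrictions on which source can be used in any operation. The filter data and the input data therefore do not have to pass through GR; they can be written directly to the MAC registers. This brings the loop down to 9 cycles.
zeroes_filterloop: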
move A[0], DP[0] ; 1, let's see if we are out of data
cmp #W:rawaudiodata ; 2, compare to the start of the audio data
move DP[1], DP[1] ; 1, select DP[1] as our active pointer
move MA, @DP[1]++ ; 1, multiply next filter coefficient
move BP, BP ; 1, select BP[Offs] as our active pointer
move MB, @BP[Offs--] ; 1, multiply next filter data
jump e, zeroes_outofdata ; 1, stop if at the start of the audio data
djnz LC[0], zeroes_filterloop ; 1
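One last modification improves the code considerably. On every iteration, the current data pointer is compared with the start of the audio input data to check for running past the beginning (the MOVE A[0], DP[0] instruction, the CMP instruction, and the JUMP E instruction). If the initial audio data (now read from the circular buffer addressed by BP[Offs]) is set to all zeroes, these checks can be dropped. The time spent initializing RAM to zero is negligible next to the 4 cycles saved on every loop iteration across the thousands of samples that follow. The new loop takes 5 cycles.
zeroes_filterloop: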
move DP[1], DP[1] ; 1, select DP[1] as our active pointer
move MA, @DP[1]++ ; 1, multiply next filter coefficient
move BP, BP ; 1, select BP[Offs] as our active pointer
move MB, @BP[Offs--] ; 1, multiply next filter data
djnz LC[0], zeroes_filterloop ; 1
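The argument for dropping the bounds check can be verified with a small Python model. This is a sketch under simplifying assumptions: it uses a plain list in place of the circular buffer addressed by BP[Offs], and the function names are illustrative only. It compares a loop that stops at the start of the data with one that reads zero-filled history unconditionally.

```python
def fir_with_bounds_check(samples, n, coeffs):
    """Accumulate the taps for output n, stopping at the start of the data."""
    acc = 0
    idx = n
    for coeff in coeffs:
        acc += coeff * samples[idx]
        if idx == 0:  # reached the start of the audio data
            break
        idx -= 1
    return acc


def fir_zero_padded(samples, n, coeffs):
    """Same sum with no check: history before sample 0 is pre-filled with zeros."""
    padded = [0] * (len(coeffs) - 1) + list(samples)
    base = n + len(coeffs) - 1  # position of sample n inside padded
    acc = 0
    for k, coeff in enumerate(coeffs):
        acc += coeff * padded[base - k]
    return acc
```

Both functions return the same accumulator value for every output index, because the extra products in the unchecked version are all multiplications by zero.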
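Before returning to the performance equation, look at how the result is calculated. Shifting the full 48-bit result the way the code currently does appears unnecessary.
move A[2], MC2 ; get MAC result HIGH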
move A[1], MC1 ; get MAC result MID
move A[0], MC0 ; get MAC result LOW
move APC, #0C2h ; clear AP, roll modulo 4, auto-dec AP
shift_loop:
;
; Because we use fixed point precision, we need to shift to get a real
; sample value. This is not as efficient as it could be. If we had a
; dedicated filter, we might make use of the shift-by-2 and shift-by-4
; instructions available on MAXQ.
;
move AP, #2 ; select HIGH MAC result
move c, #0 ; clear carry
rrc ; shift HIGH MAC result
rrc ; shift MID MAC result
rrc ; shift LOW MAC result
djnz LC[1], shift_loop ; shift to get result in A[0]
move APC, #0 ; restore accumulator normalcy
move AP, #0 ; use accumulator 0
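The shift loop above can be modeled in Python to make the word-by-word carry propagation explicit. This is a sketch that assumes `rrc` rotates the active accumulator right through the carry flag and that the carry is cleared at the top of each pass, as the comments in the listing indicate. The function name `shift_right_48` is illustrative.

```python
def shift_right_48(high, mid, low, count):
    """Shift a 48-bit value held in three 16-bit words right by count bits."""
    for _ in range(count):
        carry = 0  # move c, #0
        words = []
        for word in (high, mid, low):  # rrc on HIGH, then MID, then LOW
            words.append((carry << 15) | (word >> 1))
            carry = word & 1  # the bit shifted out feeds the next word
        high, mid, low = words
    return high, mid, low
```

Because the carry is cleared before the HIGH word is rotated, this model performs a logical shift: a zero enters at the top on every pass.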
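One possible approach is to use the MAC again. Instead of shifting right by 12 bits (or by any amount between 0 and 16), shift left by 16 minus that amount (here, left by 4 bits). The result then lands in the middle 16-bit word of the MAC registers. Note that a left shift is really a multiplication by a power of 2 (by 16, if the original intent was to shift right by 12 bits).
;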
; don't care about high word, since we shift left and take the
; middle word.
;
move A[1], MC1 ; 1, get MAC result MID
move A[0], MC0 ; 1, get MAC result LOW
move MCNT, #20h ; 1, clear the MAC, multiply mode only
move AP, #0 ; 1, use accumulator 0
and #0F000h ; 2, only want the top 4 bits
move MA, A[0] ; 1, lower word first
move MB, #10h ; 1, multiply by 2^4
move A[0], MC1R ; 1, get the high word, only lowest 4 bits significant
move MA, A[1] ; 1, now the upper word, we want lowest 12 bits
move MB, #10h ; 1, multiply by 2^4
or MC1R ; 1, combine the previous result and this one
;
; result is in A[0]
;
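The identity behind this trick can be checked in a few lines of Python. The sketch below is not a transcription of the listing; it treats the MID and LOW accumulator words as unsigned 16-bit values (an assumption) and shows that multiplying by 2^(16 - shift) and taking the middle word gives the same 16 bits as a right shift. The name `shift_via_multiply` is illustrative.

```python
def shift_via_multiply(mid, low, shift):
    """Return bits [shift, shift + 16) of the 32-bit value mid:low.

    Multiplying by 2**(16 - shift) moves those bits into the middle word.
    """
    factor = 1 << (16 - shift)       # 2^4 = 16 for a 12-bit shift
    value = (mid << 16) | low        # the two low accumulator words
    product = value * factor         # what the multiplier computes
    return (product >> 16) & 0xFFFF  # take the middle 16-bit word
```

For example, `shift_via_multiply(0x2345, 0x6789, 12)` returns `0x3456`, the same 16 bits that `(0x23456789 >> 12) & 0xFFFF` produces.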
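The result calculation now takes 12 cycles instead of 9 + (6 x S) cycles, where S is the shift amount. For the 12-bit shift used here, that is 12 cycles instead of 81.

| Filter Length (Taps) | Max Rate (Hz) |
| --- | --- |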
| 50 | 68965.51724 |
| 100 | 37037.03704 |
| 150 | 25316.4557 |
| 200 | 19230.76923 |
| 250 | 15503.87597 |
| 300 | 12987.01299 |
| 350 | 11173.18436 |
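The values in the table can be reproduced with a short calculation. The sketch below assumes a 20 MHz system clock, which is inferred from the table values rather than stated in this section, together with 40 cycles of per-sample overhead and 5 cycles per tap for the optimized loop. The function name `max_sample_rate` is illustrative.

```python
def max_sample_rate(taps, cycles_per_tap=5, overhead=40, clock_hz=20_000_000):
    """Highest sample rate (Hz) at which one output fits in the cycle budget.

    clock_hz is an assumption inferred from the table, not a quoted figure.
    """
    cycles_per_sample = overhead + cycles_per_tap * taps
    return clock_hz / cycles_per_sample


if __name__ == "__main__":
    for taps in range(50, 400, 50):
        print(taps, round(max_sample_rate(taps), 5))
```

Passing a different `cycles_per_tap` shows how any change to the inner loop affects the achievable rate.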
The loop itself can also be unrolled, with each filter coefficient coded as an immediate operand:
move BP, BP ; select BP[Offs] as our active pointer
zeroes_filtertop:
move MA, #FILTERCOEFF_0 ; 2, multiply next filter coefficient
move MB, @BP[Offs--] ; 1, multiply next filter data
move MA, #FILTERCOEFF_1 ; 2, multiply next filter coefficient
move MB, @BP[Offs--] ; 1, multiply next filter data
move MA, #FILTERCOEFF_2 ; 2, multiply next filter coefficient
move MB, @BP[Offs--] ; 1, multiply next filter data
. . .
move MA, #FILTERCOEFF_N ; 2, multiply next filter coefficient
move MB, @BP[Offs--] ; 1, multiply next filter data
;
; filter calculation complete
;
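To quantify the benefit of this change, again assume an overhead of 40 cycles. Each tap now costs 3 cycles, and the loop itself has effectively been eliminated. A 100-tap filter can then run at up to about 58kHz (see Table 2).

| Filter Length (Taps) | Max Rate (Hz) |
| --- | --- |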
| 50 | 105263.1579 |
| 100 | 58823.52941 |
| 150 | 40816.32653 |
| 200 | 31250 |
| 250 | 25316.4557 |
| 300 | 21276.59574 |
| 350 | 18348.62385 |