Fig. 11.8 Access conflict for reference memory banking
[Figure: reference data laid out across eight SRAM banks, #A to #H (columns), in rows 1 to 8; entries are labeled A1 to H8.]
Fig. 11.9 Reference memory hierarchy subsystem architecture
[Figure blocks: AMVP Candidates (64x64 PU); ME Ref. Prefetch; Ref. L3 SRAM; Ref. L2 SRAM; Ref. L2 SRAM (Subsampled Pattern); FME REF Broadcast; Ref. L1 SRAM (Subsampled Pattern); L1 SW SRAM x2; Ref. L1 SRAM x4; FME (64x64 CU); FME (32x32 CU); FME (16x16 CU); IME.]
be divided into two parts, {D5,E5} and {D6,E6}. As a result, the SRAM output
bandwidth is reduced and the IME/FME engine stalls until all the data has been
retrieved, which costs performance. This would be acceptable if it happened only
rarely. However, the motion vectors depend on the input sequence, so bank
conflicts may occur frequently. For example, because multiple parallel engines
operate concurrently, IME/FME may need to read out {C2,D2,E2,F2},
{B3,C3,D3,E3}, {D3,E3,F3,G3}, {B4,C4,D4,E4}, {A5,B5,C5,D5}, {B5,C5,D5,E5},
and {D6,E6,F6,G6} at the same time, which causes serious bank conflicts. To make
matters worse, the conflict pattern changes over time with the distribution of
motion vectors, so it is hard to find a single banking pattern that performs well.
Thus, SRAM banking alone does not solve this problem.
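The cost of such a conflict can be estimated with a small model. The following is only a minimal sketch under assumptions that the text does not state: each bank (#A to #H) is single-ported and can serve one row address per cycle, a word such as D5 lives in bank D at row 5, and two engines asking for the same word share a single read. The function name `cycles_needed` is hypothetical and used for illustration only.

```python
from collections import defaultdict


def cycles_needed(requests):
    """Estimate the cycles needed to serve all concurrent requests.

    Assumed model (not from the text): every bank is single-ported and
    serves one row per cycle, so a bank that must deliver k distinct rows
    needs k cycles. Banks work in parallel, so the slowest bank decides.
    """
    rows_per_bank = defaultdict(set)
    for request in requests:
        for word in request:
            bank, row = word[0], int(word[1:])  # 'D5' -> bank 'D', row 5
            rows_per_bank[bank].add(row)
    cycles = max((len(rows) for rows in rows_per_bank.values()), default=0)
    return cycles, dict(rows_per_bank)


# The concurrent read pattern given as an example in the text.
requests = [
    ["C2", "D2", "E2", "F2"],
    ["B3", "C3", "D3", "E3"],
    ["D3", "E3", "F3", "G3"],
    ["B4", "C4", "D4", "E4"],
    ["A5", "B5", "C5", "D5"],
    ["B5", "C5", "D5", "E5"],
    ["D6", "E6", "F6", "G6"],
]

cycles, rows_per_bank = cycles_needed(requests)
print(cycles)                        # 5
print(sorted(rows_per_bank["D"]))    # [2, 3, 4, 5, 6]
```

Under this model, banks D and E each have to deliver five different rows, so the seven requests take five cycles instead of one, and the split access {D5,E5} and {D6,E6} takes two. A different motion-vector distribution would shift the hot banks, which is why no fixed banking pattern avoids the problem.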
Therefore, a new reference memory strategy that provides multi-port access
and Level D-like data reuse needs to be designed. The multi-level reference memory
hierarchy subsystem shown in Fig. 11.9 is presented to fulfill the requirement of external
 