2013. 9. 8. 02:24

AMBA AXI PROTOCOL v1.0 - ADDITIONAL CONTROL INFORMATION

2013. 9. 8. 02:24 in ARM Architecture

Let's take a look at the system-level cache and protection unit support that AXI provides.

Support for system level caches and other performance enhancing components is provided by the use of the cache information signals, ARCACHE and AWCACHE. These signals provide additional information about how the transaction can be processed.

The ARCACHE[3:0] or AWCACHE[3:0] signal supports system-level caches by
providing the bufferable, cacheable, and allocate attributes of the transaction:

To improve transaction performance between masters and slaves across an interconnect, AXI supports several cache schemes. The cache information signals (ARCACHE and AWCACHE) carry additional information about how a transaction may be processed. The transaction attributes they encode are bufferable, cacheable, and allocate.

1) Bufferable (B) bit, ARCACHE[0] and AWCACHE[0]
When this bit is HIGH, it means that the interconnect or any component can delay the transaction reaching its final destination for an arbitrary number of cycles. This is usually only relevant to writes.

In the spec, the "final destination" may be a master or a slave, or another component sitting between an IP and the interconnect. Since the spec says the bufferable attribute is usually only relevant to writes, you can read "final destination" as the IP (master or slave) that ultimately receives the write transaction some IP has issued.

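When this bit is HIGH, the interconnect or any other component on the path may delay the transaction from reaching its target IP for an arbitrary number of cycles. This description alone may not make it obvious what kind of case is meant; think of the case that uses a register slice, which I mentioned in my earlier post [ARCHITECTURE OVERVIEW II]. I will briefly explain why this is useful with one hypothetical scenario, illustrated in the figure below.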

Master 1 wants to send a write transaction to the slave, but the slave is currently busy reading or writing with master 2 and cannot accept it. Later the slave becomes idle again and could accept master 1's write transaction. This time, however, the interconnect is handling many transaction requests from other IPs, so its decoding time grows and the latency of master 1's write transaction increases. For this reason, a register slice is placed between the slave and the interconnect as a buffer that temporarily holds the master's transaction. Even while the slave is busy talking to another master, the write transaction is stored in the buffer in advance and held back until the slave becomes idle. Once idle, the slave can process the write transaction straight from the register slice, regardless of the interconnect's decoding state. That was a long-winded explanation; it is simply the idea of double-buffering the work to be processed ahead of time.

2) Cacheable (C) bit, ARCACHE[1] and AWCACHE[1]
When this bit is HIGH, it means that the transaction at the final destination does not have to match the characteristics of the original transaction.
For writes this means that a number of different writes can be merged together.
For reads this means that a location can be pre-fetched or can be fetched just once for multiple read transactions.
To determine if a transaction should be cached this bit should be used in conjunction with the Read Allocate (RA) and Write Allocate (WA) bits.
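The attribute bits quoted so far can be pulled out of a 4-bit cache value with a few masks. The sketch below is only an illustration: the names `CacheAttrs` and `decode_axcache` are made up for this example, and the positions of the RA and WA bits (bits 2 and 3) are taken from the AXI v1.0 specification rather than from the text quoted in this post.

```python
from typing import NamedTuple


class CacheAttrs(NamedTuple):
    bufferable: bool      # B,  ARCACHE[0] / AWCACHE[0]
    cacheable: bool       # C,  ARCACHE[1] / AWCACHE[1]
    read_allocate: bool   # RA, bit 2 (assumed from the AXI v1.0 spec)
    write_allocate: bool  # WA, bit 3 (assumed from the AXI v1.0 spec)


def decode_axcache(value: int) -> CacheAttrs:
    """Split a 4-bit ARCACHE/AWCACHE value into its attribute bits."""
    if not 0 <= value <= 0xF:
        raise ValueError("ARCACHE/AWCACHE is a 4-bit field")
    return CacheAttrs(
        bufferable=bool(value & 0b0001),
        cacheable=bool(value & 0b0010),
        read_allocate=bool(value & 0b0100),
        write_allocate=bool(value & 0b1000),
    )
```

For example, `decode_axcache(0b0011)` describes a transaction that is both bufferable and cacheable but carries no allocate hint.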

When this bit is HIGH, the transaction at the final target IP does not have to match the original transaction. I will come back to what this means in a moment; first, let's look at the case where the bit is not HIGH. As in the figure below, assume a bus architecture with a single interconnect between a master and a slave, with a 32-bit bus between the master and the interconnect and a 64-bit bus between the interconnect and the slave.

If the master issues just two 32-bit write transactions to the slave, a total of four latencies are visible: master to interconnect and interconnect to slave, once for each write. Each write transaction from the master passes through the interconnect and is forwarded directly to the slave, the final destination. In other words, what the master sent always matches what arrives at the slave's input.

Now let's look at the case where the cacheable bit is set HIGH.
For writes, the spec says that a number of different write transactions can be merged into one. The master issues the first 32-bit write transaction to a buffer (register slice) on the interconnect side, and then the second one. In the buffer the two are merged into a single 64-bit write transaction, which is then issued to the slave in one go, so only three latencies are visible in total, one fewer than before. Reads work the same way, with only the direction of the transaction reversed.
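The write-merging idea can be sketched in a few lines. This is a toy model under stated assumptions, not how any real interconnect is implemented: `merge_writes` is a hypothetical helper, writes are represented as (byte address, data) pairs, and two writes are merged only when they are back to back and fit in one aligned 64-bit (8-byte) window on the slave side.

```python
from typing import List, Tuple

Write = Tuple[int, bytes]  # (byte address, data bytes)


def merge_writes(writes: List[Write], wide_bytes: int = 8) -> List[Write]:
    """Merge adjacent writes that fit in one aligned wide_bytes window."""
    merged: List[Write] = []
    for addr, data in writes:
        if merged:
            prev_addr, prev_data = merged[-1]
            contiguous = prev_addr + len(prev_data) == addr
            last_byte = addr + len(data) - 1
            same_window = prev_addr // wide_bytes == last_byte // wide_bytes
            if contiguous and same_window:
                # Combine with the previous write instead of forwarding twice.
                merged[-1] = (prev_addr, prev_data + data)
                continue
        merged.append((addr, data))
    return merged
```

With two 32-bit writes to 0x1000 and 0x1004, the slave side sees a single 8-byte write, which is the "three latencies instead of four" case described above.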

It's late, so I'll stop here for today. I'll cover the rest another time. Have a nice weekend.

Written by Simhyeon, Choe

Posted by FreeChild

2013. 9. 5. 22:57

NVM Express 1.1 - INTRODUCTION

2013. 9. 5. 22:57 in Storage

Let's start a review of the NVM Express 1.1 specification.

1.1 Overview
NVM Express (NVMe) is a register level interface that allows host software to communicate with a non-volatile memory subsystem. This interface is optimized for Enterprise and Client solid state drives, typically attached to the PCI Express interface.

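As the definition above says, NVM Express (hereafter NVMe) is a register-level interface for communication between a non-volatile memory subsystem, that is, a storage device such as an SSD, and host software (usually the device driver). In short, it is a specification that describes how a host and a device communicate.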

NVMe specification 1.0 was first released on March 1, 2011, and targets SSDs on the PCI Express bus. SSDs for the PCI Express bus had already been developed, but the market was still using non-standard interfaces. To standardize the SSD interface, a host device driver that could talk to every vendor's SSD was needed. In addition, most SSDs, both then and now, use a bus interface such as SATA. SATA is the most common way to attach an SSD in a personal computer, but it was designed for hard disks, which are slow compared with SSDs. SSD performance keeps improving and now calls for more parallel throughput, but the fundamental limits of SATA's interface design cap the maximum throughput.

This is the background that led to NVMe, which is based on the PCIe bus.

1.4 Theory of Operation
NVM Express is a scalable host controller interface designed to address the needs of Enterprise and Client systems that utilize PCI Express based solid state drives. The interface provides optimized command submission and completion paths. It includes support for parallel operation by supporting up to 64K I/O Queues with up to 64K commands per I/O Queue. Additionally, support has been added for many Enterprise capabilities like end-to-end data protection (compatible with T10 DIF and SNIA DIX standards), enhanced error reporting, and virtualization.

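One of the most notable features of NVMe is that it sends and completes commands in a multiple-outstanding fashion, using a SUBMISSION QUEUE for commands to be processed and a COMPLETION QUEUE that stores information about completed commands. (Multiple outstanding: commands keep being placed in the queue without waiting for earlier ones to finish.)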

Each queue can have up to 64K entries, and up to 64K submission queues and 64K completion queues can be created. In short, the structure is built to send and process a large number of commands.

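Before explaining the key attributes of NVMe listed below, I will compare it with the existing interface architecture, SATA-based AHCI. Whenever a new architecture appears, it is important to look at what advantages it has over the existing one. That is how you come to understand the new architecture better.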

(Table source: http://www.extremetech.com/computing/161735-samsung-debuts-monster-1-6tb-ssd-with-new-high-speed-pcie-nvme-protocol)

1) Un-cacheable register access

- 6 times or 9 times per command on AHCI vs. 2 times per command on NVMe

In AHCI, the interface between host and device is designed around the HBA (Host Bus Adapter). The HBA has registers such as PxCI, PxSACT, and PxCLB that are checked and set in sequence while a command is processed. As a result, issuing and completing one command in AHCI takes 6 HBA register accesses for a non-queued command and 9 for a queued command. If the first 32 commands in the Command List are all queued commands, issuing and processing them can expose an overhead of 32 * 9 register accesses. HDDs are very slow, however, so this overhead shows up briefly while the first 32 commands are being queued and is then hidden behind HDD seek latency. For sequential reads and writes it is fair to say the overhead does not matter at all. In short, given how slow HDDs are, this overhead was never a concern.

Unlike that, NVMe is a specification aimed at high-performance server SSDs. The key requirements for server SSDs are response time and random read/write performance; can an architecture like AHCI deliver enough throughput? In NVMe, one queue can hold up to 64K entries, and the spec allows, in theory, up to 64K queues of that size. Recall the register accesses above: AHCI needs 9 register accesses (for an NCQ command) to issue and process one command, while NVMe needs only 2. In other words, NVMe can issue and process about 4 commands in the time AHCI handles one. If an HBA-based AHCI architecture had to issue and process commands across 64K queues of 64K entries each, it would carry an enormous burden on the order of 64K * 64K * 4. And when clients generate random reads and writes very frequently, the latency is exposed on every operation, so throughput ends up severely limited even with a high-performance SSD. That is why NVMe is designed without an HBA: the host controller and the device controller interface directly, which minimizes complicated register accesses.
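The register-access arithmetic above is easy to check by hand, but a few lines make the comparison explicit. This is only a back-of-the-envelope sketch; the constants are the per-command counts quoted in this post, and `register_accesses` is a name made up for the example.

```python
# Per-command register access counts quoted in this post.
AHCI_NON_QUEUED_ACCESSES = 6
AHCI_QUEUED_ACCESSES = 9
NVME_ACCESSES = 2


def register_accesses(commands: int, per_command: int) -> int:
    """Total register accesses needed to issue and complete the commands."""
    return commands * per_command


# Filling all 32 AHCI command slots with queued (NCQ) commands.
ahci_total = register_accesses(32, AHCI_QUEUED_ACCESSES)
nvme_total = register_accesses(32, NVME_ACCESSES)
ratio = AHCI_QUEUED_ACCESSES / NVME_ACCESSES
```

Here `ahci_total` is the 32 * 9 overhead mentioned above, and `ratio` works out to 4.5, which is where the "about 4 commands per AHCI command" estimate comes from.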

2) Command Queue

- One command list and 32 commands per queue on AHCI vs. 64K commands per queue and 64K queues on NVMe

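NVMe splits command handling between the SQ (Submission Queue), which holds commands to be issued, and the CQ (Completion Queue), which holds information about completed commands. There is a question worth asking here. AHCI issues and completes commands with a single Command List, and the PxCI register shows the status of each of its 32 command slots, so why bother splitting command handling into an SQ and a CQ?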

[How command processing and completion are checked in the AHCI Command List]

(Note that the figure above is only a simplified sketch, meant to show that command processing and completion in the AHCI Command List happen in a single queue.)

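The reason is simple. In AHCI, the HBA's DMA is responsible for delivering and processing commands, whereas in NVMe the controller on the device side fetches the host's commands. The two work in completely different ways, and it is easy to see that the burden on the HBA's DMA would not be small, because it has to poll the 32 command slots every time and process the commands. With a worst case of 64K queue entries and 64K queues, the burden would be enormous. So NVMe uses a simple data structure: the host places commands in the SQ, the device-side controller fetches and processes them, and only the completion information is reported back to the host.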
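The SQ/CQ flow described above can be modeled as a small ring buffer. The sketch below is a simplified toy, assuming a single queue pair and string commands; `QueuePair` and its methods are hypothetical names, and the "doorbell" counter merely stands in for the register write the host uses to tell the controller that new entries are available (the term comes from the NVMe specification, not from the text of this post).

```python
from typing import List, Optional, Tuple


class QueuePair:
    """Toy model of one submission queue and one completion queue."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.sq: List[Optional[str]] = [None] * depth
        self.cq: List[Tuple[str, str]] = []  # completions posted by the controller
        self.sq_head = 0        # advanced by the controller as it fetches
        self.sq_tail = 0        # advanced by the host as it submits
        self.visible_tail = 0   # tail value the controller has been told about
        self.doorbell_writes = 0

    def submit(self, command: str) -> None:
        """Host side: place a command in the next free SQ slot."""
        next_tail = (self.sq_tail + 1) % self.depth
        if next_tail == self.sq_head:
            raise RuntimeError("submission queue full")
        self.sq[self.sq_tail] = command
        self.sq_tail = next_tail

    def ring_doorbell(self) -> None:
        """Host side: one register write publishes the new tail."""
        self.visible_tail = self.sq_tail
        self.doorbell_writes += 1

    def controller_process(self) -> None:
        """Device side: fetch published commands and post completions."""
        while self.sq_head != self.visible_tail:
            command = self.sq[self.sq_head]
            self.sq_head = (self.sq_head + 1) % self.depth
            self.cq.append((command, "success"))
```

A host can submit several commands, ring the doorbell once, and later read the completions from `cq`, without ever polling per-slot status the way the AHCI Command List requires.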

The interface has the following key attributes:
• Does not require uncacheable / MMIO register reads in the command submission or completion path.
• A maximum of one MMIO register write is necessary in the command submission path.
• Support for up to 64K I/O queues, with each I/O queue supporting up to 64K commands.
• Priority associated with each I/O queue with well-defined arbitration mechanism.
• All information to complete a 4KB read request is included in the 64B command itself, ensuring efficient small I/O operation.
• Efficient and streamlined command set.
• Support for MSI/MSI-X and interrupt aggregation.
• Support for multiple namespaces.
• Efficient support for I/O virtualization architectures like SR-IOV.
• Robust error reporting and management capabilities.
• Support for multi-path I/O and namespace sharing.

I'll wrap up here for today and continue the AHCI versus NVMe comparison tomorrow.

Written by Simhyeon, Choe

Posted by FreeChild

2012. 6. 16. 00:23

I dreamed of the woman whom I had a crush on

2012. 6. 16. 00:23 in I-Diary

I dreamed last night that I met the woman I had a crush on six years ago.

I remember scenes from the dream distinctly. In the dream, I met her in front of her house. I kept trying to court her, but she coldly rejected me. She was already living with a man in her house. Seeing that made me very sad, even though it was only a dream. I woke up soon after, and right after waking I felt down because of the dream. I saw her twice a long time ago, at Dongdaegu station and in downtown Daegu. Since then I have never seen her again. Actually, I still have not forgotten her. I can't stop thinking about her. I just tell myself that time heals all wounds.

Anyway, have a good weekend, everyone.

Written by Simhyeon, Choe

Posted by FreeChild

2012. 6. 14. 00:49

AMBA AXI PROTOCOL v1.0 - ADDRESSING OPTIONS I

2012. 6. 14. 00:49 in ARM Architecture

Today we'll look at ADDRESSING OPTIONS.

Since it's very late, I'll only cover BURST LENGTH and BURST SIZE.

4.1 About addressing options

The AXI protocol is burst-based, and the master begins each burst by driving transfer
control information and the address of the first byte in the transfer. As the burst
transaction progresses, it is the responsibility of the slave to calculate the addresses of
subsequent transfers in the burst. Bursts must not cross 4KB boundaries to prevent them from crossing boundaries between slaves and to limit the size of the address incrementer required within slaves.
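The 4KB rule quoted above can be checked with simple address arithmetic. The sketch below is an illustration under stated assumptions: it covers only an incrementing burst whose start address is aligned to the transfer size, and `crosses_4kb` is a name invented for this example.

```python
BOUNDARY = 4096  # 4KB


def crosses_4kb(start_addr: int, size_bytes: int, length: int) -> bool:
    """Return True if an aligned incrementing burst crosses a 4KB boundary.

    size_bytes is the number of bytes per transfer and length is the
    number of transfers in the burst.
    """
    last_byte = start_addr + size_bytes * length - 1
    return start_addr // BOUNDARY != last_byte // BOUNDARY
```

For instance, four 4-byte transfers starting at 0x0FF0 end exactly at 0x0FFF and are legal, while eight such transfers would run into the next 4KB page and would have to be split into two bursts by the master.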

The description above says that a burst must not cross a 4KB boundary; I'll come back to this at the end of this chapter.

4.2 Burst length

The AWLEN or ARLEN signal specifies the number of data transfers that occur within
each burst. As Table 4-1 shows, each burst can be 1-16 transfers long.

BURST LENGTH is the number of data transfers that make up one burst.
(The glossary of the latest AMBA AXI4 spec defines a beat as an individual data transfer within a burst: A BEAT = A DATA TRANSFER.)

For wrapping bursts, the length of the burst must be 2, 4, 8, or 16 transfers. Every transaction must have the number of transfers specified by ARLEN or AWLEN. No component can terminate a burst early to reduce the number of data transfers. During a write burst, the master can disable further writing by deasserting all the write strobes, but it must complete the remaining transfers in the burst. During a read burst, the master can discard further read data, but it must complete the remaining transfers in the burst.

The important point is that for a WRAPPING BURST the burst length must be a power of two (2^n): 2, 4, 8, or 16. Also, no IP can end a burst transaction with fewer transfers than the length specified by ARLEN or AWLEN.

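(For example, if AWLEN specifies a length of 8, it is not possible to stop after writing only 4 transfers.)
In other words, the burst transaction must run for the full specified length. In a write burst, the master can prevent further writing by deasserting the write strobes for the byte lanes it wants to disable, and in a read burst the master can ignore further read data. To sum up, a transaction cannot be aborted once it has started, and every transfer of the specified length must be completed.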
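One way to picture "complete the burst but stop writing" is to pad the remaining beats with all strobes deasserted. The sketch below is a simplified model, assuming a 32-bit data bus (4 strobe bits) and integer data words; `write_beats` is a hypothetical helper, not something defined by the spec.

```python
from typing import List, Tuple

Beat = Tuple[int, int, bool]  # (data word, write strobes, last-beat flag)


def write_beats(data: List[int], burst_len: int, strobe_width: int = 4) -> List[Beat]:
    """Build every beat of a write burst of burst_len transfers.

    Beats beyond the supplied data are still sent, but with all write
    strobes deasserted so that nothing is actually written.
    """
    if len(data) > burst_len:
        raise ValueError("more data than the burst length allows")
    all_lanes = (1 << strobe_width) - 1
    beats: List[Beat] = []
    for i in range(burst_len):
        last = i == burst_len - 1
        if i < len(data):
            beats.append((data[i], all_lanes, last))
        else:
            beats.append((0, 0, last))  # strobes LOW: transfer happens, no write
    return beats
```

With two data words and a burst length of 4, the result still contains four beats, and only the fourth one carries the last-beat flag.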

Table 4-2 shows how the ARSIZE or AWSIZE signal specifies the maximum number of data bytes to transfer in each beat, or data transfer, within a burst. The AXI determines from the transfer address which byte lanes of the data bus to use for each transfer.
For incrementing or wrapping bursts with transfer sizes narrower than the data bus, data transfers are on different byte lanes for each beat of the burst. The address of a fixed burst remains constant, and every transfer uses the same byte lanes. The size of any transfer must not exceed the data bus width of the components in the transaction.

Here, the size of a data transfer must be equal to or smaller than the bus width. The spec says that which byte lanes of the data bus each transfer uses is determined from the transfer address. As a simple example, assume AWADDR is 0x1000 and we write a single burst of 4 beats (burst size = 1 byte) on a 32-bit data bus. Then the byte at 0x1000 is carried on bits [7:0], 0x1001 on [15:8], 0x1002 on [23:16], and 0x1003 on [31:24].
This lane-by-lane advance applies to the WRAPPING and INCREMENTING burst modes. In FIXED burst mode the transaction targets one specific address only, so every transfer always uses the same byte lanes. In other words, the address does not change.
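The byte-lane example above can be reproduced with a short calculation. This is a minimal sketch, assuming aligned transfers on a 32-bit (4-byte) data bus and ignoring wrapping; `burst_byte_lanes` is a made-up name for the illustration.

```python
from typing import List, Tuple


def burst_byte_lanes(
    start_addr: int,
    size_bytes: int,
    beats: int,
    bus_bytes: int = 4,
    fixed: bool = False,
) -> List[Tuple[int, int, int]]:
    """Return (address, msb, lsb) of the data-bus bits used by each beat."""
    lanes = []
    for beat in range(beats):
        # A fixed burst reuses the start address; an incrementing one advances.
        addr = start_addr if fixed else start_addr + beat * size_bytes
        first_lane = addr % bus_bytes
        lsb = first_lane * 8
        msb = lsb + size_bytes * 8 - 1
        lanes.append((addr, msb, lsb))
    return lanes
```

Calling `burst_byte_lanes(0x1000, 1, 4)` gives the [7:0], [15:8], [23:16], [31:24] sequence from the example, while passing `fixed=True` keeps every beat on [7:0].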
I'll explain each burst type another time.

(A BYTE LANE is an 8-bit slice of the data bus. It is the unit by which individual bytes within a beat can be selectively enabled or disabled in a read/write transaction. It is presumably called a "byte" lane because each lane is 8 bits wide. I'll explain this again in the WRITE STROBE part.)

That's all for today.

It's past midnight... Have a nice day. :)

Written by Simhyeon, Choe

Posted by FreeChild

2012. 6. 1. 02:04

AMBA AXI PROTOCOL v1.0 - CHANNEL HANDSHAKE II

2012. 6. 1. 02:04 in ARM Architecture

Here is the second part of the channel handshake material.

3.2 Relationships between the channels

The relationship between the address, read, write, and write response channels is flexible.
For example, the write data can appear at an interface before the write address that relates to it. This can occur when the write address channel contains more register stages than the write data channel. It is also possible for the write data to appear in the same cycle as the address.

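The beginning of this section says the relationship between the five channels is flexible. As an example, the spec says that write data can arrive at an interface before the write address, which happens when the write address channel contains more register slices than the write data channel. It also says that write data can arrive in the same cycle as the address. Of course, even if the data reaches the slave first, the slave cannot act on it without the address and control information. There is a reason the spec mentions this, though. Suppose issuing the address takes 1 cycle and transferring the data takes 2 cycles. Without flexibility between the channels, the address and data would always have to arrive in that order, taking 3 cycles; if the address and data can arrive at the same time, the transaction can complete in 2 cycles. This too is a scheme designed with performance in mind. To understand it more precisely, I asked a foreign engineer at my company who knows AMBA really well :)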

When the interconnect must determine the destination address space or slave space, it must realign the address and write data. This is required to assure that the write data is signaled as valid only to the slave for which it is destined.

Two relationships that must be maintained are:
• read data must always follow the address to which the data relates
• a write response must always follow the last write transfer in the write transaction
to which the write response relates.

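The text just above is general explanation. When a burst transaction is needed, the master issues the address to the interconnect and sends the data transfers; the interconnect has to determine the target slave from the address and realign the address and write data so that the write data is signaled as valid only to that slave.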

3.3 Dependencies between channel handshake signals

To prevent a deadlock situation, you must observe the dependencies that exist between the handshake signals.
In any transaction:
• the VALID signal of one AXI component must not be dependent on the READY
signal of the other component in the transaction
• the READY signal can wait for assertion of the VALID signal.

Last time I said that in the VALID/READY handshake it does not matter whether VALID or READY is asserted first. There are cases, however, where certain VALID or READY signals must be asserted before the next VALID or READY may be asserted.

Figures 3-4 and 3-5 below show the handshake signal dependencies. A single-headed arrow means that, of the two signals it connects, either may be asserted first. A double-headed arrow means the signal at the tail must be asserted before the signal it points to can be asserted.

Figure 3-4 shows that, in a read transaction:
• the slave can wait for ARVALID to be asserted before it asserts ARREADY
• the slave must wait for both ARVALID and ARREADY to be asserted before it
starts to return read data by asserting RVALID

As described above, the slave can wait for ARVALID to be asserted before it asserts ARREADY, or conversely it can assert ARREADY before ARVALID is asserted.
As the second bullet says, however, the slave must wait until both ARVALID and ARREADY have been asserted before it asserts RVALID.

Figure 3-5 shows that, in a write transaction:
• the master must not wait for the slave to assert AWREADY or WREADY before
asserting AWVALID or WVALID
• the slave can wait for AWVALID or WVALID, or both, before asserting AWREADY
• the slave can wait for AWVALID or WVALID, or both, before asserting WREADY
• the slave must wait for both WVALID and WREADY to be asserted before asserting BVALID.

Note
It is important that during a write transaction, a master must not wait for AWREADY
to be asserted before driving WVALID. This could cause a deadlock condition if the
slave is conversely waiting for WVALID before asserting AWREADY.

If the arrow diagrams above look complicated, the figure below makes them easier to follow.

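For example, in [1] Read transaction, I said that among the read address signals it does not matter whether VALID or READY is asserted first. At the point where the dotted line is crossed (double-headed arrow), however, the read address signals must come first before the read data signals can be asserted. In [2] Write transaction, crossing the dotted line is likewise drawn with a double-headed arrow.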

[1] Read transaction [2] Write transaction

To sum up, to avoid DEADLOCK you need to know the cases in which a VALID/READY signal must be asserted first. Conversely, where the order of assertion does not matter, implementing a dependency between the signals anyway can lead to DEADLOCK, so take care.
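The deadlock from the spec's note (a master waiting for AWREADY before driving WVALID, paired with a slave waiting for WVALID before asserting AWREADY) can be shown with a tiny cycle-by-cycle model. This is a deliberately simplified sketch: it tracks only AWVALID, AWREADY, and WVALID, ignores WREADY, and `handshake_completes` is a hypothetical function name.

```python
def handshake_completes(
    master_waits_for_awready: bool,
    slave_waits_for_wvalid: bool,
    max_cycles: int = 10,
) -> bool:
    """Return True if both the address and the data get presented and accepted."""
    awvalid = True    # the master presents the address from the start
    wvalid = False
    awready = False
    for _ in range(max_cycles):
        # Master: drive WVALID, possibly only after AWREADY (forbidden by the spec).
        drive_wvalid = awready if master_waits_for_awready else True
        # Slave: assert AWREADY, possibly only after seeing WVALID (allowed).
        drive_awready = wvalid if slave_waits_for_wvalid else True
        # Both sides decide from the previous cycle's values, like registers.
        wvalid = wvalid or drive_wvalid
        awready = awready or drive_awready
        if awvalid and awready and wvalid:
            return True
    return False  # neither side ever moved: deadlock
```

Only the combination where both sides wait on each other never makes progress; a compliant master that drives WVALID unconditionally completes the handshake with either kind of slave.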

It's late, so that's all for today. Have a nice day.

Written by Simhyeon, Choe

Posted by FreeChild

2012. 5. 24. 01:08

AMBA AXI PROTOCOL v1.0 - CHANNEL HANDSHAKE I

2012. 5. 24. 01:08 in ARM Architecture

Today we'll study the CHANNEL HANDSHAKE.

In this chapter we'll look at an overview of the handshake and at the default values of the READY/VALID handshake signals.

3.1 HANDSHAKE PROCESS

Last time I mentioned that the AMBA AXI PROTOCOL has five channels (READ ADDRESS CHANNEL, READ DATA CHANNEL, WRITE ADDRESS CHANNEL, WRITE DATA CHANNEL, WRITE RESPONSE CHANNEL).

All five channels use the same VALID/READY handshake mechanism to transfer data and control information. This is called the TWO-WAY handshake mechanism, and it lets both the master and the slave control the rate at which data and control information move. The spec calls the side that sends a signal the SOURCE and the side that receives it the DESTINATION; depending on the channel, either the master or the slave can be the SOURCE, and the other is the DESTINATION.
(See the AWVALID, AWREADY, BVALID, and BREADY signals in section 1.3.3, WRITE BURST EXAMPLE.)

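The SOURCE generates the VALID signal to indicate when data or control information is available, and the DESTINATION generates the READY signal to indicate that it can accept the data or control information. A transfer therefore occurs only when both VALID and READY are HIGH. Also, there must be no combinatorial paths between the input and output signals on either the master or the slave interface.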

Figure 3-1 to Figure 3-3 show examples of the handshake sequence. In Figure 3-1, the source presents the data or control information and drives the VALID signal HIGH. The data or control information from the source remains stable until the destination drives the READY signal HIGH, indicating that it accepts the data or control information. The arrow shows when the transfer occurs.

In Figure 3-2, the destination drives READY HIGH before the data or control information is valid. This indicates that the destination can accept the data or control information in a single cycle as soon as it becomes valid. The arrow shows when the transfer occurs.

In Figure 3-3, both the source and destination happen to indicate in the same cycle that they can transfer the data or control information. In this case the transfer occurs immediately. The arrow shows when the transfer occurs.

Figures 3-1, 3-2, and 3-3 above show examples of the HANDSHAKE SEQUENCE, and they are simple. In 3-1, the SOURCE presents the data or control information and VALID is HIGH. The data or control information from the SOURCE is held stable until the DESTINATION's READY goes HIGH. When the DESTINATION is able to accept it, that is, when READY goes HIGH, the signals are sampled on the next RISING EDGE (the arrow) and the transfer takes place. In 3-2, the reverse of 3-1, the DESTINATION is ready to accept but the SOURCE is not yet ready; the transfer takes place once VALID goes HIGH. In 3-3 both go HIGH without delay and the transfer happens immediately.
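The rule behind all three figures is the same: a transfer happens in exactly those cycles where VALID and READY are both HIGH at the clock edge. The sketch below expresses that rule; the per-cycle waveforms used with it are invented for illustration and are not copied from the spec's figures, and `transfer_cycles` is a hypothetical name.

```python
from typing import List, Sequence


def transfer_cycles(valid: Sequence[int], ready: Sequence[int]) -> List[int]:
    """Return the cycle indices at which VALID and READY are both HIGH."""
    return [i for i, (v, r) in enumerate(zip(valid, ready)) if v and r]


# VALID first, READY later (in the spirit of Figure 3-1).
valid_first = transfer_cycles([0, 1, 1, 1, 0], [0, 0, 0, 1, 0])
# READY first, VALID later (in the spirit of Figure 3-2).
ready_first = transfer_cycles([0, 0, 0, 1, 0], [0, 1, 1, 1, 0])
# Both in the same cycle (in the spirit of Figure 3-3).
same_cycle = transfer_cycles([0, 0, 1, 0], [0, 0, 1, 0])
```

In the first two waveforms the transfer lands in cycle 3 regardless of which signal was asserted first, which is the whole point of the two-way handshake.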

What follows describes the detailed rules each channel must observe when using the VALID/READY HANDSHAKE mechanism.

3.1.1 Write address channel
The master can assert the AWVALID signal only when it drives valid address and control information. It must remain asserted until the slave accepts the address and control information and asserts the associated AWREADY signal. The default value of AWREADY can be either HIGH or LOW. The recommended default value is HIGH, although if AWREADY is HIGH then the slave must be able to accept any valid address that is presented to it. A default AWREADY value of LOW is possible but not recommended, because it implies that the transfer takes at least two cycles, one to assert AWVALID and another to assert AWREADY.

Each channel is covered in similar detail. For the WRITE ADDRESS CHANNEL, the master may drive AWVALID HIGH only when it has valid ADDRESS and CONTROL INFORMATION to send, and it must keep them asserted (keep issuing) until the slave accepts them by asserting AWREADY. As the spec text says, the default value of AWREADY can be either HIGH or LOW, and the spec recommends HIGH. If AWREADY is HIGH, the slave must be able to accept any valid ADDRESS presented to it. The default can also be LOW, but the spec does not recommend it: starting from LOW, the transfer takes at least 2 cycles, no matter how quickly AWREADY transitions to HIGH.

Figure 3-1 below is a good illustration. READY starts LOW and goes HIGH during the third cycle, but it is not sampled right away (marked X); it is sampled in the fourth cycle (marked O). Coming back to the point, one cycle for AWVALID plus one cycle for AWREADY means at least 2 cycles are needed.
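The "at least two cycles" remark can be checked with a very small model. This is only a sketch under a stated assumption: a slave whose AWREADY defaults to LOW raises it one cycle after it first sees AWVALID. The function name `cycles_until_transfer` is made up for the example.

```python
def cycles_until_transfer(default_ready_high: bool) -> int:
    """Count cycles from AWVALID assertion until the address is accepted."""
    ready = default_ready_high
    for cycle in range(1, 10):
        valid = True  # the master keeps AWVALID asserted until accepted
        if valid and ready:
            return cycle
        # Assumed slave behaviour: it reacts to AWVALID on the next cycle.
        ready = True
    raise RuntimeError("handshake did not complete")
```

With the recommended default of HIGH the address is accepted in the first cycle; with a default of LOW the model needs two cycles, one to see AWVALID and one to assert AWREADY.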

The detailed descriptions of the remaining channels are similar, so I won't go over them separately.

3.1.2 Write data channel
During a write burst, the master can assert the WVALID signal only when it drives valid write data. WVALID must remain asserted until the slave accepts the write data and asserts the WREADY signal. The default value of WREADY can be HIGH, but only if the slave can always accept write data in a single cycle. The master must assert the WLAST signal when it drives the final write transfer in the burst. When WVALID is LOW, the WSTRB[3:0] signals can take any value, although it is recommended that they are either driven LOW or held at their previous value.

3.1.3 Write response channel
The slave can assert the BVALID signal only when it drives a valid write response. BVALID must remain asserted until the master accepts the write response and asserts BREADY. The default value of BREADY can be HIGH, but only if the master can always accept a write response in a single cycle.

3.1.4 Read address channel
The master can assert the ARVALID signal only when it drives valid address and control information. It must remain asserted until the slave accepts the address and control information and asserts the associated ARREADY signal. The default value of ARREADY can be either HIGH or LOW. The recommended default value is HIGH, although if ARREADY is HIGH then the slave must be able to accept any valid address that is presented to it. A default ARREADY value of LOW is possible but not recommended, because it implies that the transfer takes at least two cycles, one to assert ARVALID and another to assert ARREADY.

3.1.5 Read data channel
The slave can assert the RVALID signal only when it drives valid read data. RVALID must remain asserted until the master accepts the data and asserts the RREADY signal. Even if a slave has only one source of read data, it must assert the RVALID signal only in response to a request for the data. The master interface uses the RREADY signal to indicate that it accepts the data. The default value of RREADY can be HIGH, but only if the master is able to accept read data immediately, whenever it performs a read transaction. The slave must assert the RLAST signal when it drives the final read transfer in the burst.

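The verb ASSERT appears often in the spec text above. The term ASSERTED is explained on p. 17 of the spec: asserting a signal means driving it to its active level, whether the signal is active-HIGH or active-LOW. Below is part of the explanation of active state from the Wikipedia article on logic level.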

Active-high and active-low states can be mixed at will: for example, a read only memory integrated circuit may have a chip-select signal that is active-low, but the data and address bits are conventionally active-high. Occasionally a logic design is simplified by inverting the choice of active level (see DeMorgan's theorem).

If you have questions about this part, ask and I'll answer as far as I know.

That's all for today. Have a restful night.

Written by Simhyeon, Choe

Posted by FreeChild

2012. 5. 8. 00:45

AMBA AXI PROTOCOL v1.0 - BASIC TRANSACTIONS II

2012. 5. 8. 00:45 in ARM Architecture

Let's go over the OVERLAPPING READ BURST and the WRITE BURST.

1.3.2 Overlapping read burst example

Figure 1-5 shows how a master can drive another burst address after the slave accepts the first address. This enables a slave to begin processing data for the second burst in parallel with the completion of the first burst.

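As the figure above shows, the master issues a burst address to the slave (along with the control information), and right after the slave accepts that address, the master issues another burst address. In cycle T4, while the master is issuing the address for the second burst (B), the slave is sending the first burst's (A) data to the master. This parallelism is possible because of the MULTIPLE OUTSTANDING ADDRESS scheme, which allows the address of the next burst to be issued before the current burst transaction completes, together with the separation of the ADDRESS CHANNEL and the DATA CHANNEL. These are schemes I already mentioned last time. After that, just as in the "READ BURST" example, the slave sends RLAST to the master along with the last data transfer of the second transaction and the transaction finishes.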

1.3.3 Write burst example

Figure 1-6 shows a write transaction. The process starts when the master sends an address and control information on the write address channel. The master then sends each item of write data over the write data channel. When the master sends the last data item, the WLAST signal goes HIGH. When the slave has accepted all the data items, it drives a write response back to the master to indicate that the write transaction is complete.

The WRITE BURST may look a little more complicated than the READ BURST, but the flow is not very different. The master issues the CONTROL INFORMATION and WRITE ADDRESS it has ready to the slave it wants to write to. In cycle T2 the slave accepts the burst address and the master immediately starts sending data transfers; this is because WVALID is HIGH, that is, the data transfer to send is already valid when sampled. In cycle T4, once the slave is ready to accept a data transfer from the master, one data transfer completes, and the flow continues in the same way. In cycle T9 the master sends the last data transfer together with WLAST, which marks it as the last data, and the slave then sends BRESP to the master to report that the write transaction completed normally. The master, with BREADY HIGH to show it is ready to accept the response, receives BRESP from the slave, and BREADY then transitions to LOW. This fully completes one WRITE TRANSACTION.

1.3.4 Transaction ordering

I said earlier that the AXI protocol supports OUT-OF-ORDER transaction completion. Every transaction is given an ID tag; transactions with the same ID tag must complete in order, but transactions with different ID tags can complete out of order.

As I mentioned in the first lecture, OUT-OF-ORDER transactions can improve performance in two ways:

1) The interconnect can enable transactions with fast-responding slaves to complete in advance of earlier transactions with slower slaves.

2) Complex slaves can return read data out of order. For example, a data item for a later access might be available from an internal buffer before the data for an earlier access is available.

For the first way, suppose master A issues an address to slave B first and then to slave C. With the AHB protocol, transactions can only complete in order, so even if slave B responds more slowly than slave C, slave B's transaction has to be processed before slave C's can be. With the AXI protocol, the transaction with the faster-responding slave can complete first, which is a performance advantage.

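For the second way, suppose data items 1, 2, 3, 4 are queued in an internal buffer in that order, with 1 due to be processed first. If the slave would rather handle 4 before 1, for instance because that data becomes available sooner, it can return them out of order, such as 4, 3, 2, 1, which is better for performance.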

If a master requires that transactions are completed in the same order that they are
issued, then they must all have the same ID tag. If, however, a master does not require
in-order transaction completion, it can supply the transactions with different ID tags,
enabling them to be completed in any order.
In a multimaster system, the interconnect is responsible for appending additional
information to the ID tag to ensure that ID tags from all masters are unique. The ID tag
is similar to a master number, but with the extension that each master can implement
multiple virtual masters within the same port by supplying an ID tag to indicate the
virtual master number.
Although complex devices can make use of the out-of-order facility, simple devices are
not required to use it. Simple masters can issue every transaction with the same ID tag,
and simple slaves can respond to every transaction in order, irrespective of the ID tag.

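The spec text above says a lot about ID tags. The key point to remember is that transactions with the same ID tag must complete in order, while transactions with different ID tags can complete out of order. It's late, so I'll end the lecture here. Feel free to ask any questions on the blog and I'll do my best to answer as far as I know. Have a restful night.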
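The ID tag rule can be written down as a small checker: a completion order is acceptable if, for every ID tag, the transactions with that tag complete in the order they were issued. The sketch below is illustrative only; transactions are modeled as (ID tag, name) pairs and both function names are invented for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Txn = Tuple[int, str]  # (ID tag, transaction name)


def per_id_order(txns: List[Txn]) -> Dict[int, List[str]]:
    """Group transaction names by ID tag, keeping their relative order."""
    order: Dict[int, List[str]] = defaultdict(list)
    for id_tag, name in txns:
        order[id_tag].append(name)
    return dict(order)


def completion_order_ok(issued: List[Txn], completed: List[Txn]) -> bool:
    """Same-ID transactions must complete in issue order; others may reorder."""
    return per_id_order(issued) == per_id_order(completed)
```

For transactions A and C with ID 0 and B with ID 1, the completion order B, A, C is fine because B uses a different tag, while C, B, A is not because C overtakes A within the same tag.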

Written by Simhyeon, Choe

Posted by FreeChild

2012. 4. 16. 22:30

AMBA AXI PROTOCOL v1.0 - BASIC TRANSACTIONS I

2012. 4. 16. 22:30 in ARM Architecture

Today let's look at the timing diagrams of the READ BURST, OVERLAPPING READ BURST, and WRITE BURST transactions.

1.3 Basic transactions
This section gives examples of basic AXI protocol transactions. Each example shows
the VALID and READY handshake mechanism. Transfer of either address information
or data occurs when both the VALID and READY signals are HIGH. The examples are
provided in:
• Read burst example
• Overlapping read burst example
• Write burst example

1.3.1 Read burst example
Figure 1-4 shows a read burst of four transfers. In this example, the master drives the
address, and the slave accepts it one cycle later.

- Note -

The master also drives a set of control signals showing the length and type of the burst, but these signals are omitted from the figure for clarity.

After the address appears on the address bus, the data transfer occurs on the read data
channel. The slave keeps the VALID signal LOW until the read data is available. For
the final data transfer of the burst, the slave asserts the RLAST signal to show that the
last data item is being transferred.

In the AXI bus protocol, as in the AHB bus protocol, all signals are sampled on the RISING EDGE. The reference for each signal is in SIGNAL DESCRIPTION on p. 31 of the AMBA AXI PROTOCOL v1.0 spec.

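Looking at the timing diagram above, ACLK is the global reference for sampling all signals. The parts marked in green are signals the master sends to the slave, and the parts marked in blue are signals the slave sends to the master (I marked them because they are easy to mix up). Each signal name carries a prefix such as AR, AW, R, or W. AR stands for "read address" and AW for "write address". Plain R and W refer to read data and write data. Here are a few examples.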

ARADDR = READ ADDRESS (FROM MASTER TO SLAVE)

RDATA = READ DATA (FROM SLAVE TO MASTER)

RREADY = READ DATA READY (FROM MASTER TO SLAVE)
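
The prefix convention can also be written down as a small lookup. This is a sketch for illustration only: the table and the `describe` helper are hypothetical, and it assumes the usual AXI rule that READY travels in the opposite direction to the channel's payload.

```python
# Channel prefix -> (channel name, payload source, payload destination)
CHANNELS = {
    "AR": ("read address", "master", "slave"),
    "AW": ("write address", "master", "slave"),
    "R": ("read data", "slave", "master"),
    "W": ("write data", "master", "slave"),
    "B": ("write response", "slave", "master"),
}

def describe(signal):
    """Return the channel and direction for an AXI signal name."""
    # Two-letter prefixes are tried first so that AWADDR is not
    # mistaken for a signal without an address prefix.
    for prefix in ("AR", "AW", "R", "W", "B"):
        if signal.startswith(prefix):
            channel, src, dst = CHANNELS[prefix]
            # READY is driven by the receiver, so its direction is reversed.
            if signal[len(prefix):] == "READY":
                src, dst = dst, src
            return f"{signal}: {channel} channel, {src} -> {dst}"
    raise ValueError(f"unknown signal prefix: {signal}")

for name in ("ARADDR", "RDATA", "RREADY"):
    print(describe(name))
```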

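That is how to read them. Not difficult at all, right? The READ BURST operation is simple. Every bus operation begins with the master acquiring permission to use the bus from the arbiter and sending control signals and an address to the slave. The control signals carry information such as the burst type, size, and length, and the address identifies a location in a particular slave's address space. So how does a master select the target slave for a read or write on a bus with multiple slaves?

The answer lies in the following characteristic of the AXI protocol.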

In memory mapped AXI (AXI3, AXI4, and AXI4-Lite), all transactions involve the concept
of a target address within a system memory space and data to be transferred.
Memory mapped systems often provide a more homogeneous way to view the system,
because the IPs operate around a defined memory map.

(AXI REFERENCE GUIDE P.11 - MEMORY MAPPED PROTOCOLS)

Therefore, because all IPs on an AMBA bus are mapped into a single memory layout, the master does not need to care which slave it is talking to; it only needs to access an address within that slave's region.
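
Selecting a slave purely by address can be sketched as a range lookup. The memory map below is made up for illustration (the base addresses, sizes, and slave names are assumptions, not taken from any real system).

```python
# Hypothetical memory map: (base address, size in bytes, slave name)
MEMORY_MAP = [
    (0x0000_0000, 0x1000_0000, "SRAM"),
    (0x4000_0000, 0x0001_0000, "UART"),
    (0x8000_0000, 0x4000_0000, "DDR"),
]

def decode(addr):
    """Return the slave whose address region contains addr, or None."""
    for base, size, name in MEMORY_MAP:
        if base <= addr < base + size:
            return name
    # No region matches; a real interconnect would normally answer
    # such an access with an error response.
    return None

print(decode(0x4000_0004))
print(decode(0x2000_0000))
```

The master only supplies the address; which slave answers is decided entirely by the map.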

In the first cycle, T0 to T1, the master drives ARVALID HIGH when it has a valid read request for the slave, and issues (that is, drives) the target address on ARADDR. How long does it keep issuing? Until the slave drives ARREADY HIGH, of course. At T2, ARREADY is sampled HIGH, the address is accepted, ARVALID goes LOW, and ARADDR no longer carries a valid address (first red box). From T3 to T4, RREADY is HIGH, which shows that the master is already ready to accept data. From T5 to T6, the slave drives RVALID HIGH to indicate that its data is valid and sends the data at once, after which RREADY goes LOW. The remaining data transfers follow the same procedure, and on the final data transfer the slave asserts RLAST to tell the master that this is the last one, which completes the burst transaction (third red box). Today I'll stop at READ BURST; next time we'll look at the waveforms for OVERLAPPING READ BURST and WRITE BURST.
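
The read burst sequence described above can be sketched as a very small cycle-level model. It is a simplification under stated assumptions: the function and parameter names are hypothetical, the master keeps RREADY HIGH throughout, and the slave returns one beat per cycle once its data is available, so the cycle numbers will not match the figure exactly.

```python
def read_burst(length, slave_latency, arready_delay=1):
    """Return a list of (cycle, event) tuples for one read burst.

    length:         number of data transfers in the burst
    slave_latency:  cycles the slave needs before RVALID can go HIGH
    arready_delay:  cycle at which the slave first drives ARREADY HIGH
    """
    events = []
    cycle = 0

    # Address phase: the master holds ARVALID HIGH until ARREADY is HIGH.
    while cycle < arready_delay:
        cycle += 1
    events.append((cycle, "AR handshake"))
    cycle += 1

    # The slave keeps RVALID LOW until the read data is available.
    cycle += slave_latency

    # Data phase: one beat per cycle, RLAST on the final beat.
    for beat in range(length):
        label = f"R beat {beat}"
        if beat == length - 1:
            label += " RLAST"
        events.append((cycle, label))
        cycle += 1
    return events

for cycle, event in read_burst(length=4, slave_latency=2):
    print(cycle, event)
```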

Written by Simhyeon, Choe


Posted by FreeChild

2012. 3. 26. 23:21

AMBA AXI PROTOCOL v1.0 - ARCHITECTURE OVERVIEW II

2012. 3. 26. 23:21 in ARM Architecture

The AXI protocol provides a single interface definition for describing interfaces:
• between a master and the interconnect
• between a slave and the interconnect
• between a master and a slave.

The interface definition enables a variety of different interconnect implementations.
The interconnect between devices is equivalent to another device with symmetrical
master and slave ports to which real master and slave devices can be connected.
Most systems use one of three interconnect approaches:
• shared address and data buses
• shared address buses and multiple data buses
• multilayer, with multiple address and data buses.

In most systems, the address channel bandwidth requirement is significantly less than
the data channel bandwidth requirement. Such systems can achieve a good balance
between system performance and interconnect complexity by using a shared address
bus with multiple data buses to enable parallel data transfers.

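The interface and interconnect description above uses the terms master and slave. Put simply, a master is a device (e.g. CPU, DMA) that acquires permission to use the bus from the AXI arbiter and initiates read/write operations by sending control signals, addresses, and data to a slave. A slave is a device (e.g. SRAM, BUFFER, FIFO) that processes the control signals, addresses, and data received from the master within its own address region. The interconnect can be defined as a memory-mapped IP that lets one or more masters and slaves communicate. Because the interconnect is itself an IP, it is not an abstract concept but an actual block of logic.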

Register slices
Each AXI channel transfers information in only one direction, and there is no
requirement for a fixed relationship between the various channels. This is important
because it enables the insertion of a register slice in any channel, at the cost of an
additional cycle of latency. This makes possible a trade-off between cycles of latency
and maximum frequency of operation.
It is also possible to use register slices at almost any point within a given interconnect.
It can be advantageous to use a direct, fast connection between a processor and
high-performance memory, but to use simple register slices to isolate a longer path to
less performance-critical peripherals.

Before explaining the concept of a register slice, let's first look at the figure below.

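In the figure above, several masters on the left and several slaves on the right are connected to the interconnect. The green squares are the register slices. Why are register slices needed? In a structure like this, where several masters and slaves share one bus, the interconnect inevitably becomes a bottleneck for the masters' work. When a master sends an address to a slave, the decoding work gets noticeably slower as the number of slaves grows, which means the interconnect responds more slowly to the master's addressing. A register slice (think of it as flip-flops that can store bits) is therefore placed on the path between a master and the interconnect, or between a slave and the interconnect, so that a signal can be answered at the register slice without having to travel all the way through the interconnect. As a result the achievable frequency goes up, but each stage added to the interconnect path also adds latency. In other words, the number of register slices placed along a path is a trade-off between frequency and latency.

There is, however, a factor that offsets the latency. Latency is longer while the initially empty register slices are being filled, but once they are all full, transfers can flow at a higher frequency (shorter paths). Put simply, it helps to think of a path with register slices as a pipeline scheme. I have also read that when the bus path between a master and a slave is too long to respond within the required cycle, a register slice has to be inserted.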
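
The pipeline view of register slices can be checked with a toy shift-register model. This is a sketch under simplifying assumptions: the names are hypothetical, one transfer enters the path per cycle, and there is no back-pressure.

```python
def cycles_to_deliver(num_transfers, num_slices):
    """Count the cycles needed to move transfers through a path
    that contains num_slices register stages."""
    pipeline = [None] * num_slices      # contents of each register slice
    pending = list(range(num_transfers))
    received = 0
    cycle = 0
    while received < num_transfers:
        cycle += 1
        incoming = pending.pop(0) if pending else None
        if num_slices == 0:
            arrived = incoming          # direct connection, no added latency
        else:
            arrived = pipeline[-1]      # value leaving the last slice
            pipeline = [incoming] + pipeline[:-1]
        if arrived is not None:
            received += 1
    return cycle

print(cycles_to_deliver(4, 0))
print(cycles_to_deliver(4, 2))
print(cycles_to_deliver(100, 2))
```

In this model each slice costs one extra cycle only once, while the pipeline fills; after that one transfer still arrives per cycle, which is why the added latency matters less for long streams of transfers.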

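(The figure above may be confusing in one respect: it looks as though the masters and the slaves each connect to the interconnect over a single wire, with a register slice on only one path. In the actual structure, each IP has its own connection to the interconnect, and a register slice can be placed on each of them.)

That's all for today. Next time we'll look at how READ and WRITE BURST operate in the timing diagrams. (The AMBA AXI spec is attached below for reference.)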

Written by Simhyeon, Choe

AMBAaxi[1].pdf

ug761_axi_reference_guide.pdf


Posted by FreeChild

2012. 3. 23. 00:02

AMBA AXI PROTOCOL v1.0 - ARCHITECTURE OVERVIEW I

2012. 3. 23. 00:02 in ARM Architecture

Let's study the AMBA AXI protocol. (Based on the AMBA AXI Protocol spec v1.0.)

1.1 About the AXI protocol

The key features of the AMBA AXI protocol are as follows.

• separate address/control and data phases
• support for unaligned data transfers using byte strobes
• burst-based transactions with only start address issued
• separate read and write data channels to enable low-cost Direct Memory Access(DMA)
• ability to issue multiple outstanding addresses
• out-of-order transaction completion
• easy addition of register stages to provide timing closure.

Among the features above, one advance over AHB is that the READ DATA and WRITE DATA channels are separate. An AHB bus generally cannot read and write at the same time, but the AXI channels (buses) can, so throughput increases. Another feature is the ability to issue multiple outstanding addresses. With AHB, the address for the next burst transaction can be issued only after the current burst transaction has completed, whereas AXI can issue the address of a new transaction without waiting for the burst transaction in progress to complete. Being able to issue multiple outstanding addresses is what makes out-of-order transactions possible.

Out-of-order transactions can improve system performance in two ways:

• The interconnect can enable transactions with fast-responding slaves to complete
in advance of earlier transactions with slower slaves.

• Complex slaves can return read data out of order. For example, a data item for a
later access might be available from an internal buffer before the data for an
earlier access is available.

As stated above, transactions to faster slaves can complete first regardless of the order in which the master issued the addresses, and data read later can be returned before data read earlier.
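
Out-of-order completion can be illustrated with a toy timing model. The tuple layout and the latencies are assumptions made up for the example, and the transactions are assumed to be ones that are allowed to complete out of order.

```python
def completion_order(transactions):
    """transactions: list of (name, issue_cycle, slave_latency) tuples.

    Each transaction finishes at issue_cycle + slave_latency; ties are
    broken by issue order. Returns the names in completion order.
    """
    finished = sorted(transactions, key=lambda t: (t[1] + t[2], t[1]))
    return [name for name, _, _ in finished]

issued = [
    ("read from slow slave", 0, 10),   # issued first, finishes at cycle 10
    ("read from fast slave", 1, 2),    # issued second, finishes at cycle 3
]
print(completion_order(issued))
```

Because the second address is issued while the first transaction is still outstanding, the fast slave's transaction completes before the earlier one.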

1.2 Architecture

The AXI protocol is burst-based, and every transaction has address and control information carried on an address channel. Data is transferred between master and slave over the WRITE DATA CHANNEL to the slave and over the READ DATA CHANNEL to the master. In a WRITE TRANSACTION, all data flows from the master to the slave, and when the write is done the slave sends a completion signal back to the master over the WRITE RESPONSE CHANNEL, which completes the transaction.

Each independent channel consists of a set of information signals and uses a two-way VALID and READY handshake mechanism. Briefly, VALID indicates whether valid data or control information is available on the channel (the signal goes HIGH when it is), and READY indicates whether the receiver can accept it. The READ DATA CHANNEL and the WRITE DATA CHANNEL both also have a LAST signal, which is driven HIGH when the final data item of a transaction is being transferred.
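
The VALID/READY/LAST rules can be sketched as a function over a sampled trace. The trace format is hypothetical: one (valid, ready, last) tuple per rising clock edge for a single data channel.

```python
def count_beats(trace):
    """Count the data transfers in one burst.

    A transfer happens only on an edge where VALID and READY are both
    HIGH. The burst ends on the transfer that has LAST HIGH. Returns
    None if the trace ends before the burst does.
    """
    beats = 0
    for valid, ready, last in trace:
        if valid and ready:
            beats += 1
            if last:
                return beats
    return None

trace = [
    (1, 0, 0),  # source has data, receiver not ready: no transfer
    (1, 1, 0),  # transfer 1
    (0, 1, 0),  # receiver ready, nothing valid: no transfer
    (1, 1, 0),  # transfer 2
    (1, 1, 1),  # transfer 3, LAST is HIGH: burst ends
]
print(count_beats(trace))
```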

READ ADDRESS CHANNEL AND WRITE ADDRESS CHANNEL

READ and WRITE transactions each use their own ADDRESS CHANNEL, which carries the ADDRESS and CONTROL INFORMATION required for the transaction.

The AXI protocol supports the following mechanisms:
• variable-length bursts, from 1 to 16 data transfers per burst
• bursts with a transfer size of 8-1024 bits
• wrapping, incrementing, and non-incrementing bursts
• atomic operations, using exclusive or locked accesses
• system-level caching and buffering control
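
Since only the start address of a burst is issued, the address of each later transfer has to be derived from the burst type, transfer size, and length. The sketch below shows one way to compute them. The function name and string labels are hypothetical, and it assumes a start address aligned to the transfer size and, for wrapping bursts, a power-of-two length.

```python
def burst_addresses(start, size_bytes, length, burst_type):
    """Return the address of every transfer in a burst.

    burst_type: "FIXED" (non-incrementing), "INCR" (incrementing)
                or "WRAP" (wrapping).
    """
    if burst_type == "FIXED":
        return [start] * length
    if burst_type == "INCR":
        return [start + i * size_bytes for i in range(length)]
    if burst_type == "WRAP":
        total = size_bytes * length          # bytes covered by the burst
        lower = (start // total) * total     # wrap boundary
        addrs, addr = [], start
        for _ in range(length):
            addrs.append(addr)
            addr += size_bytes
            if addr >= lower + total:        # wrap back to the boundary
                addr = lower
        return addrs
    raise ValueError(f"unknown burst type: {burst_type}")

print([hex(a) for a in burst_addresses(0x10, 4, 4, "INCR")])
print([hex(a) for a in burst_addresses(0x04, 4, 4, "WRAP")])
```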

1) Read data channel
   The read data channel conveys both the read data and any read response information
   from the slave back to the master. The read data channel includes:
   • the data bus, which can be 8, 16, 32, 64, 128, 256, 512, or 1024 bits wide
   • a read response indicating the completion status of the read transaction.

2) Write data channel
   The write data channel conveys the write data from the master to the slave and includes:
   • the data bus, which can be 8, 16, 32, 64, 128, 256, 512, or 1024 bits wide
   • one byte lane strobe for every eight data bits, indicating which bytes of the data
     bus are valid.
   Write data channel information is always treated as buffered, so that the master can
   perform write transactions without slave acknowledgement of previous write transactions.

3) Write response channel
   The write response channel provides a way for the slave to respond to write transactions.
   All write transactions use completion signaling.
   The completion signal occurs once for each burst, not for each individual data transfer
   within the burst.
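
The byte lane strobes described for the write data channel can be illustrated with a short sketch. The helper name is hypothetical, and byte index n of each `bytes` value is taken to be byte lane n (data bits 8n+7 down to 8n).

```python
def apply_write(memory_word: bytes, wdata: bytes, wstrb: int) -> bytes:
    """Merge one write transfer into a stored word.

    Bit n of wstrb marks byte lane n of wdata as valid; lanes whose
    strobe bit is 0 keep the old memory contents.
    """
    result = bytearray(memory_word)
    for lane in range(len(wdata)):
        if (wstrb >> lane) & 1:
            result[lane] = wdata[lane]
    return bytes(result)

old = bytes([0x00, 0x00, 0x00, 0x00])
new = apply_write(old, bytes([0x11, 0x22, 0x33, 0x44]), 0b0101)
print(new.hex())
```

With a strobe of 0b0101 only lanes 0 and 2 are written, which is how a master can update part of a word without a read-modify-write sequence.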

Written by Simhyeon, Choe