10xEngineers

Making ARA Vector Processor RISC-V Vector Extension (RVV) 1.0 Compatible

Author: Nouman Akbar, 10xEngineers, Pakistan

Introduction

The RISC-V Vector Extension (RVV) has undergone significant revisions since its initial release in 2015, with version 1.0 ratified and frozen in November 2021, six years later. Its ratification was an important milestone because it provides an open, standardized extension for both hardware and software development. RVV serves as an openly available alternative to proprietary ISAs and has thus enabled open-source implementations of vector processors. One of these is ARA, an open-source, scalable, 64-bit vector processor hosted by the PULP platform. ARA is currently configured as a coprocessor of CVA6, as shown in Figure 1, and runs at more than 1 GHz in the typical corner. It was initially based on the version 0.5 draft of RVV and was gradually updated to RVV version 0.10, so it still needed updates to be compatible with the ratified RVV 1.0 standard. 10xEngineers therefore took on the project of adding RVV 1.0 support to ARA.

This case study presents our experience upgrading the ARA vector processor to RVV 1.0 compatibility, focusing on the implementation of the missing RVV permute, mask, and fixed-point instructions and some of the RVV floating-point instructions.

Figure 1: Architecture of ARA as a coprocessor with CVA6

Objectives

The primary objectives of this project were:

  1. Update the ARA vector processor to ensure compatibility with RVV 1.0.
  2. Implement the missing permute, mask, floating-point, and fixed-point instructions.
  3. Verify the functional correctness of the updated processor.

Solution

To achieve the objectives, we followed a structured approach:

  1. RVV 1.0 Specification Study: Analyzed the RVV 1.0 specification to identify changes and additions.
  2. Gap Analysis: Identified the instructions missing from the ARA vector processor, including permute, mask, and fixed-point operations.
  3. Microarchitecture Documentation: Studied the ARA microarchitecture and added documentation describing it.
  4. Instruction Implementation: Designed and implemented the missing instructions:
    • Permute instructions
    • Mask instructions
    • Fixed-point instructions
    • Vector floating-point instructions
  5. VLEN Support: Added support for shorter VLENs (128, 256, and 512 bits) in ARA.
  6. Verification:
    • Developed tests to verify the functionality and correctness of the implemented instructions.
    • Ran regressions, then debugged and fixed the RTL bugs they exposed.

Implementation Details

The following instructions were designed and added to the ARA.

RVV Mask Instructions

Mask instructions operate on mask registers, meaning they process individual bits of a vector register rather than whole elements. The following mask instructions were implemented:
  • vmsbf.m
  • vmsif.m
  • vmsof.m
  • viota.m
  • vid.v
  • vcpop.m
  • vfirst.m
To support these instructions, logic was added to ARA’s dispatch unit, to the ALU in the ARA lanes, and to the mask unit. See PR#149 and PR#178 for the design implementation of these instructions.
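To make the semantics of these instructions concrete, the following Python sketch is a hypothetical golden model of the kind that could be used to check RTL results; it is not taken from the PRs above, and all function names are illustrative. A mask register is represented as a list of 0/1 bits with element 0 first, and vl, tail elements, and v0.t masking are ignored. Note that viota.m is a running (prefix) count, so each result depends on every lower-indexed mask bit.

```python
from typing import List

# Hypothetical golden-model helpers (not from the ARA code base).
# A mask is a list of 0/1 bits, element 0 first; vl, tail elements and
# v0.t masking are ignored for clarity.


def vmsbf(src: List[int]) -> List[int]:
    """vmsbf.m: set every bit *before* the first set bit of src."""
    out, seen = [], False
    for b in src:
        seen = seen or b == 1
        out.append(0 if seen else 1)
    return out


def vmsif(src: List[int]) -> List[int]:
    """vmsif.m: set every bit up to and *including* the first set bit."""
    out, seen = [], False
    for b in src:
        out.append(0 if seen else 1)
        seen = seen or b == 1
    return out


def vmsof(src: List[int]) -> List[int]:
    """vmsof.m: set *only* the first set bit."""
    out, seen = [], False
    for b in src:
        out.append(1 if b == 1 and not seen else 0)
        seen = seen or b == 1
    return out


def viota(src: List[int]) -> List[int]:
    """viota.m: element i gets the number of set bits in src[0:i]."""
    out, count = [], 0
    for b in src:
        out.append(count)
        count += b
    return out


def vid(vl: int) -> List[int]:
    """vid.v: element i gets its own index."""
    return list(range(vl))


def vcpop(src: List[int]) -> int:
    """vcpop.m: population count of the mask."""
    return sum(src)


def vfirst(src: List[int]) -> int:
    """vfirst.m: index of the first set bit, or -1 if no bit is set."""
    for i, b in enumerate(src):
        if b == 1:
            return i
    return -1


if __name__ == "__main__":
    m = [0, 0, 1, 0, 1]
    print(vmsbf(m), vmsif(m), vmsof(m), viota(m), vcpop(m), vfirst(m))
```

If the source mask has no set bit, vmsbf.m and vmsif.m return all ones and vmsof.m returns all zeros, which matches the RVV 1.0 definitions.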

RVV Permute Instructions

Permute instructions in RVV rearrange the elements of vector registers. The following permute instructions were implemented:
  • vrgather
  • vrgatherei16
  • vcompress
These were the most challenging instructions to implement. ARA’s vector register file (VRF) is divided across the lanes, and these instructions need operands from across the lanes, so they must fetch data from, and write results back to, all of the lanes. To handle this, the main logic was added to the mask unit, which has access to the VRF data of every lane. See PR#180 for reference.
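The Python sketch below is a hypothetical reference model of vrgather.vv and vcompress.vm, plus a helper that counts how many gather destinations need a source element stored in a different lane. The element-to-lane mapping is an assumption made for illustration (element i placed in lane i % n_lanes, round-robin); vl, masking, and tail policy are ignored. vrgatherei16.vv follows the same gather rule, except that its index operand always uses 16-bit elements regardless of SEW.

```python
from typing import List

# Hypothetical reference model; vl, masking and tail policy are ignored.


def vrgather(vs2: List[int], vs1: List[int], vlmax: int) -> List[int]:
    """vrgather.vv: vd[i] = vs2[vs1[i]], or 0 when the index is >= VLMAX.

    vrgatherei16.vv uses the same rule with 16-bit index elements.
    """
    return [vs2[idx] if idx < vlmax else 0 for idx in vs1]


def vcompress(vs2: List[int], mask: List[int]) -> List[int]:
    """vcompress.vm: pack the elements of vs2 whose mask bit is set.

    The packed elements go to the start of vd; the remaining elements of
    vd are tail elements and are not modelled here.
    """
    return [e for e, m in zip(vs2, mask) if m]


def lane_of(i: int, n_lanes: int) -> int:
    # ASSUMPTION: round-robin element-to-lane mapping, for illustration.
    return i % n_lanes


def cross_lane_reads(vs1: List[int], n_lanes: int, vlmax: int) -> int:
    """Count gather destinations whose source element lives in another lane."""
    return sum(
        1
        for dst, src in enumerate(vs1)
        if src < vlmax and lane_of(src, n_lanes) != lane_of(dst, n_lanes)
    )


if __name__ == "__main__":
    data = [10, 20, 30, 40]
    reverse = [3, 2, 1, 0]
    print(vrgather(data, reverse, vlmax=4))  # element reversal
    print(vcompress(data, [1, 0, 1, 1]))
    print(cross_lane_reads(reverse, n_lanes=4, vlmax=4))
```

Under this assumed mapping, a reversal pattern such as [3, 2, 1, 0] on four lanes makes every destination read from another lane, and for vcompress.vm the destination of each element depends on how many mask bits before it are set. Both effects illustrate why a unit with access to all lanes is a natural home for this logic.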

RVV Fixed-Point Instructions

Fixed-point instructions perform fixed-point arithmetic, applying a configurable rounding mode and, for some instructions, saturation. The following fixed-point instructions were implemented:
  • vsmul
  • vssra
  • vssrl
  • vnclip
  • vnclipu
Logic for fixed-point rounding was added, and the dispatcher unit and the ALU in the lanes were updated to support these instructions. For vsmul, the multiplier unit already present in ARA was reused. See PR#147 for reference.
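The rounding and saturation behavior these instructions need can be summarized with a small Python model that follows the RVV 1.0 definitions of the four vxrm rounding modes (rnu, rne, rdn, rod). This is a hedged sketch, not ARA's RTL: the function names are hypothetical, values are Python integers rather than bit vectors, and the vxsat flag is modelled as a returned boolean. Because Python's >> is an arithmetic shift on negative integers, a single roundoff helper covers both of the spec's roundoff_unsigned and roundoff_signed functions.

```python
from typing import Tuple

RNU, RNE, RDN, ROD = 0, 1, 2, 3  # vxrm encodings from the RVV 1.0 spec


def roundoff(v: int, d: int, vxrm: int) -> int:
    """Shift v right by d bits, rounding according to vxrm.

    Works for signed and unsigned v, since Python's >> is arithmetic.
    """
    if d == 0:
        return v
    half = (v >> (d - 1)) & 1        # v[d-1], first discarded bit
    lsb = (v >> d) & 1               # v[d], LSB of the shifted result
    low = v & ((1 << d) - 1)         # v[d-1:0], all discarded bits
    rest = v & ((1 << (d - 1)) - 1)  # v[d-2:0]
    if vxrm == RNU:                  # round-to-nearest-up
        r = half
    elif vxrm == RNE:                # round-to-nearest-even
        r = half & int(rest != 0 or lsb == 1)
    elif vxrm == RDN:                # round-down (truncate)
        r = 0
    else:                            # ROD: round-to-odd ("jam")
        r = int(lsb == 0 and low != 0)
    return (v >> d) + r


def clip_signed(x: int, sew: int) -> Tuple[int, bool]:
    """Saturate to the signed SEW-bit range; second value models vxsat."""
    lo, hi = -(1 << (sew - 1)), (1 << (sew - 1)) - 1
    return min(max(x, lo), hi), not lo <= x <= hi


def clip_unsigned(x: int, sew: int) -> Tuple[int, bool]:
    """Saturate to the unsigned SEW-bit range; second value models vxsat."""
    hi = (1 << sew) - 1
    return min(max(x, 0), hi), not 0 <= x <= hi


def vssrl(a: int, shift: int, sew: int, vxrm: int) -> int:
    """Scaling shift right logical; a is an unsigned element."""
    return roundoff(a, shift & (sew - 1), vxrm)


def vssra(a: int, shift: int, sew: int, vxrm: int) -> int:
    """Scaling shift right arithmetic; a is a signed element."""
    return roundoff(a, shift & (sew - 1), vxrm)


def vsmul(a: int, b: int, sew: int, vxrm: int) -> Tuple[int, bool]:
    """Signed fractional multiply: (a*b) >> (SEW-1), rounded and saturated."""
    return clip_signed(roundoff(a * b, sew - 1, vxrm), sew)


def vnclipu(wide: int, shift: int, sew: int, vxrm: int) -> Tuple[int, bool]:
    """Narrowing unsigned clip from a 2*SEW-bit source to SEW bits."""
    return clip_unsigned(roundoff(wide, shift & (2 * sew - 1), vxrm), sew)


def vnclip(wide: int, shift: int, sew: int, vxrm: int) -> Tuple[int, bool]:
    """Narrowing signed clip from a 2*SEW-bit source to SEW bits."""
    return clip_signed(roundoff(wide, shift & (2 * sew - 1), vxrm), sew)
```

For example, with SEW=8, vsmul only saturates when both operands are -128, since (-128 * -128) >> 7 = 128 does not fit in a signed byte; every other product stays in range.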

RVV Floating-Point Instructions

RVV floating-point instructions operate on floating-point numbers. The following instructions were identified as missing from ARA’s implementation and were added to make it compliant with RVV 1.0:
  • vfrec7.v
  • vfsqrt.v
  • vfncvt.rod.f.f.w
Most of the RTL changes for these instructions were made in the ARA dispatcher and the vector floating-point units. For vfncvt.rod.f.f.w, the floating-point unit did not implement round-towards-odd, so support for it was added to the RTL. See PR#191, PR#184, and PR#201 for the design implementation.
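The sketch below illustrates round-towards-odd for the binary64-to-binary32 case, which is what vfncvt.rod.f.f.w performs at SEW=32. It is a software model in Python, not the FPU's RTL, and the function name is hypothetical. It takes the round-to-nearest-even conversion that Python's struct module performs, corrects it to round-toward-zero, and then forces the least significant bit to 1 whenever the result is inexact. NaN, infinity, and overflow handling are deliberately left out.

```python
import struct


def f64_to_f32_rod(x: float) -> float:
    """Narrow a finite binary64 value to binary32 with round-towards-odd.

    ASSUMPTION: x is finite and does not round to infinity in binary32;
    NaN, infinity and overflow handling are left out of this sketch.
    """
    # struct.pack("<f", ...) rounds to nearest, ties to even.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    nearest = struct.unpack("<f", struct.pack("<I", bits))[0]
    # Convert to round-toward-zero: if RNE overshot in magnitude, step one
    # ULP back toward zero (valid for both signs, as binary32 is
    # sign-magnitude and the magnitude bits are monotonic).
    if abs(nearest) > abs(x):
        bits -= 1
    truncated = struct.unpack("<f", struct.pack("<I", bits))[0]
    # Round-to-odd: if the truncated result is inexact, force the LSB to 1.
    if truncated != x:
        bits |= 1
    return struct.unpack("<f", struct.pack("<I", bits))[0]


if __name__ == "__main__":
    x = 1.0 + 3 * 2**-24  # 1.5 ULP above 1.0 in binary32
    rne = struct.unpack("<f", struct.pack("<f", x))[0]
    print(f"RNE: {rne!r}  ROD: {f64_to_f32_rod(x)!r}")
```

For this input, round-to-nearest-even picks 1 + 2^-22 while round-to-odd picks 1 + 2^-23, whose last mantissa bit is odd. A commonly cited motivation for round-to-odd is that it avoids double-rounding errors when a value is later narrowed again with round-to-nearest (for example binary64 to binary32 to binary16).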

Support for various VLENs

ARA previously supported only VLENs greater than or equal to 1024 bits. Support was added to the RTL for VLENs of 128, 256, and 512 bits. To enable this, the RTL was also modified to support a single-lane configuration of ARA. See PR#194 for the detailed design implementation.
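To show what the shorter configurations mean for software, the Python sketch below computes VLMAX = LMUL * VLEN / SEW, the maximum number of elements a single vector instruction can process as defined in RVV 1.0, together with the number of bits of each vector register that a lane holds. The per-lane figure rests on an assumption made for illustration (the VRF splits every register evenly across the lanes), and the function names are hypothetical.

```python
from fractions import Fraction
from typing import Union


def vlmax(vlen: int, sew: int, lmul: Union[int, Fraction]) -> int:
    """VLMAX = LMUL * VLEN / SEW, as defined in RVV 1.0."""
    return int(Fraction(lmul) * vlen / sew)


def vreg_bits_per_lane(vlen: int, n_lanes: int) -> int:
    """Bits of one vector register held by each lane.

    ASSUMPTION: the VRF splits every register evenly across the lanes.
    """
    if vlen % n_lanes:
        raise ValueError("VLEN must divide evenly across the lanes")
    return vlen // n_lanes


if __name__ == "__main__":
    for vlen in (128, 256, 512, 1024):
        counts = {sew: vlmax(vlen, sew, 1) for sew in (8, 16, 32, 64)}
        print(f"VLEN={vlen:5}: VLMAX at LMUL=1 -> {counts}, "
              f"per-lane bits with 1 lane = {vreg_bits_per_lane(vlen, 1)}")
```

At VLEN=128 and SEW=64, for instance, one register holds only two elements, so vector-length-agnostic code that reads vl back from vsetvli keeps working unchanged while a smaller ARA configuration simply processes fewer elements per instruction.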

Conclusion

The successful upgrade of the ARA vector processor to RVV 1.0 compatibility shows the value of keeping open hardware aligned with evolving standards. Implementing the missing instructions and adding support for shorter VLENs keep the processor relevant for a range of applications, including scientific simulation, machine learning, and multimedia processing. This project contributes to the growing ecosystem of RISC-V-based vector processors.