Add RVV optimization for ZSTD_row_getMatchMask
This pull request adds a RISC-V Vector (RVV) implementation of the ZSTD_row_getMatchMask function. It replaces the generic SWAR implementation on RV64 platforms that support the V extension. The goal is to use RVV's parallel computation capabilities to improve performance on the RISC-V architecture.
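For readers unfamiliar with the function, the following is a minimal scalar sketch of the result a match-mask routine of this kind is expected to produce. It is not the RVV code from this PR and not zstd's SWAR fallback. It assumes one mask bit per row entry and that the mask is rotated right by the row's head position because the row is a circular buffer. The names `match_mask_reference` and `rotate_right_bits` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rotate the low `width` bits of `v` right by `count`.
 * `width` must be 16, 32 or 64. */
static uint64_t rotate_right_bits(uint64_t v, unsigned count, unsigned width)
{
    uint64_t const mask = (width == 64) ? ~(uint64_t)0
                                        : (((uint64_t)1 << width) - 1);
    count &= width - 1;              /* width is a power of two */
    v &= mask;
    if (count == 0) return v;        /* avoid shifting by the full width */
    return ((v >> count) | (v << (width - count))) & mask;
}

/* Hypothetical scalar reference: bit i of the unrotated mask is set when
 * tagRow[i] == tag. The mask is then rotated right by `head`.
 * Assumption: one mask bit per entry. */
static uint64_t match_mask_reference(const uint8_t* tagRow, uint8_t tag,
                                     unsigned head, unsigned rowEntries)
{
    uint64_t matches = 0;
    unsigned i;
    assert(rowEntries == 16 || rowEntries == 32 || rowEntries == 64);
    for (i = 0; i < rowEntries; i++) {
        if (tagRow[i] == tag) matches |= (uint64_t)1 << i;
    }
    return rotate_right_bits(matches, head, rowEntries);
}
```

A vectorized version performs the byte comparisons across the row in parallel instead of one entry per loop iteration. A scalar reference like this one can be used to cross-check such a version.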
Performance
Microbenchmark Results
A microbenchmark isolating the ZSTD_row_getMatchMask function shows a speedup of roughly 5.9x to 18x over the SWAR fallback, increasing with row size.
| rowEntries | Speedup |
|---|---|
| 16 bytes | 5.87x |
| 32 bytes | 9.63x |
| 64 bytes | 17.98x |
Fullbench
The overall impact on fullbench is modest: the new implementation shows a small but consistent improvement and, most importantly, no performance regression.
Validation
- All quick checks passed (`make check`).
- All long-running tests passed (`make test`).
- Static analysis reports no new issues (`make staticAnalyze`).