Goparaju, S; Tamo, I; Calderbank, R, *An Improved Sub-Packetization Bound for Minimum Storage Regenerating
Codes*
(May, 2013) [1305.3498v1] .
**Abstract:**

*Distributed storage systems employ codes to provide resilience to failure of
multiple storage disks. Specifically, an $(n, k)$ MDS code stores $k$ symbols
in $n$ disks such that the overall system is tolerant to a failure of up to
$n-k$ disks. However, access to at least $k$ disks is still required to repair
a single erasure. To reduce repair bandwidth, array codes are used where the
stored symbols or packets are vectors of length $\ell$. MDS array codes have
the potential to repair a single erasure using a fraction $1/(n-k)$ of data
stored in the remaining disks. We introduce new methods of analysis which
capitalize on the translation of the storage system problem into a geometric
problem on a set of operators and subspaces. In particular, we ask the
following question: for a given $(n, k)$, what is the minimum vector-length or
sub-packetization factor $\ell$ required to achieve this optimal fraction? For
\emph{exact recovery} of systematic disks in an MDS code of low redundancy,
i.e. $k/n > 1/2$, the best known explicit codes \cite{WTB12} have a
sub-packetization factor $\ell$ which is exponential in $k$. It has been
conjectured \cite{TWB12} that for a fixed number of parity nodes, it is in fact
necessary for $\ell$ to be exponential in $k$. In this paper, we provide a new
log-squared converse bound on $k$ for a given $\ell$, and prove that $k \le
2\log_2\ell\left(\log_{\delta}\ell+1\right)$, for an arbitrary number of parity
nodes $r = n-k$, where $\delta = r/(r-1)$.*