Achieving various forms of fault tolerance in large-scale distributed systems--a prime example being the Internet--has become an important problem. Using a relaxed definition of faults, which includes temporary congestion of servers and other transient adverse conditions, we seek to design distributed control algorithms that are resilient when subjected to certain patterns of worst-case faults called k-sparse errors. Sparsity has a recursive definition which says, roughly, that bad events are well separated in space-time, and that the space-time loci where this condition is violated can be covered by small balls which are in turn well separated in space-time at the next scale, and
so on. Of intense theoretical interest is the question of whether simple distributed algorithms can be designed that remain fault-tolerant in the presence of k-sparse errors.
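The recursive sparsity condition can be sketched in code. The following is a minimal illustration, not the paper's exact definition: the greedy ball cover, the separation parameter `sep`, and the per-level scale factor `scale` are all hypothetical choices made for concreteness.

```python
import math

def _cover_with_balls(points, radius):
    """Greedy cover: each point joins the first existing ball center within
    `radius`, else it opens a new ball (an illustrative stand-in for the
    'small balls' in the recursive definition)."""
    centers = []
    for p in points:
        if not any(math.dist(p, c) <= radius for c in centers):
            centers.append(p)
    return centers

def is_k_sparse(points, k, sep, scale=4.0):
    """Hypothetical check of the recursive condition: bad events (points in
    space-time) are k-sparse if they are pairwise separated by `sep`, or if
    the centers of a ball cover are (k-1)-sparse at the next, coarser scale."""
    if all(math.dist(p, q) >= sep
           for i, p in enumerate(points) for q in points[i + 1:]):
        return True
    if k == 0:
        return False
    centers = _cover_with_balls(points, sep)
    return is_k_sparse(centers, k - 1, sep * scale)
```

Under this toy reading, two tight clusters that are far apart fail the base separation test but become sparse after one level of covering, whereas two nearby clusters violate separation at both scales and are rejected.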