We present a general approach for parallelizing interpolation with radial basis functions (RBFs) on distributed-memory systems, optionally using shared-memory hardware as an accelerator for the local subtasks. Computing an interpolant in general requires solving a global dense linear system, and iterative methods need appropriate preconditioning to achieve reasonable iteration counts. For the shared-memory part we use a dedicated Krylov subspace method, the FGP algorithm. For the distributed task we start from a simple block-Jacobi iteration with each block solved in parallel; adding a coarse-level representation yields a two-level block-Jacobi iteration with much lower iteration counts and wider applicability.
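To make the central difficulty concrete, the following is a minimal sketch (not the paper's parallel solver) of scalar RBF interpolation in NumPy: the interpolation matrix couples every centre with every other centre, so it is dense, and a direct solve costs O(n^3). The Gaussian kernel, the shape parameter `eps`, and the sample data are illustrative assumptions.

```python
import numpy as np

def gaussian_rbf(r, eps):
    # Illustrative kernel choice; the paper's method is kernel-independent.
    return np.exp(-(eps * r) ** 2)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 20))   # scattered interpolation centres
f = np.sin(2 * np.pi * x)                # data values at the centres

# Dense interpolation matrix A[i, j] = phi(|x_i - x_j|): every centre
# interacts with every other centre, hence a global dense system.
A = gaussian_rbf(np.abs(x[:, None] - x[None, :]), eps=5.0)
c = np.linalg.solve(A, f)                # direct dense solve, O(n^3)

# The interpolant s(x) = sum_j c_j * phi(|x - x_j|) reproduces the data.
s = A @ c
assert np.allclose(s, f)
```

For large n this direct solve is infeasible, which is why preconditioned Krylov iterations (and, in the distributed setting, block-Jacobi with a coarse level) are used instead.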