Runs the five kernels (Naive, Coalesced, 2×2 Coarsened, Tiled, Best) at matrix sizes 1024, 2048, and 4096, and prints a table of the GFLOPS each kernel achieves at each size.
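For readers unfamiliar with how such a table is usually derived, here is a minimal sketch of the arithmetic. It assumes square N×N matrices and the common convention of counting 2·N³ floating-point operations per multiplication; neither is stated above. The names `gflops` and `format_table` are hypothetical helpers, not part of the benchmark, and the 1.0 s timings are placeholders, not measurements.

```python
# Kernel names and sizes as listed in the description above.
KERNELS = ["Naive", "Coalesced", "2×2 Coarsened", "Tiled", "Best"]
SIZES = [1024, 2048, 4096]


def gflops(n, seconds):
    """GFLOPS for one n x n matmul, assuming 2*n^3 floating-point operations."""
    return 2.0 * n ** 3 / (seconds * 1e9)


def format_table(timings):
    """Build a text table from timings[kernel][size] = elapsed seconds."""
    header = f"{'Kernel':<16}" + "".join(f"{n:>10}" for n in SIZES)
    rows = [header]
    for name in KERNELS:
        cells = "".join(f"{gflops(n, timings[name][n]):>10.1f}" for n in SIZES)
        rows.append(f"{name:<16}{cells}")
    return "\n".join(rows)


if __name__ == "__main__":
    # Placeholder timings (1.0 s everywhere), NOT real results.
    placeholder = {k: {n: 1.0 for n in SIZES} for k in KERNELS}
    print(format_table(placeholder))
```

In a real run, the elapsed time per kernel would come from the benchmark's own timing (for example CUDA events), typically averaged over several launches after a warm-up.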