An old serial program for lattice field theory simulations, originally written in fixed-form Fortran over decades of research work, needed to be optimised for modern architectures. In this talk I will describe the challenges we faced, the decisions we made, and the steps we took to improve both its performance and its scientific output (a multi-level optimisation problem). I will also discuss the tools we used and our experience with them (Intel VTune, ITAC, the Scalasca suite and the BSC Performance Tools), and report our sometimes surprising findings.