Identifying and estimating heterogeneous treatment effects in biomedical clinical trials is challenging because trials are typically planned to assess the treatment effect in the overall trial population. Nevertheless, understanding how the treatment effect may vary across subgroups is of major importance for drug development. In this work, we review some existing simulation work and perform a simulation study to evaluate recent methods for identifying and estimating heterogeneous treatment effects, using various metrics and scenarios relevant for drug development. Our focus is not only on a general comparison of the methods, but also on how well they perform in simulation scenarios that reflect real clinical trials. We provide the R package benchtm, which can be used to simulate synthetic biomarker distributions based on real clinical trial data and to create interpretable scenarios for benchmarking methods for the identification and estimation of treatment effect heterogeneity.