Because MFP uses backward elimination (BE) for variable selection, this website concentrates on that procedure. However, the problems discussed in the context of BE are not specific to it; most apply to all variable selection procedures. See Chapter 2 of our book (Selection of Variables) for a detailed discussion of issues related to this topic. Here we assume that all continuous variables can be modelled with linear functions.
The following summary is taken from our book (p. 23):
"Summary

Because subject-matter knowledge in observational studies is usually limited, data-driven model selection has an important role.

Multivariable modelling has different possible goals; the main distinction is between predictive and explanatory models. The former aims for a good predictor, whereas the latter aims to identify important variables.

Despite claimed theoretical advantages, the full model is not a practical proposition in most studies.

Interpretability and practical usefulness are essential attributes of explanatory models. Simple models are more likely to have such properties.

Several model selection algorithms have been suggested. The most used are stepwise methods, the best of which is BE. Methods based on information criteria (AIC and the Bayesian information criterion (BIC)) are the main competitors.

With stepwise methods, the key tuning parameter is the nominal P-value for selecting or eliminating a variable. Larger P-values produce larger models.

Replication instability of selected models should be assessed by bootstrap.

Parameter estimates of selected models, irrespective of the strategy used, are prone to different types of bias. With small samples the bias can be large.

Techniques combining selection of variables with shrinkage of their parameter estimates reduce bias, but their properties require further exploration."
Reproduced from R&S (2008) with permission from John Wiley & Sons Ltd. 
This summary was written in 2007, but we were aware of many of these aspects in the mid-1990s, when we discussed the development of a pragmatic multivariable procedure for simultaneous selection of variables and of functional forms for continuous variables. We concluded that backward elimination (allowing re-inclusion of variables eliminated in an earlier step) is the most suitable procedure if an explanatory model is the main aim of an analysis and the sample size is not 'small'.
For further details see our discussion points (Model Building in Small Datasets; Full, Prespecified or Selected Model; Comparison of Selection Procedures; Complexity, Stability and Interpretability; Conclusions and Outlook) in chapter 2. A good explanatory model is also an acceptable model for prediction, but others (including the 'full' prespecified model) may have (slightly) better predictive ability.
There have been several recent developments in the variable selection literature, but we know of no strong argument for replacing backward elimination with another procedure in the MFP algorithm. Please bear in mind that we do not consider 'small' sample sizes, and that we also exclude high-dimensional data, such as gene-expression data, from our considerations.
Relevant issues
1. Aim of the model and model complexity
Many different aims are possible when developing a multivariable model (2.4 Aims of Multivariable Models). The most important distinction seems to be whether "to explain or to predict", the title of a paper by Shmueli (2010). The significance level (to be chosen by the analyst) is the key tuning parameter influencing model complexity. For a discussion of the close relationship between the significance level and the information criteria AIC and BIC, see 2.6 Procedures for Selecting Variables.
For further discussion see:

Sauerbrei (1999): The use of resampling methods to simplify regression models in medical statistics

Sauerbrei et al (2015): On stability issues in deriving multivariable regression models
2. Model complexity, model stability and model uncertainty
Model complexity, model stability and model uncertainty are three different issues in data-dependent model building, but they are closely related. A more complex model (here, a model including more variables) is usually less stable, as it includes several variables with only a 'weak' effect on the outcome. When a specific model is selected, the uncertainty of the selection process is (usually) ignored. To improve models for prediction, the concept of model uncertainty was introduced in the mid-1990s: a predictor and its variance are estimated by averaging predictors from many (unstable) models.
Resampling approaches such as the bootstrap are popular for investigating model stability.
Model selection and the assessment of model uncertainty are usually handled in a Bayesian framework (Bayesian model averaging). Extending our work on model stability, we have suggested using the bootstrap to handle model uncertainty.
For more details see:

Sauerbrei (1999): The use of resampling methods to simplify regression models in medical statistics

Sauerbrei et al (2008): Investigation about a screening step in model selection

Sauerbrei et al (2015): On stability issues in deriving multivariable regression models
We have also conducted some investigations in the context of function stability (Royston & Sauerbrei (2003): Stability of multivariable fractional polynomial models with selection of variables and transformations: a bootstrap investigation).
Similar investigations can also be found for a model derived with splines (Binder & Sauerbrei (2009): Stability analysis of an additive spline model for respiratory health data by using knot removal).
3. Selection and shrinkage
Several methods have been proposed that combine variable selection and shrinkage, aiming to derive a multivariable model and estimate its regression parameters.
We discuss a post-estimation shrinkage approach (2.8 Selection and Shrinkage). It is based on leave-one-out cross-validation (Van Houwelingen & Le Cessie (1990): Predictive value of statistical models) and has been extended to parameterwise shrinkage factors (Sauerbrei (1999): The use of resampling methods to simplify regression models in medical statistics).
For a more detailed investigation, see Van Houwelingen & Sauerbrei (2013): Cross-validation, shrinkage and variable selection in linear regression revisited.
To derive estimates for variables that are either highly correlated (e.g. the two terms of an FP2 function) or substantively related, the approach was extended to allow joint shrinkage factors (Dunkler et al (2016): Global, Parameterwise and Joint Post-Estimation Shrinkage).
A section on post-estimation shrinkage can also be found in the book article Schumacher et al (2012): Prognostic Factor Studies.
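The two shrinkage-factor estimates discussed above can be sketched for a linear model as follows. The global factor is the slope from regressing the outcome on the leave-one-out cross-validated linear predictor (in the spirit of Van Houwelingen & Le Cessie (1990)); parameterwise factors come from a joint regression of the outcome on the leave-one-out partial predictors (in the spirit of Sauerbrei (1999)). This is a simplified numpy illustration under our own assumptions, not the authors' implementation.

```python
# Global and parameterwise post-estimation shrinkage factors via
# leave-one-out cross-validation; simplified linear-regression sketch.
import numpy as np

def shrinkage_factors(X, y):
    """Return (c_global, c_parameterwise) shrinkage-factor estimates."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Xc = np.column_stack([np.ones(n), X])       # design with intercept
    eta = np.empty(n)                           # LOO linear predictor
    partial = np.empty((n, p))                  # LOO partial predictors
    for i in range(n):
        keep = np.arange(n) != i                # leave observation i out
        beta, *_ = np.linalg.lstsq(Xc[keep], y[keep], rcond=None)
        eta[i] = Xc[i] @ beta
        partial[i] = beta[1:] * X[i]            # beta_j * x_ij
    # global factor: slope of y on the cross-validated predictor
    G = np.column_stack([np.ones(n), eta])
    c_global = np.linalg.lstsq(G, y, rcond=None)[0][1]
    # parameterwise factors: joint regression on the partial predictors
    P = np.column_stack([np.ones(n), partial])
    c_par = np.linalg.lstsq(P, y, rcond=None)[0][1:]
    return c_global, c_par
```

Factors well below 1 signal overfitting of the corresponding (partial) predictor; multiplying the estimated coefficients by these factors is the post-estimation shrinkage step.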