Book
Multivariable Model-Building: A Pragmatic Approach to Regression Analysis Based on Fractional Polynomials for Modelling Continuous Variables
Patrick Royston, Willi Sauerbrei
ISBN: 978-0-470-02842-1
322 pages
John Wiley & Sons Ltd, Chichester, England
May 2008
For datasets see the end of the page.
For Stata programs, see the book's original website.
Everything on this page was reproduced with permission from John Wiley & Sons Ltd.
From the preface:
“[…] Our general objective is to provide a readable text giving the rationale of, and practical advice on, a unified approach to multivariable modelling which aims to make such models simpler and more effective. […] No multivariable model-building strategy has rigorous theoretical underpinnings. Even those approaches most used in practice have not had their properties studied adequately by simulation. In particular, handling continuous variables in a multivariable context has largely been ignored. Since there is no consensus among researchers on the ‘best’ strategy, a pragmatic approach is required. Our book reflects our views derived from wide experience. The text assumes a basic understanding of multiple regression modelling, but it can be read without detailed mathematical knowledge. […] As expressed in a very readable paper by Chatfield (2002), we aim to ‘encourage and guide practitioners, and also to counterbalance a literature that can be overly concerned with theoretical matters far removed from the day-to-day concerns of many working statisticians’. […]”
Table of Contents
| 1 | Introduction | 1 | ||
| 1.1 | Real-Life Problems as Motivation for Model Building | 1 | ||
| 1.1.1 | Many Candidate Models | 1 | ||
| 1.1.2 | Functional Form for Continuous Predictors | 2 | ||
| 1.1.3 | Example 1: Continuous Response | 2 | ||
| 1.1.4 | Example 2: Multivariable Model for Survival Data | 5 | ||
| 1.2 | Issues in Modelling Continuous Predictors | 8 | ||
| 1.2.1 | Effects of Assumptions | 8 | ||
| 1.2.2 | Global versus Local Influence Models | 9 | ||
| 1.2.3 | Disadvantages of Fractional Polynomial Modelling | 9 | ||
| 1.2.4 | Controlling Model Complexity | 10 | ||
| 1.3 | Types of Regression Model Considered | 10 | ||
| 1.3.1 | Normal-Errors Regression | 10 | ||
| 1.3.2 | Logistic Regression | 12 | ||
| 1.3.3 | Cox Regression | 12 | ||
| 1.3.4 | Generalized Linear Models | 14 | ||
| 1.3.5 | Linear and Additive Predictors | 14 | ||
| 1.4 | Role of Residuals | 15 | ||
| 1.4.1 | Uses of Residuals | 15 | ||
| 1.4.2 | Graphical Analysis of Residuals | 15 | ||
| 1.5 | Role of Subject-Matter Knowledge in Model Development | 16 | ||
| 1.6 | Scope of Model Building in our Book | 17 | ||
| 1.7 | Modelling Preferences | 18 | ||
| 1.7.1 | General Issues | 18 | ||
| 1.7.2 | Criteria for a Good Model | 18 | ||
| 1.7.3 | Personal Preferences | 19 | ||
| 1.8 | General Notation | 20 | ||
| 2 | Variable Selection | 23 | ||
| 2.1 | Introduction | 23 | ||
| 2.2 | Background | 24 | ||
| 2.3 | Preliminaries for a Multivariable Analysis | 25 | ||
| 2.4 | Aims of Multivariable Models | 26 | ||
| 2.5 | Prediction: Summary Statistics and Comparisons | 29 | ||
| 2.6 | Procedures for Selecting Variables | 29 | ||
| 2.6.1 | Strength of Predictors | 30 | ||
| 2.6.2 | Stepwise Procedures | 31 | ||
| 2.6.3 | All-Subsets Model Selection Using Information Criteria | 32 | ||
| 2.6.4 | Further Considerations | 33 | ||
| 2.7 | Comparison of Selection Strategies in Examples | 35 | ||
| 2.7.1 | Myeloma Study | 35 | ||
| 2.7.2 | Educational Body-Fat Data | 36 | ||
| 2.7.3 | Glioma Study | 38 | ||
| 2.8 | Selection and Shrinkage | 40 | ||
| 2.8.1 | Selection Bias | 40 | ||
| 2.8.2 | Simulation Study | 40 | ||
| 2.8.3 | Shrinkage to Correct for Selection Bias | 42 | ||
| 2.8.4 | Post-estimation Shrinkage | 44 | ||
| 2.8.5 | Reducing Selection Bias | 45 | ||
| 2.8.6 | Example | 46 | ||
| 2.9 | Discussion | 47 | ||
| 2.9.1 | Model Building in Small Datasets | 47 | ||
| 2.9.2 | Full, Pre-specified or Selected Model? | 47 | ||
| 2.9.3 | Comparison of Selection Procedures | 49 | ||
| 2.9.4 | Complexity, Stability and Interpretability | 49 | ||
| 2.9.5 | Conclusions and Outlook | 50 | ||
| 3 | Handling Categorical and Continuous Predictors | 53 | ||
| 3.1 | Introduction | 53 | ||
| 3.2 | Types of Predictor | 54 | ||
| 3.2.1 | Binary | 54 | ||
| 3.2.2 | Nominal | 54 | ||
| 3.2.3 | Ordinal, Counting, Continuous | 55 | ||
| 3.2.4 | Derived | 55 | ||
| 3.3 | Handling Ordinal Predictors | 55 | ||
| 3.3.1 | Coding Schemes | 55 | ||
| 3.3.2 | Effect of Coding Schemes on Variable Selection | 56 | ||
| 3.4 | Handling Counting and Continuous Predictors: Categorization | 58 | ||
| 3.4.1 | ‘Optimal’ Cutpoints: A Dangerous Analysis | 58 | ||
| 3.4.2 | Other Ways of Choosing a Cutpoint | 59 | ||
| 3.5 | Example: Issues in Model Building with Categorized Variables | 60 | ||
| 3.5.1 | One Ordinal Variable | 61 | ||
| 3.5.2 | Several Ordinal Variables | 62 | ||
| 3.6 | Handling Counting and Continuous Predictors: Functional Form | 64 | ||
| 3.6.1 | Beyond Linearity | 64 | ||
| 3.6.2 | Does Nonlinearity Matter? | 65 | ||
| 3.6.3 | Simple versus Complex Functions | 66 | ||
| 3.6.4 | Interpretability and Transportability | 66 | ||
| 3.7 | Empirical Curve Fitting | 67 | ||
| 3.7.1 | General Approaches to Smoothing | 68 | ||
| 3.7.2 | Critique of Local and Global Influence Models | 68 | ||
| 3.8 | Discussion | 69 | ||
| 3.8.1 | Sparse Categories | 69 | ||
| 3.8.2 | Choice of Coding Scheme | 69 | ||
| 3.8.3 | Categorizing Continuous Variables | 70 | ||
| 3.8.4 | Handling Continuous Variables | 70 | ||
| 4 | Fractional Polynomials for One Variable | 71 | ||
| 4.1 | Introduction | 72 | ||
| 4.2 | Background | 72 | ||
| 4.2.1 | Genesis | 72 | ||
| 4.2.2 | Types of Model | 73 | ||
| 4.2.3 | Relation to Box–Tidwell and Exponential Functions | 73 | ||
| 4.3 | Definition and Notation | 74 | ||
| 4.3.1 | Fractional Polynomials | 74 | ||
| 4.3.2 | First Derivative | 74 | ||
| 4.4 | Characteristics | 75 | ||
| 4.4.1 | FP1 and FP2 Functions | 75 | ||
| 4.4.2 | Maximum or Minimum of a FP2 Function | 75 | ||
| 4.5 | Examples of Curve Shapes with FP1 and FP2 Functions | 76 | ||
| 4.6 | Choice of Powers | 78 | ||
| 4.7 | Choice of Origin | 79 | ||
| 4.8 | Model Fitting and Estimation | 79 | ||
| 4.9 | Inference | 79 | ||
| 4.9.1 | Hypothesis Testing | 79 | ||
| 4.9.2 | Interval Estimation | 80 | ||
| 4.10 | Function Selection Procedure | 82 | ||
| 4.10.1 | Choice of Default Function | 82 | ||
| 4.10.2 | Closed Test Procedure for Function Selection | 82 | ||
| 4.10.3 | Example | 83 | ||
| 4.10.4 | Sequential Procedure | 83 | ||
| 4.10.5 | Type I Error and Power of the Function Selection Procedure | 84 | ||
| 4.11 | Scaling and Centering | 84 | ||
| 4.11.1 | Computational Aspects | 84 | ||
| 4.11.2 | Examples | 85 | ||
| 4.12 | FP Powers as Approximations to Continuous Powers | 85 | ||
| 4.12.1 | Box–Tidwell and Fractional Polynomial Models | 85 | ||
| 4.12.2 | Example | 85 | ||
| 4.13 | Presentation of Fractional Polynomial Functions | 86 | ||
| 4.13.1 | Graphical | 86 | ||
| 4.13.2 | Tabular | 87 | ||
| 4.14 | Worked Example | 89 | ||
| 4.14.1 | Details of all Fractional Polynomial Models | 89 | ||
| 4.14.2 | Function Selection | 90 | ||
| 4.14.3 | Details of the Fitted Model | 90 | ||
| 4.14.4 | Standard Error of a Fitted Value | 91 | ||
| 4.14.5 | Fitted Odds Ratio and its Confidence Interval | 91 | ||
| 4.15 | Modelling Covariates with a Spike at Zero | 92 | ||
| 4.16 | Power of Fractional Polynomial Analysis | 94 | ||
| 4.16.1 | Underlying Function Linear | 95 | ||
| 4.16.2 | Underlying Function FP1 or FP2 | 95 | ||
| 4.16.3 | Comment | 96 | ||
| 4.17 | Discussion | 97 | ||
| 5 | Some Issues with Univariate Fractional Polynomial Models | 99 | ||
| 5.1 | Introduction | 99 | ||
| 5.2 | Susceptibility to Influential Covariate Observations | 100 | ||
| 5.3 | A Diagnostic Plot for Influential Points in FP Models | 100 | ||
| 5.3.1 | Example 1: Educational Body-Fat Data | 101 | ||
| 5.3.2 | Example 2: Primary Biliary Cirrhosis Data | 101 | ||
| 5.4 | Dependence on Choice of Origin | 103 | ||
| 5.5 | Improving Robustness by Preliminary Transformation | 105 | ||
| 5.5.1 | Example 1: Educational Body-Fat Data | 106 | ||
| 5.5.2 | Example 2: PBC Data | 107 | ||
| 5.5.3 | Practical Use of the Pre-transformation gδ(x) | 107 | ||
| 5.6 | Improving Fit by Preliminary Transformation | 108 | ||
| 5.6.1 | Lack of Fit of Fractional Polynomial Models | 108 | ||
| 5.6.2 | Negative Exponential Pre-transformation | 108 | ||
| 5.7 | Higher Order Fractional Polynomials | 109 | ||
| 5.7.1 | Example 1: Nerve Conduction Data | 109 | ||
| 5.7.2 | Example 2: Triceps Skinfold Thickness | 110 | ||
| 5.8 | When Fractional Polynomial Models are Unsuitable | 111 | ||
| 5.8.1 | Not all Curves are Fractional Polynomials | 111 | ||
| 5.8.2 | Example: Kidney Cancer | 112 | ||
| 5.9 | Discussion | 113 | ||
| 6 | MFP: Multivariable Model-Building with Fractional Polynomials | 115 | ||
| 6.1 | Introduction | 115 | ||
| 6.2 | Motivation | 116 | ||
| 6.3 | The MFP Algorithm | 117 | ||
| 6.3.1 | Remarks | 118 | ||
| 6.3.2 | Example | 118 | ||
| 6.4 | Presenting the Model | 120 | ||
| 6.4.1 | Parameter Estimates | 120 | ||
| 6.4.2 | Function Plots | 121 | ||
| 6.4.3 | Effect Estimates | 121 | ||
| 6.5 | Model Criticism | 123 | ||
| 6.5.1 | Function Plots | 123 | ||
| 6.5.2 | Graphical Analysis of Residuals | 124 | ||
| 6.5.3 | Assessing Fit by Adding More Complex Functions | 125 | ||
| 6.5.4 | Consistency with Subject-Matter Knowledge | 129 | ||
| 6.6 | Further Topics | 129 | ||
| 6.6.1 | Interval Estimation | 129 | ||
| 6.6.2 | Importance of the Nominal Significance Level | 130 | ||
| 6.6.3 | The Full MFP Model | 131 | ||
| 6.6.4 | A Single Predictor of Interest | 132 | ||
| 6.6.5 | Contribution of Individual Variables to the Model Fit | 134 | ||
| 6.6.6 | Predictive Value of Additional Variables | 136 | ||
| 6.7 | Further Examples | 138 | ||
| 6.7.1 | Example 1: Oral Cancer | 138 | ||
| 6.7.2 | Example 2: Diabetes | 139 | ||
| 6.7.3 | Example 3: Whitehall I | 140 | ||
| 6.8 | Simple Versus Complex Fractional Polynomial Models | 144 | ||
| 6.8.1 | Complexity and Modelling Aims | 144 | ||
| 6.8.2 | Example: GBSG Breast Cancer Data | 144 | ||
| 6.9 | Discussion | 146 | ||
| 6.9.1 | Philosophy of MFP | 147 | ||
| 6.9.2 | Function Complexity, Sample Size and Subject-Matter Knowledge | 148 | ||
| 6.9.3 | Improving Robustness by Preliminary Covariate Transformation | 148 | ||
| 6.9.4 | Conclusion and Future | 149 | ||
| 7 | Interactions | 151 | ||
| 7.1 | Introduction | 151 | ||
| 7.2 | Background | 152 | ||
| 7.3 | General Considerations | 152 | ||
| 7.3.1 | Effect of Type of Predictor | 152 | ||
| 7.3.2 | Power | 153 | ||
| 7.3.3 | Randomized Trials and Observational Studies | 153 | ||
| 7.3.4 | Predefined Hypothesis or Hypothesis Generation | 153 | ||
| 7.3.5 | Interactions Caused by Mismodelling Main Effects | 154 | ||
| 7.3.6 | The ‘Treatment–Effect’ Plot | 154 | ||
| 7.3.7 | Graphical Checks, Sensitivity and Stability Analyses | 154 | ||
| 7.3.8 | Cautious Interpretation is Essential | 155 | ||
| 7.4 | The MFPI Procedure | 155 | ||
| 7.4.1 | Model Simplification | 156 | ||
| 7.4.2 | Check of the Results and Sensitivity Analysis | 156 | ||
| 7.5 | Example 1: Advanced Prostate Cancer | 157 | ||
| 7.5.1 | The Fitted Model | 158 | ||
| 7.5.2 | Check of the Interactions | 160 | ||
| 7.5.3 | Final Model | 161 | ||
| 7.5.4 | Further Comments and Interpretation | 162 | ||
| 7.5.5 | FP Model Simplification | 163 | ||
| 7.6 | Example 2: GBSG Breast Cancer Study | 163 | ||
| 7.6.1 | Oestrogen Receptor Positivity as a Predictive Factor | 163 | ||
| 7.6.2 | A Predefined Hypothesis: Tamoxifen–Oestrogen Receptor Interaction | 163 | ||
| 7.7 | Categorization | 165 | ||
| 7.7.1 | Interaction with Categorized Variables | 165 | ||
| 7.7.2 | Example: GBSG Study | 166 | ||
| 7.8 | STEPP | 167 | ||
| 7.9 | Example 3: Comparison of STEPP with MFPI | 168 | ||
| 7.9.1 | Interaction in the Kidney Cancer Data | 168 | ||
| 7.9.2 | Stability Investigation | 168 | ||
| 7.10 | Comment on Type I Error of MFPI | 171 | ||
| 7.11 | Continuous-by-Continuous Interactions | 172 | ||
| 7.11.1 | Mismodelling May Induce Interaction | 173 | ||
| 7.11.2 | MFPIgen: An FP Procedure to Investigate Interactions | 174 | ||
| 7.11.3 | Examples of MFPIgen | 175 | ||
| 7.11.4 | Graphical Presentation of Continuous-by-Continuous Interactions | 179 | ||
| 7.11.5 | Summary | 180 | ||
| 7.12 | Multi-Category Variables | 181 | ||
| 7.13 | Discussion | 181 | ||
| 8 | Model Stability | 183 | ||
| 8.1 | Introduction | 183 | ||
| 8.2 | Background | 184 | ||
| 8.3 | Using the Bootstrap to Explore Model Stability | 185 | ||
| 8.3.1 | Selection of Variables within a Bootstrap Sample | 185 | ||
| 8.3.2 | The Bootstrap Inclusion Frequency and the Importance of a Variable | 186 | ||
| 8.4 | Example 1: Glioma Data | 186 | ||
| 8.5 | Example 2: Educational Body-Fat Data | 188 | ||
| 8.5.1 | Effect of Influential Observations on Model Selection | 189 | ||
| 8.6 | Example 3: Breast Cancer Diagnosis | 190 | ||
| 8.7 | Model Stability for Functions | 191 | ||
| 8.7.1 | Summarizing Variation between Curves | 191 | ||
| 8.7.2 | Measures of Curve Instability | 192 | ||
| 8.8 | Example 4: GBSG Breast Cancer Data | 193 | ||
| 8.8.1 | Interdependencies among Selected Variables and Functions in Subsets | 193 | ||
| 8.8.2 | Plots of Functions | 193 | ||
| 8.8.3 | Instability Measures | 195 | ||
| 8.8.4 | Stability of Functions Depending on Other Variables Included | 196 | ||
| 8.9 | Discussion | 197 | ||
| 8.9.1 | Relationship between Inclusion Fractions | 198 | ||
| 8.9.2 | Stability of Functions | 198 | ||
| 9 | Some Comparisons of MFP with Splines | 201 | ||
| 9.1 | Introduction | 201 | ||
| 9.2 | Background | 202 | ||
| 9.3 | MVRS: A Procedure for Model Building with Regression Splines | 203 | ||
| 9.3.1 | Restricted Cubic Spline Functions | 203 | ||
| 9.3.2 | Function Selection Procedure for Restricted Cubic Splines | 205 | ||
| 9.3.3 | The MVRS Algorithm | 205 | ||
| 9.4 | MVSS: A Procedure for Model Building with Cubic Smoothing Splines | 205 | ||
| 9.4.1 | Cubic Smoothing Splines | 205 | ||
| 9.4.2 | Function Selection Procedure for Cubic Smoothing Splines | 206 | ||
| 9.4.3 | The MVSS Algorithm | 206 | ||
| 9.5 | Example 1: Boston Housing Data | 207 | ||
| 9.5.1 | Effect of Reducing the Sample Size | 208 | ||
| 9.5.2 | Comparing Predictors | 212 | ||
| 9.6 | Example 2: GBSG Breast Cancer Study | 214 | ||
| 9.7 | Example 3: Pima Indians | 215 | ||
| 9.8 | Example 4: PBC | 217 | ||
| 9.9 | Discussion | 219 | ||
| 9.9.1 | Splines in General | 220 | ||
| 9.9.2 | Complexity of Functions | 221 | ||
| 9.9.3 | Optimal Fit or Transferability? | 221 | ||
| 9.9.4 | Reporting of Selected Models | 221 | ||
| 9.9.5 | Conclusion | 222 | ||
| 10 | How To Work with MFP | 223 | ||
| 10.1 | Introduction | 223 | ||
| 10.2 | The Dataset | 223 | ||
| 10.3 | Univariate Analyses | 226 | ||
| 10.4 | MFP Analysis | 227 | ||
| 10.5 | Model Criticism | 228 | ||
| 10.5.1 | Function Plots | 228 | ||
| 10.5.2 | Residuals and Lack of Fit | 228 | ||
| 10.5.3 | Robustness Transformation and Subject-Matter Knowledge | 229 | ||
| 10.5.4 | Diagnostic Plot for Influential Observations | 230 | ||
| 10.5.5 | Refined Model | 231 | ||
| 10.5.6 | Interactions | 231 | ||
| 10.6 | Stability Analysis | 232 | ||
| 10.7 | Final Model | 235 | ||
| 10.8 | Issues to be Aware of | 235 | ||
| 10.8.1 | Selecting the Main-Effects Model | 235 | ||
| 10.8.2 | Further Comments on Stability | 236 | ||
| 10.8.3 | Searching for Interactions | 238 | ||
| 10.9 | Discussion | 238 | ||
| 11 | Special Topics Involving Fractional Polynomials | 241 | ||
| 11.1 | Time-Varying Hazard Ratios in the Cox Model | 241 | ||
| 11.1.1 | The Fractional Polynomial Time Procedure | 242 | ||
| 11.1.2 | The MFP Time Procedure | 243 | ||
| 11.1.3 | Prognostic Model with Time-Varying Effects for Patients with Breast Cancer | 243 | ||
| 11.1.4 | Categorization of Survival Time | 245 | ||
| 11.1.5 | Discussion | 246 | ||
| 11.2 | Age-specific Reference Intervals | 247 | ||
| 11.2.1 | Example: Fetal growth | 247 | ||
| 11.2.2 | Using FP Functions as Smoothers | 248 | ||
| 11.2.3 | More Sophisticated Distributional Assumptions | 249 | ||
| 11.2.4 | Discussion | 249 | ||
| 11.3 | Other Topics | 250 | ||
| 11.3.1 | Quantitative Risk Assessment in Developmental Toxicity Studies | 250 | ||
| 11.3.2 | Model Uncertainty for Functions | 251 | ||
| 11.3.3 | Relative Survival | 252 | ||
| 11.3.4 | Approximating Smooth Functions | 253 | ||
| 11.3.5 | Miscellaneous Applications | 254 | ||
| 12 | Epilogue | 255 | ||
| 12.1 | Introduction | 255 | ||
| 12.2 | Towards Recommendations for Practice | 255 | ||
| 12.2.1 | Variable Selection Procedure | 255 | ||
| 12.2.2 | Functional Form for Continuous Covariates | 257 | ||
| 12.2.3 | Extreme Values or Influential Points | 257 | ||
| 12.2.4 | Sensitivity Analysis | 257 | ||
| 12.2.5 | Check for Model Stability | 258 | ||
| 12.2.6 | Complexity of a Predictor | 258 | ||
| 12.2.7 | Check for Interactions | 258 | ||
| 12.3 | Omitted Topics and Future Directions | 258 | ||
| 12.3.1 | Measurement Error in Covariates | 258 | ||
| 12.3.2 | Meta-analysis | 258 | ||
| 12.3.3 | Multi-level (Hierarchical) Models | 259 | ||
| 12.3.4 | Missing Covariate Data | 259 | ||
| 12.3.5 | Other Types of Model | 259 | ||
| 12.4 | Conclusion | 259 | ||
| Appendix A: Data and Software Resources | 261 | |||
| A.1 | Summaries of Datasets | 261 | ||
| A.2 | Datasets used more than once | 262 | ||
| A.3 | Software | 267 | ||
| Appendix B: Glossary of Abbreviations | 269 | |||
| References | 271 | |||
| Index | 285 | |||
Datasets and some information are available for download here
Datasets in available formats – Stata – SAS – Excel – ASCII
For more details about the data, see Appendix A of the book.
| Name (and Link) | Outcome | Obs. | Events | Variables^a | Section reference |
| Myeloma | Survival | 65 | 48 | 16 | 2.7.1 |
| Freiburg DNA breast cancer | Survival | 109 | 56 | 1 | 3.4.1 |
| Cervix cancer | Binary | 899 | 141 | 21 | 3.5 |
| Nerve conduction | Cont. | 406 | N/A | 1 | 5.7.1 |
| Triceps skinfold thickness | Cont. | 892 | N/A | 1 | 5.7.2 |
| Diabetes | Cont. | 42 | N/A | 2 | 6.7.2 |
| Advanced prostate cancer | Survival | 475 | 338 | 13 | 7.5 |
| Quit smoking study | Cont. | 250 | N/A | 3 | 7.11.3 |
| Breast cancer diagnosis | Binary | 458 | 133 | 6 | 8.6 |
| Boston housing | Cont. | 506 | N/A | 13 | 9.5 |
| Pima Indians | Binary | 768 | 268 | 8 | 9.7 |
| Rotterdam breast cancer | Survival | 2982 | 1518 | 11 | 11.1.3 |
| Fetal growth | Cont. | 574 | N/A | 1 | 11.2.1 |
| Cholesterol | Cont. | 553 | N/A | 1 | 11.2.3 |
Table A.1 Datasets used once in our book. N/A = not applicable. Further details accompany the example in the relevant section (page 261).
a Maximum number of predictors used in analyses. Categorical variables count as >1 predictor if modelled using several dummy variables.
| Name | Outcome | Obs. | Events | Variables^a | Section reference |
| Research body fat | Cont. | 326 | N/A | 1 | 1.1.3, 4.2.1, 4.9.1, 4.9.2, 4.10.3, 4.12 |
| GBSG breast cancer | Survival | 686 | 299 | 9 | 1.1.4, 3.6.2, 5.6.2, 6.5.2, 6.5.3, 6.5.4, 6.6.5, 6.6.6, 6.8.2, 7.6, 7.7.2, 8.8, 9.6 |
| Educational body fat | Cont. | 252 | N/A | 13 | 2.7.2, 2.8.6, 5.2, 5.3.1, 5.5.1, 8.5 |
| Glioma | Survival | 411 | 274 | 15 | 2.7.3, 8.4 |
| Prostate cancer | Cont. | 97 | N/A | 7 | 3.6.2, 3.6.3, 4.15, 6.2, 6.3.2, 6.4.2, 6.4.3, 6.5.1, 6.5.3, 6.6.1, 6.6.2, 6.6.3, 6.6.4, 7.11.3 |
| Whitehall I | Survival | 17 260 | 2576 | 10 | 6.7.3 |
| Whitehall I | Binary | 17 260 | 1670 | 10 | 4.13.1, 4.13.2, 4.14, 7.11.1, 7.11.3 |
| PBC | Survival | 418 | 161 | 17 | 5.3.2, 5.4, 5.5.2, 9.8 |
| Oral cancer | Binary | 397 | 194 | 1 | 6.7.1, 9.3.1 |
| Kidney cancer | Survival | 347 | 322 | 10 | 5.8.2, 7.9 |
Table A.2 Datasets used more than once in our book. N/A = not applicable. Further details are given in Appendix A.2 (page 262).
a Maximum number of predictors used in analyses. Categorical variables count as >1 predictor if modelled using several dummy variables.