Bioequivalence calculation rules

Detection of “crossover” versus “parallel” design

In PKanalix, the design is detected from the data set columns. If no OCCASION column is present, the design is detected as “parallel”. Conversely, if one or several columns have been tagged as OCCASION, the design is detected as “crossover” (repeated or non-repeated). The detected design is displayed in the bioequivalence settings for the user's information and cannot be changed. The factors selected by default for the linear model depend on the detected design.
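
The detection rule above can be sketched in a few lines. This is only an illustration: the function name `detect_design` and the dictionary mapping column names to tags are hypothetical and are not part of PKanalix.

```python
def detect_design(column_tags):
    """Return the design implied by the column tags.

    column_tags is a hypothetical mapping {column name: tag}.
    Any column tagged OCCASION means a crossover design.
    """
    if "OCCASION" in column_tags.values():
        return "crossover"
    return "parallel"


# Hypothetical tagging of two data sets.
parallel_tags = {"ID": "ID", "TIME": "TIME", "CONC": "OBSERVATION"}
crossover_tags = {"ID": "ID", "TIME": "TIME", "CONC": "OBSERVATION", "OCC": "OCCASION"}
print(detect_design(parallel_tags), detect_design(crossover_tags))
```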

Linear model

The linear model can only include fixed effects, as recommended by the FDA for parallel and non-repeated crossover designs (see Statistical Approaches to Establishing Bioequivalence, page 10) and by the EMA for parallel, non-repeated crossover and repeated crossover designs (see Guideline On The Investigation Of Bioequivalence, page 15). In addition, “id” is automatically considered as nested in “sequence” and no additional nesting can be defined. Interaction terms and random effects are not supported.

According to the regulatory guidelines, the default models are:

  • parallel design:
    \(\log(\textrm{NCAparam}) \sim \textrm{FORM}\)
    which can also be written as
    \(\log(\textrm{NCAparam})=\textrm{intercept} + \beta_{test} \textrm{[ if FORM=test]} + \varepsilon\)
    in case of two formulations “ref” and “test”, with \(\varepsilon\) a normal random variable (i.e. the residuals).
  • crossover design:
    \(\log(\textrm{NCAparam}) \sim \textrm{SEQ + ID + PERIOD + FORM}\)
    which can also be written as
    \(\log(\textrm{NCAparam})=\textrm{intercept} + \beta_{test} \textrm{[ if FORM=test]} + \beta_{period2} \textrm{[ if PERIOD=2]} + \beta_{seqTR} \textrm{[ if SEQ=TR]} + \beta_{id2} \textrm{[ if ID=id2]} + \beta_{id3} \textrm{[ if ID=id3]} + … + \varepsilon\)
    in case of several individuals (id1, id2, etc.), two sequences (“RT” and “TR”), two periods (“1” and “2”) and two formulations (“ref” and “test”).

The model parameters are calculated using a QR factorization.
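
As an illustration of a least-squares fit through a QR factorization, here is a minimal sketch for the parallel-design model. It uses a modified Gram-Schmidt factorization written in plain Python, which is an assumption made for readability: the source does not say which QR algorithm PKanalix uses. The function name and the data values are hypothetical.

```python
import math


def qr_least_squares(X, y):
    """Solve min ||X b - y|| with a modified Gram-Schmidt QR factorization.

    X is a list of rows (full column rank assumed), y a list of values.
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    Q = []                             # orthonormal columns of Q
    R = [[0.0] * p for _ in range(p)]  # upper-triangular factor
    for j in range(p):
        v = cols[j][:]
        for k in range(j):
            R[k][j] = sum(Q[k][i] * v[i] for i in range(n))
            v = [v[i] - R[k][j] * Q[k][i] for i in range(n)]
        R[j][j] = math.sqrt(sum(vi * vi for vi in v))
        Q.append([vi / R[j][j] for vi in v])
    # Solve R b = Q^T y by back substitution.
    qty = [sum(Q[j][i] * y[i] for i in range(n)) for j in range(p)]
    beta = [0.0] * p
    for j in reversed(range(p)):
        s = sum(R[j][k] * beta[k] for k in range(j + 1, p))
        beta[j] = (qty[j] - s) / R[j][j]
    return beta


# Hypothetical log-transformed NCA parameter values.
log_ref = [4.0, 4.2, 4.4]
log_test = [4.3, 4.5, 4.7]
# Columns: intercept, indicator [FORM = test].
X = [[1.0, 0.0] for _ in log_ref] + [[1.0, 1.0] for _ in log_test]
y = log_ref + log_test
beta = qr_least_squares(X, y)
# beta[0] is the intercept (mean of the ref values) and beta[1] the
# formulation effect, equal here to mean(test) - mean(ref) = 0.3 up to rounding.
print(beta)
```

For a crossover design, the same function would be applied to a design matrix with additional indicator columns for SEQ, ID and PERIOD, after dropping the columns made redundant by the nesting of ID in SEQ.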

The \(\beta\) parameters are called the coefficients. They are saved in the output file estimatedCoefficients_XXX.txt. We are interested in the coefficient representing the formulation effect \(\beta_{test}\), which is also called the point estimate.

Difference and ratio

In the Results > Confidence intervals table, the “difference” corresponds to the point estimate \(\beta_{test}\) (see above). The “ratio” is calculated depending on the log-transformation choice:

  • without log-transformation: \(\textrm{Ratio}=\textrm{LSM}_{test}/\textrm{LSM}_{ref}\times 100\) with \(\textrm{LSM}\) the least squares mean (also called adjusted mean, see below).
  • with log-transformation: \(\textrm{Ratio}=\exp (\beta_{test}) \times 100\)
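
The two ratio formulas above can be written as a short sketch. The function names and the numerical values are hypothetical and only serve the illustration.

```python
import math


def ratio_with_log(beta_test):
    """Ratio in percent when the NCA parameter was log-transformed."""
    return math.exp(beta_test) * 100


def ratio_without_log(lsm_test, lsm_ref):
    """Ratio in percent when no log-transformation was applied."""
    return lsm_test / lsm_ref * 100


# A point estimate of 0 on the log scale means identical formulations.
print(ratio_with_log(0.0))
# Hypothetical least squares means on the original scale.
print(ratio_without_log(55.0, 50.0))
```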

Confidence intervals

The confidence interval is first calculated for the difference (\(CI_{diff}\)) and then for the ratio (\(CI_{ratio}\)).

\(CI_{diff}=\beta_{test} \pm t(1-\alpha,df) \times SE \)

with \(\beta_{test}\) the point estimate (“difference”, see above), \(t(1-\alpha,df)\) the quantile of order \(1-\alpha\) of a Student t-distribution with \(df\) degrees of freedom, and \(SE\) the standard error of the point estimate.
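
A minimal numerical sketch of this formula follows. The point estimate and standard error are hypothetical, and the t quantile is passed in as an argument; the value 1.812 used below is the tabulated quantile of order 0.95 for 10 degrees of freedom.

```python
def ci_difference(beta_test, se, t_quantile):
    """Confidence interval beta_test +/- t_quantile * se."""
    half_width = t_quantile * se
    return beta_test - half_width, beta_test + half_width


# Hypothetical values: point estimate 0.05, standard error 0.04,
# t(0.95, df=10) = 1.812 taken from a standard t table.
lower, upper = ci_difference(0.05, 0.04, 1.812)
print(lower, upper)
```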

In case of a parallel design with the Welch-Satterthwaite correction (see Bioequivalence settings), the formula is different:

\(CI_{diff}=\beta_{test} \pm t(1-\alpha,df) \times \sqrt{\frac{S_R^2}{n_R}+\frac{S_T^2}{n_T}} \)

with \(S_X\) the sample standard deviation and \(n_X\) the number of individuals having received formulation X. A correction is also applied to the degrees of freedom, which are calculated as:

\(df=\frac{\left(\frac{S_T^2}{n_T}+\frac{S_R^2}{n_R}\right)^2}{\frac{(S_T^2/n_T)^2}{n_T - 1}+\frac{(S_R^2/n_R)^2}{n_R - 1}}\)
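
The Welch-Satterthwaite standard error and degrees of freedom can be sketched as follows, using the sample variance from the Python standard library. The function name and the data are hypothetical.

```python
import math
from statistics import variance


def welch_se_and_df(ref, test):
    """Standard error and corrected degrees of freedom for two samples.

    ref and test hold the (possibly log-transformed) NCA parameter values
    of the individuals having received each formulation.
    """
    n_r, n_t = len(ref), len(test)
    v_r = variance(ref) / n_r    # S_R^2 / n_R
    v_t = variance(test) / n_t   # S_T^2 / n_T
    se = math.sqrt(v_r + v_t)
    df = (v_r + v_t) ** 2 / (v_r ** 2 / (n_r - 1) + v_t ** 2 / (n_t - 1))
    return se, df


# With equal group sizes and equal variances, df reduces to n_R + n_T - 2.
se, df = welch_se_and_df([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(se, df)
```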

The confidence interval for the ratio is then calculated using the following formula:

  • without log-transformation: \(CI_{ratio}=(1+CI_{diff}/\textrm{LSM}_{ref})\times100\) with \(\textrm{LSM}_{ref}\) the least square mean for the reference formulation (see below).
  • with log-transformation: \(CI_{ratio}=\exp(CI_{diff})\times 100\)
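
The conversion from the interval on the difference to the interval on the ratio can be sketched as below. The function name and the numerical values are hypothetical; the sketch assumes a positive \(\textrm{LSM}_{ref}\), as expected for NCA parameters.

```python
import math


def ci_ratio(ci_diff, log_transformed=True, lsm_ref=None):
    """Convert the bounds of CI_diff into bounds of CI_ratio, in percent."""
    lower, upper = ci_diff
    if log_transformed:
        return math.exp(lower) * 100, math.exp(upper) * 100
    # Without log-transformation the bounds are scaled by LSM_ref.
    return (1 + lower / lsm_ref) * 100, (1 + upper / lsm_ref) * 100


# Hypothetical intervals on the log scale and on the original scale.
print(ci_ratio((-0.02248, 0.12248)))
print(ci_ratio((-5.0, 10.0), log_transformed=False, lsm_ref=50.0))
```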

Adjusted means (least square means)

The model NCAparam ~ SEQ + ID + PERIOD + FORM makes it possible to calculate the expected value (mathematical expectation) for any value of SEQ, ID, PERIOD and FORM. The least squares mean for the reference represents the expected value for the reference formulation, leaving the value of the other factors undefined. To calculate it, we average the model-predicted values across the levels of ID, PERIOD and SEQ.

The weights assigned to each combination of levels depend on whether the factors are nested or not. A factor A is said to be nested in factor B if each level of A appears in only one level of B. By design, in crossover bioequivalence studies, ID is nested in SEQUENCE (i.e. each individual belongs to only one sequence). Thus, in PKanalix, when ID and SEQUENCE are defined as factors in the linear model, we assume that ID is nested in SEQUENCE when calculating the least squares means. No further nesting is assumed, nor can any be specified. Note that the nesting definition affects only the adjusted means, not the point estimate (ratio) or its confidence interval.
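
To make the averaging concrete, here is a minimal sketch under explicit assumptions: equal weights across periods and across sequences, and equal weights across the individuals within each sequence. The source does not spell out the exact weights used by PKanalix, so this weighting scheme, the function name, the coefficient layout and all numerical values are hypothetical. Coefficients missing from the dictionaries correspond to reference levels and count as zero.

```python
from statistics import mean


def least_squares_mean(form, coef, ids_by_seq, periods):
    """Average the model predictions for one formulation.

    PERIOD is averaged with equal weights; ID is averaged within each
    sequence (ID nested in SEQ), then the sequences are averaged.
    """
    period_part = mean(coef["period"].get(p, 0.0) for p in periods)
    seq_part = mean(
        coef["seq"].get(s, 0.0) + mean(coef["id"].get(i, 0.0) for i in ids)
        for s, ids in ids_by_seq.items()
    )
    return coef["intercept"] + coef["form"].get(form, 0.0) + period_part + seq_part


# Hypothetical estimated coefficients for a 2x2 crossover with four individuals.
coef = {
    "intercept": 4.0,
    "form": {"test": 0.1},
    "period": {"2": 0.2},
    "seq": {"TR": 0.0},
    "id": {"id2": 0.4, "id4": -0.2},
}
ids_by_seq = {"RT": ["id1", "id2"], "TR": ["id3", "id4"]}
periods = ["1", "2"]

lsm_ref = least_squares_mean("ref", coef, ids_by_seq, periods)
lsm_test = least_squares_mean("test", coef, ids_by_seq, periods)
# The difference of the two adjusted means equals the formulation coefficient,
# whatever weights are used for the other factors.
print(lsm_ref, lsm_test, lsm_test - lsm_ref)
```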

Excluded individuals

In case of a parallel design, individuals for which the NCA parameter cannot be calculated are excluded.

In case of a crossover design, incomplete individuals are excluded. Incomplete individuals are those that do not have a value for each period, either because there were no concentration data for this period (period missing in the data set) or because the NCA parameter could not be calculated for this period (e.g. because the data were insufficient to calculate the terminal slope). In case of a non-repeated crossover design, excluding or keeping the individuals with missing data has no impact on the point estimate and its confidence interval. In case of a repeated crossover design (i.e. when the individuals receive the same formulation several times), excluding the individuals for which the NCA parameter is missing for one or more periods does have an impact on the point estimate and its confidence interval. The common practice is to exclude incomplete individuals (see also the EMA Guideline On The Investigation Of Bioequivalence, page 14).

Excluded individuals are determined for each NCA parameter separately, as some individuals may have values for all periods for Cmax but not for AUCINF_obs, for instance.
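
The exclusion rule for crossover designs can be sketched as a complete-case filter applied to one NCA parameter at a time. The data layout, a dictionary keyed by (individual, period) with `None` for values that could not be calculated, is a hypothetical choice for this illustration.

```python
def complete_individuals(values, periods):
    """Return the individuals having a value for every period.

    values maps (individual, period) to the NCA parameter value, with
    None (or a missing key) when the parameter is not available.
    """
    ids = {ind for ind, _ in values}
    return sorted(
        ind for ind in ids
        if all(values.get((ind, p)) is not None for p in periods)
    )


# Hypothetical AUCINF_obs values: id2 has no terminal slope in period 2,
# id3 has no data at all for period 2.
aucinf = {
    ("id1", "1"): 10.0, ("id1", "2"): 12.0,
    ("id2", "1"): 9.0, ("id2", "2"): None,
    ("id3", "1"): 11.0,
}
print(complete_individuals(aucinf, ["1", "2"]))
```

Calling the function once per NCA parameter reproduces the per-parameter behaviour described above: an individual can be kept for Cmax and excluded for AUCINF_obs.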

In the Results > Confidence intervals table, the number of individuals “N” does not count the excluded individuals. Thus, in case of a crossover design, the number of individuals contributing to ref and to test is the same. In the BE plots, only the individuals included in the bioequivalence analysis are shown.


The sums of squares presented in the ANOVA table of the Results tab are type-I (sequential) sums of squares. In case of an unbalanced design (i.e. not the same number of individuals receiving RT versus TR), the type-I sums of squares depend on the order of the included factors. In PKanalix, the enforced order is SEQUENCE + ID + PERIOD + FORMULATION (+ ADDITIONAL), following the SAS sample codes provided by the FDA (see Statistical Approaches to Establishing Bioequivalence, 2001, Appendix E) and EMA (see Questions & Answers: positions on specific questions addressed to the Pharmacokinetics Working Party, 2015, question 8).

Coefficient of variation

In Results > Coefficients of variation, the SD corresponds to the standard deviation of the residuals, i.e. to the standard deviation of the normal random variable \(\varepsilon\) in the linear model \(\log(\textrm{NCAparam})=\textrm{intercept} + \beta_{test} \textrm{[ if FORM=test]} + \varepsilon\) (for the typical model of a parallel design for instance).

The coefficient of variation is then calculated as:

  • without log-transformation: \(CV = SD / LSM_{ref}\) (with \(LSM_{ref}\) the least squares mean for the reference formulation, see above)
  • with log-transformation: \(CV = 100 \times \sqrt{\exp{(SD^2)}-1}\)
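
Both formulas can be checked numerically with the short sketch below. The function name and the input values are hypothetical; the formulas are applied exactly as written above.

```python
import math


def coefficient_of_variation(sd, log_transformed=True, lsm_ref=None):
    """Coefficient of variation from the residual standard deviation."""
    if log_transformed:
        return 100 * math.sqrt(math.exp(sd ** 2) - 1)
    return sd / lsm_ref


# A residual SD of 0.2 on the log scale gives a CV slightly above 20.
print(coefficient_of_variation(0.2))
# Hypothetical residual SD and reference least squares mean on the original scale.
print(coefficient_of_variation(5.0, log_transformed=False, lsm_ref=50.0))
```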