
# Enhanced diagnosis of advanced fibrosis and cirrhosis in individuals with NAFLD using FibroScan-based Agile scores

Open Access. Published: November 11, 2022

## Highlights

• Noninvasive tests’ ability to rule-in advanced fibrosis and cirrhosis is moderate
• Consequently, two new FibroScan-based scores are proposed: Agile 3+ and Agile 4
• They demonstrate fewer indeterminate results and higher positive predictive value
• Clinical performances are globally validated in two large independent cohorts
• Use of these scores in clinical practice could reduce the need for liver biopsy

## Abstract

### Background & Aims

Currently available non-invasive tests, including FIB-4 and liver stiffness measurement (LSM by VCTE), are highly effective in excluding advanced fibrosis (AF) or cirrhosis yet their ability to rule in is moderate. Our objective was to develop and validate two new scores (Agile 4 and Agile 3+), combining LSM with routine clinical parameters to identify cirrhosis or AF, respectively, in those with NAFLD in specialized liver clinics, with optimized positive predictive value (PPV) and reduced number of cases with indeterminate results.

### Methods

This international study included 7 adult cohorts with suspected NAFLD who underwent liver biopsy, LSM and blood sampling during routine clinical practice or screening for trials. The population was randomly divided into training set (TS) and an internal validation set (VS), on which the best-fitting logistic regression model was built and performance and goodness of fit of the model were assessed, respectively. Furthermore, both scores were externally validated on 2 large cohorts. Cut-offs for high sensitivity and specificity were derived in the TS to rule-out and rule-in cirrhosis or AF and then tested in the VS and compared to FIB-4 and LSM.

### Results

Each score combined LSM, AST/ALT ratio, platelets, sex and diabetes status, as well as age for Agile 3+. Calibration plots for Agile 4 and Agile 3+ indicated satisfactory to excellent goodness of fit. Agile 4 and Agile 3+ outperformed FIB-4 and LSM in terms of AUROC, percentage of patients with indeterminate results and PPV to rule-in cirrhosis or AF.

### Conclusions

The two novel non-invasive scores improve identification of cirrhosis or AF among patients with NAFLD seen in liver clinics and reduce the need for liver biopsy in this population.

### Impact and implications

Non-invasive tests currently used to identify patients with advanced fibrosis or cirrhosis, such as FIB-4 and liver stiffness measurement by vibration-controlled transient elastography (LSM by VCTE), have high negative predictive values but also a high rate of false-positive results and often a large number of cases with indeterminate results.
This study provides scores that will help the clinician diagnose advanced fibrosis or cirrhosis.
These new, easy-to-implement scores will help liver specialists (1) better identify patients who need more intensive follow-up, (2) refer patients with NAFLD and either advanced fibrosis or cirrhosis for inclusion in treatment trials, and (3) decide who should be treated with pharmacological agents when effective therapies are approved.

## Abbreviations

AAR: AST/ALT ratio
AF: Advanced fibrosis
AHT: Arterial hypertension
ALB: Albumin
ALT: Alanine aminotransferase
AST: Aspartate aminotransferase
AUROC: Area under the ROC curve
BMI: Body mass index
CI: Confidence interval
FIB-4: Fibrosis-4 index
GGT: γ-glutamyltransferase
HDL: High-density lipoproteins
LB: Liver biopsy
LDL: Low-density lipoproteins
LSM: Liver stiffness measurement
NAFLD: Non-alcoholic fatty liver disease
NASH-CRN: Nonalcoholic Steatohepatitis Clinical Research Network
NPV: Negative predictive value
PPV: Positive predictive value
ROC: Receiver operating characteristic
Se: Sensitivity
Sp: Specificity
TRIG: Triglycerides
TS: Training set
VCTE: Vibration-controlled transient elastography
VS: Validation set

## Financial support statement

The sponsor of the study (Echosens SA, Paris, France) had a role in the study design, data collection, data analysis, data interpretation and writing of the report. The corresponding author and the sponsor had full access to all data in the study and had full responsibility for the decision to submit the publication.

## Data availability statement

The data used for this work are confidential and cannot be made available.

## Authors' contributions

AJS, JB, JF, AL, MD, CF-P, LS and VM were involved in the study concept and design, data analysis and data interpretation. MR was involved in data analyses. JF, CF-P, AJS and JB wrote the manuscript. Data collection was done by AJS, ZMY, SAH, PNN, WKC, YY, VDL, CC, MHZ, VWSW, ME, RSH, RPM, JB and CF-P. All authors reviewed and commented on the manuscript and approved the final version.

## Introduction

Non-alcoholic fatty liver disease (NAFLD) is a leading cause of liver-related mortality and is already the leading etiology of liver disease requiring liver transplantation in women [Younossi Z.M. et al., Nonalcoholic Steatohepatitis Is the Most Rapidly Increasing Indication for Liver Transplantation in the United States]. The burden of end-stage liver disease is expected to increase over the coming decade given the high prevalence of NAFLD [Swain M.G. et al., Burden of nonalcoholic fatty liver disease in Canada, 2019-2030: a modelling study]. In patients with NAFLD, the fibrosis stage is a critical determinant of prognosis and mortality, with a substantial step up in all-cause mortality and liver-related outcomes in those with bridging fibrosis (stage 3 disease) or cirrhosis (stage 4) [Sanyal A.J. et al., Prospective Study of Outcomes in Adults with Nonalcoholic Fatty Liver Disease]. These sub-populations are thus at the highest risk of adverse outcomes, underscoring the need to identify these individuals within the population with NAFLD.
For patients with NAFLD referred to secondary and tertiary advanced liver care clinics for assessment, a key diagnostic objective is to identify those with stage 3 or 4 disease. The current reference standard for this is histological assessment of liver biopsy sections. Liver biopsies are invasive and can occasionally cause severe morbidity and even mortality [Rockey D.C. et al., Liver biopsy]. Biopsy is further limited by sampling variability and by intra- and inter-observer variability in interpretation [Davison B.A. et al., Suboptimal reliability of liver biopsy evaluation has implications for randomized clinical trials]. These limitations have restricted the widespread use of a liver-biopsy based approach in clinical care and served as a rationale to develop non-invasive tools (NITs) for this purpose. While a substantial body of literature on the use of laboratory aids such as the FIB-4 score or vibration-controlled transient elastography (VCTE) has been published, none has met regulatory standards for approval, and there remains a continued need to develop NITs to identify those with NAFLD who have AF (stages 3 or 4) or cirrhosis.
In this study, we developed and validated two scores (Agile 3+ and Agile 4) to diagnose AF or cirrhosis, respectively, in populations being evaluated for NAFLD. These scores combine liver stiffness measurements (LSM), as measured by VCTE, with additional laboratory and demographic features. The context of use was as diagnostic tools to enrich the probability of having AF or cirrhosis in those being evaluated for NAFLD in secondary and tertiary care hepatology practices. This is expected to inform and assist clinical decision making with respect to initiation of currently recommended standard of care surveillance for hepatocellular cancer and esophageal varices, referral for treatment trials targeting such individuals, and, eventually, consideration for specific pharmacological treatments when these are established and approved.
The specific goal of this study was to establish the utility of the Agile 3+ and Agile 4 scores for the diagnosis of AF or cirrhosis in those being evaluated for NAFLD in hepatology clinical practices. A secondary goal was to determine whether these scores outperformed commonly used approaches such as FIB-4 and LSM by VCTE for this purpose. These goals were met by studies with the following objectives: (1) to develop and calibrate the Agile 3+ and Agile 4 scores and establish their sensitivity and specificity for diagnosis of AF or cirrhosis, respectively; (2) to optimize cut-offs to maximize the specificity without clinically relevant loss of sensitivity, so as to maximize the positive predictive value (PPV) while reducing the proportion of individuals with indeterminate results; (3) to externally validate these findings in independent populations derived from hepatology clinics, i.e. the intended use setting; and (4) to investigate the impact of BMI, steatosis, diabetes, VCTE probe type and prevalence of the target conditions on the performance of the new scores.

## Material and methods

### Description of data

Data from nine cohorts of adult patients who underwent liver biopsy (LB) for evaluation of NAFLD with concomitant blood work-up for routine biological markers and LSM by VCTE (FibroScan, Echosens, France) were gathered. Data came from North America, Eastern and Western Europe, and Asia. The transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) guidelines [Moons K.G.M. et al., Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): Explanation and elaboration] were followed to report the development and internal and external validation of the prediction model for diagnosis of cirrhosis and AF (Supplementary Table 1).
Seven cohorts came from secondary/tertiary hepatology clinics, one cohort came from the baseline visit (including screen failure patients) of a clinical trial, and one cohort came from the NAFLD Adult Database 2 of the Non-alcoholic Steatohepatitis Clinical Research Network (NASH CRN, NIDDK), which also consists entirely of tertiary care hepatology clinics. All cohort data were collected in the framework of a clinical study for which the local ethical committee granted approval, and may already have been used completely or in part for other publications (Supplementary Tables 2 and 3). Patients gave written informed consent to participate in the studies. Each study was conducted in accordance with the Declaration of Helsinki and in agreement with the International Conference on Harmonization guidelines on Good Clinical Practice. FibroScan operators were masked to patients' clinical and histological data. All LB results were read by expert pathologists blinded to patients' clinical data and FibroScan results.
Among these nine cohorts, seven were pooled together to constitute the internal dataset that was then randomly split into a training set (TS) and an internal validation set (VS) (2:1) by stratifying on cohort and fibrosis stage. The two other datasets (named “NASH CRN” cohort and “French NAFLD” cohort) were used as external VS. For the French NAFLD cohort, statistical analyses were independently conducted by the investigator (JB) and his team in agreement with all concerned parties.
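The 2:1 split stratified on cohort and fibrosis stage can be sketched as follows. This is a minimal illustration and not the authors' code: the record fields `cohort` and `stage`, the function name `stratified_split` and the seed are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(records, key, train_fraction=2 / 3, seed=0):
    """Split records into training and validation lists within each stratum.

    `key` maps a record to its stratum, here (cohort, fibrosis stage).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)

    train, valid = [], []
    for group in strata.values():
        rng.shuffle(group)  # random assignment inside the stratum
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        valid.extend(group[cut:])
    return train, valid


# Hypothetical data: 2 cohorts x 5 fibrosis stages x 30 patients each.
patients = [
    {"id": i, "cohort": c, "stage": s}
    for c in ("A", "B")
    for s in range(5)
    for i in range(30)
]
ts, vs = stratified_split(patients, key=lambda r: (r["cohort"], r["stage"]))
print(len(ts), len(vs))
```

Splitting within each stratum keeps the distribution of cohorts and fibrosis stages nearly identical in both sets, which is why the TS and the internal VS are expected to look alike.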

### Eligibility

Eligible patients were aged 18 years or older and had a LB and a FibroScan examination performed within 6 months. Additionally, a single blood collection with all the required biological parameters was available within 6 months of the LB and 1 month of the FibroScan examination.
Patients who met the following criteria were excluded:
• non-metabolic comorbidities that could have induced liver disease such as viral hepatitis, drug-induced liver injury, excessive alcohol consumption, or HIV;
• fewer than eight valid measurements for LSM by VCTE [Afdhal N.H. et al., Accuracy of fibroscan, compared with histology, in analysis of liver fibrosis in patients with hepatitis B or C: A united states multicenter study];
• missing data for the variables needed in the developed scores and for the fibrosis stage.
Furthermore, for patients assessed with both M and XL probes (8.5% of patients from the TS and 8.1% of patients from the internal VS), the FibroScan examination performed with the XL probe was used only when the patient's BMI was greater than or equal to 35 kg/m². In the French NAFLD cohort, the BMI cut-off was 30 kg/m² [Wong V.W.S. et al., Unified interpretation of liver stiffness measurement by M and XL probes in non-alcoholic fatty liver disease]. Patients measured with both M and XL probes and with a missing BMI value were excluded.

### Variables

The main outcomes were the diagnoses of AF (F≥3) or cirrhosis (F=4) using the NASH CRN scoring system [Kleiner D.E. et al., Design and validation of a histological scoring system for nonalcoholic fatty liver disease]. The models considered 16 predictor variables: LSM by VCTE (kPa), age (years), sex, diabetes status (types 1 and 2 regardless of treatment), hypertension (AHT, regardless of treatment), body mass index (BMI, kg/m²), aspartate aminotransferase (AST, U/L), alanine aminotransferase (ALT, U/L), AST/ALT ratio (AAR), platelets (PLT, G/L), high-density lipoproteins (HDL, mmol/L), low-density lipoproteins (LDL, mmol/L), albumin (ALB, g/L), gamma glutamyl transferase (GGT, U/L), triglycerides (TRIG, mmol/L) and fasting glucose (mmol/L). These 16 predictors were considered a priori to develop the models because they are among the most common and simple routine parameters assessed during the initial evaluation of NAFLD patients. Moreover, because of the collinearity between AST and ALT, we performed separate model developments with AST, ALT or AAR. Of these, the model with AAR gave the best discriminative power and was therefore selected.

### Statistical analysis

#### Sample size

The sample size was determined for the development of a clinical prediction model [Riley R.D. et al., Calculating the sample size required for developing a clinical prediction model]. To develop a new logistic regression model based on up to 16 candidate predictor parameters and an anticipated Cox-Snell R-squared statistic ($R^2_{CS}$) of at least 0.1, and to target an expected shrinkage factor of 0.9, a sample size of at least 1358 patients was needed.
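As an illustration of how such a figure can arise, the sketch below evaluates the shrinkage-based criterion from the sample size framework cited above, n = p / ((S - 1) ln(1 - R²cs / S)), with the three inputs quoted in the text. It is a hedged sketch and not the authors' calculation, which may have involved further criteria; the function name is an assumption.

```python
import math


def min_n_for_shrinkage(n_params, r2_cs, shrinkage):
    """Sample size at which the expected uniform shrinkage factor equals `shrinkage`.

    n_params:  number of candidate predictor parameters (p)
    r2_cs:     anticipated Cox-Snell R-squared of the model
    shrinkage: targeted expected shrinkage factor (S)
    """
    return n_params / ((shrinkage - 1.0) * math.log(1.0 - r2_cs / shrinkage))


n = min_n_for_shrinkage(n_params=16, r2_cs=0.1, shrinkage=0.9)
print(round(n, 1))
```

With 16 parameters, an R-squared of 0.1 and a shrinkage factor of 0.9, this criterion gives a value of about 1358.4, in line with the sample size quoted above.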

#### Scores’ construction

Each of the two scores was developed independently on the TS. The selection of parameters was based on the combination of LSM with clinical parameters and laboratory biomarkers related to liver fibrosis. Each model was developed in three steps:
• i. Parameters were combined into a multivariable logistic regression model with a backward stepwise selection procedure to select the optimal parameters [Steyerberg E.W., Clinical Prediction Models. Cham: Springer International Publishing; 2019. https://doi.org/10.1007/978-3-030-16399-0] (Supplementary Tables 8 and 9).
• ii. As the obtained models included too many parameters to be easily implemented, simplified models were derived by withdrawing one or several variables (all combinations were tested) from the model obtained at step i (full model). The possibility of removing parameters was evaluated using a likelihood ratio test selection procedure on nested models. Simplified models with a smaller number of parameters were selected if they were not significantly different (p≥0.01) from the full model using the likelihood ratio test (with multiple testing correction) [Buse A., The Likelihood Ratio, Wald, and Lagrange Multiplier Tests: An Expository Note] (Supplementary Tables 10 and 11).
• iii. Finally, variable transformations were performed using multivariable fractional polynomials [Sauerbrei W., Royston P., Building Multivariable Prognostic and Diagnostic Models: Transformation of the Predictors by Using Fractional Polynomials] to optimize the models.
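The likelihood ratio comparison of nested models in the second step can be sketched for the simplest case, in which a single parameter is removed (one degree of freedom). The log-likelihood values below are hypothetical, the function name is an assumption, and the multiple testing correction mentioned in the text is not reproduced.

```python
import math


def lr_test_one_df(loglik_full, loglik_reduced):
    """Likelihood ratio test between nested models differing by one parameter.

    Returns the test statistic and its p-value under a chi-square
    distribution with 1 degree of freedom.
    """
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # For 1 df: P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods of a full model and a model with one variable removed.
stat, p = lr_test_one_df(loglik_full=-612.4, loglik_reduced=-614.0)
keep_simplified = p >= 0.01  # retention rule stated in the text
print(round(stat, 2), keep_simplified)
```

Removing several parameters at once would require the chi-square distribution with the matching number of degrees of freedom, which a statistics library provides.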

#### Overall diagnostic performances

The performance of both scores was assessed by goodness of fit, discrimination and decision curves, and compared to LSM alone and FIB-4 used as predictors of the considered target. The goodness of fit (the agreement between observed outcome and prediction) was evaluated using calibration plots [Steyerberg E.W., Clinical Prediction Models. Cham: Springer International Publishing; 2019. https://doi.org/10.1007/978-3-030-16399-0] and discrimination using the area under the receiver operating characteristic curve (AUROC). AUROC comparisons were performed using the DeLong test (at a two-sided 5% significance level) [DeLong E.R., DeLong D.M., Clarke-Pearson D.L., Comparing the Areas under Two or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach], using LB fibrosis stage as the reference. To take into account the impact of false-positive and false-negative rates, decision curve analysis [Vickers A.J., Elkin E.B., Decision curve analysis: a novel method for evaluating prediction models; Majumdar A. et al., Defining the Minimum Acceptable Diagnostic Accuracy of Noninvasive Fibrosis Testing in Cirrhosis: A Decision Analytic Modeling Study; Vickers A.J., Van Calster B., Steyerberg E.W., Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests] was also performed (method detailed in Supplementary Methods).
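The quantity plotted in a decision curve is the net benefit at a given threshold probability. The sketch below uses the standard definition from the decision curve literature cited above; the exact procedure used in the study is described in its Supplementary Methods and is not reproduced here, and the counts in the example are hypothetical.

```python
def net_benefit(true_pos, false_pos, n, threshold):
    """Net benefit of a test at threshold probability `threshold` (0 < threshold < 1)."""
    weight = threshold / (1.0 - threshold)  # harm of a false positive relative to a true positive
    return true_pos / n - (false_pos / n) * weight


def net_benefit_treat_all(prevalence, threshold):
    """Net benefit of the reference strategy that treats every patient as positive."""
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical counts: 200 patients, 50 true positives and 20 false positives.
print(net_benefit(true_pos=50, false_pos=20, n=200, threshold=0.2))
print(net_benefit_treat_all(prevalence=0.3, threshold=0.2))
```

A score is preferable over the range of thresholds where its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero.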

#### Dual cut-off approach

Optimal rule-out (high sensitivity) and rule-in (high specificity) sets of cut-offs were selected to decrease the number of patients with indeterminate results (in between the two cut-off values) compared to LSM and FIB-4 and to increase the PPV in the rule-in zone without substantially degrading sensitivity. To do so, we tested cut-off values with sensitivity (Se) and specificity (Sp) at 85, 90 and 95% and all their combinations, and chose and reported the optimal combinations in the TS. Exactly the same sets of cut-offs were then applied to the VS. Performance when using the usual 90% sensitivity and 90% specificity cut-offs was also reported. Then, for the diagnosis of F4, a rule-in cut-off value with 99% specificity was also derived in the TS for FIB-4, LSM and Agile 4 to obtain a very high PPV. When evaluating performance at a given cut-off, sensitivity, specificity, PPV and negative predictive value (NPV) were computed. Finally, for the diagnosis of AF, previously published cut-off values for FIB-4 and LSM [McPherson S. et al., Age as a Confounding Factor for the Accurate Non-Invasive Diagnosis of Advanced NAFLD Fibrosis; Papatheodoridi M. et al., Refining the Baveno VI elastography criteria for the definition of compensated advanced chronic liver disease] were also used for comparison to Agile 3+.
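The dual cut-off logic can be sketched as a three-way classification. The cut-off values used below are the Agile 4 rule-out (<0.251) and rule-in (≥0.565) values reported in Table 2; the function names and the example scores are assumptions made for illustration.

```python
from collections import Counter

ZONES = ("rule-out", "indeterminate", "rule-in")


def classify(score, rule_out, rule_in):
    """Assign a score to the rule-out, indeterminate or rule-in zone."""
    if score < rule_out:
        return "rule-out"
    if score >= rule_in:
        return "rule-in"
    return "indeterminate"


def zone_shares(scores, rule_out, rule_in):
    """Proportion of patients falling in each zone."""
    counts = Counter(classify(s, rule_out, rule_in) for s in scores)
    return {zone: counts[zone] / len(scores) for zone in ZONES}


# Hypothetical Agile 4 values for four patients.
example_scores = [0.10, 0.30, 0.60, 0.90]
print(zone_shares(example_scores, rule_out=0.251, rule_in=0.565))
```

Shrinking the indeterminate share while keeping sensitivity in the rule-out zone and specificity in the rule-in zone is the trade-off that the cut-off selection targets.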

#### Sensitivity analyses

AUROCs of both scores were compared between patients with BMI≥30 kg/m² and BMI<30 kg/m², with steatosis severity S0/S1 and S≥2, with and without diabetes, and with LSM measured with the M and the XL probe, to evaluate the impact of obesity, steatosis, diabetes and probe type on Agile 4 and Agile 3+.
Since the predictive values depend on the target prevalence, a sensitivity analysis was carried out in order to assess the impact of prevalence on the predictive values at given sensitivity and specificity and therefore at a fixed cut-off. Prevalence of AF varied from 0.05 to 0.55 and that of cirrhosis from 0.02 to 0.25.
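The dependence of predictive values on prevalence follows from Bayes' rule, and the sketch below recomputes PPV and NPV at fixed sensitivity and specificity. The sensitivity and specificity pairs are the rounded Agile 4 training-set values from Table 2, so results may differ slightly from the table in the second decimal; the function names are assumptions.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity, specificity, prevalence):
    """Negative predictive value at a given prevalence."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Agile 4 rule-in cut-off on the training set: Se = 0.55, Sp = 0.95 (Table 2).
for prevalence in (0.02, 0.05, 0.13, 0.25):
    print(prevalence, round(ppv(0.55, 0.95, prevalence), 2))

# Agile 4 rule-out cut-off on the training set: Se = 0.85, Sp = 0.82 (Table 2).
print(round(npv(0.85, 0.82, 0.13), 2))
```

At a fixed cut-off, PPV rises and NPV falls as prevalence increases, which is why the training and internal validation results were re-expressed at the prevalence of the intended use population.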
Statistical analyses were performed using R software, version 3.6 and later [R: The R Project for Statistical Computing, n.d. https://www.r-project.org/ (accessed September 30, 2020)]. The packages pROC [Turck N. et al., pROC: an open-source package for R and S+ to analyze and compare ROC curves], glmnet [Friedman J., Hastie T., Tibshirani R., Regularization paths for generalized linear models via coordinate descent] and mfp [Gareth Ambler AB, mfp: Multivariable Fractional Polynomials, 2021] were used to develop the models and study their performance.

## Results

### Patient characteristics

The internal dataset consisted of 2134 patients (flowchart in Supplementary Fig. 1), of which 1434 were in the TS used to construct the scores and 700 in the internal VS. As expected, the TS and the internal VS had similar characteristics in terms of collected parameters and distribution of fibrosis stages (Table 1). In both datasets, the prevalence of AF and cirrhosis was 54% and 23%, respectively, which was higher than expected in NAFLD patients seen in secondary/tertiary care liver clinics [Siddiqui M.S. et al., Vibration-Controlled Transient Elastography to Assess Fibrosis and Steatosis in Patients With Nonalcoholic Fatty Liver Disease; Eddowes P.J. et al., Accuracy of FibroScan Controlled Attenuation Parameter and Liver Stiffness Measurement in Assessing Steatosis and Fibrosis in Patients With Nonalcoholic Fatty Liver Disease; Wong V.W.-S. et al., Diagnosis of fibrosis and cirrhosis using liver stiffness measurement in nonalcoholic fatty liver disease]. For external validation, the NASH CRN cohort comprised 585 patients, of which 13% had cirrhosis and 37% had AF. The French NAFLD cohort comprised 1042 patients and was very similar to the NASH CRN cohort: 13% had cirrhosis and 38% had AF. Both the NASH CRN and French NAFLD cohorts correspond to the intended use population, so for the TS and the internal VS, PPV and NPV were adjusted using a prevalence of 13% for cirrhosis and 37% for AF. As reported in Table 1, the TS and the internal VS had broadly similar demographic, metabolic and serological characteristics to the external VS. However, while there were as many men as women in the TS (50.8% men) and in the internal VS (51.3% men), there were fewer men in the NASH CRN cohort (37.4%) and more men in the French NAFLD cohort (59.7%). Moreover, patients in the French NAFLD cohort had higher ALT values, with a median of 57 U/L, in contrast to the other datasets, whose medians ranged from 47 U/L to 49 U/L. Furthermore, as expected given the high prevalence of cirrhosis and AF in the TS and in the internal VS, higher values of LSM (∼10 kPa in the TS and the internal VS) were observed compared to LSM in the NASH CRN and the French NAFLD cohorts (∼8 kPa). Patient characteristics of each cohort by target are detailed in Supplementary Tables 4–7.
Table 1. Training set, internal validation set, NASH CRN cohort and French NAFLD cohort patient characteristics.

| | Training set (N=1434): median (IQR) or n (%) | N | Internal VS (N=700): median (IQR) or n (%) | N | NASH CRN cohort (N=585): median (IQR) or n (%) | N | French NAFLD cohort [a] (N=1042): median (IQR) or n (%) | N |
|---|---|---|---|---|---|---|---|---|
| **Demographics** | | | | | | | | |
| Age (years) | 55.0 (16.0) | 1434 | 55.5 (16.0) | 700 | 54.0 (17.0) | 585 | 58.0 (15.4) | 1042 |
| Male sex | 729 (50.8%) | 1434 | 359 (51.3%) | 700 | 219 (37.4%) | 585 | 622 (59.7%) | 1042 |
| BMI (kg/m²) | 31.7 (7.80) | 1325 | 31.6 (8.05) | 646 | 34.6 (9.10) | 584 | 31.2 (7.7) | 1037 |
| **Metabolic** | | | | | | | | |
| Diabetes (type 1 and 2) | 723 (50.4%) | 1434 | 357 (51.0%) | 700 | 268 (45.8%) | 585 | 508 (48.8%) | 1042 |
| Hypertension | 719 (50.1%) | 1434 | 344 (49.1%) | 700 | 334 (57.1%) | 585 | .. [b] | .. |
| **Blood** | | | | | | | | |
| AST (U/L) | 39.0 (31.0) | 1434 | 38.0 (29.0) | 700 | 37.0 (28.0) | 585 | 39.5 (26.0) | 1042 |
| ALT (U/L) | 49.0 (47.0) | 1434 | 47.0 (45.2) | 700 | 48.0 (42.0) | 585 | 57.0 (45.0) | 1042 |
| AAR | 0.808 (0.381) | 1434 | 0.820 (0.397) | 700 | 0.793 (0.332) | 585 | 0.72 (0.37) | 1042 |
| Platelet count (G/L) | 219 (94.0) | 1434 | 222 (95.2) | 700 | 228 (92.0) | 585 | 218 (85.0) | 1042 |
| HDL (mmol/L) | 1.14 (0.414) | 1114 | 1.11 (0.390) | 541 | 1.11 (0.388) | 581 | 1.14 (0.390) | 997 |
| LDL (mmol/L) | 2.59 (1.32) | 1088 | 2.56 (1.24) | 530 | 2.61 (1.33) | 568 | .. [c] | .. |
| Albumin (g/L) | 45.0 (5.00) | 1338 | 44.0 (5.00) | 654 | 44.0 (4.00) | 583 | 43.0 (5.0) | 1033 |
| GGT (U/L) | 58.0 (70.0) | 1337 | 61.0 (71.8) | 654 | 43.0 (53.0) | 581 | 77.5 (106.3) | 1042 |
| Triglycerides (mmol/L) | 1.69 (1.04) | 1119 | 1.64 (1.03) | 545 | 1.62 (1.11) | 581 | 1.53 (1.03) | 1002 |
| Fasting glucose (mmol/L) | 6.11 (2.36) | 1315 | 6.10 (2.39) | 645 | 5.88 (1.94) | 582 | 5.80 (2.30) | 1011 |
| **Non-invasive tests** | | | | | | | | |
| FIB-4 | 1.40 (1.25) | 1434 | 1.40 (1.20) | 700 | 1.30 (1.06) | 585 | 1.40 (1.17) | 1042 |
| LSM by VCTE (kPa) | 10.8 (10.2) | 1434 | 10.5 (9.83) | 700 | 8.60 (7.20) | 585 | 8.50 (6.70) | 1042 |
| **Fibrosis stage (NASH CRN scoring system)** | | 1434 | | 700 | | 585 | | 1042 |
| F0 | 202 (14.1%) | | 97 (13.9%) | | 121 (20.7%) | | 116 (11.1%) | |
| F1 | 269 (18.8%) | | 130 (18.6%) | | 134 (22.9%) | | 240 (23.0%) | |
| F2 | 191 (13.3%) | | 93 (13.3%) | | 116 (19.8%) | | 286 (27.5%) | |
| F3 | 437 (30.5%) | | 215 (30.7%) | | 139 (23.8%) | | 267 (25.6%) | |
| F4 | 335 (23.4%) | | 165 (23.6%) | | 75 (12.8%) | | 133 (12.8%) | |

Results are median (IQR) and number of available data for numeric parameters and n (%) for categorical parameters.
AAR=AST/ALT ratio, ALT=Alanine aminotransferase, AST=Aspartate aminotransferase, BMI=Body mass index, FIB-4=Fibrosis-4 index, GGT=γ-glutamyltransferase, HDL=High-density lipoproteins, LDL=Low-density lipoproteins, LSM=Liver stiffness measurement, NAFLD=Nonalcoholic fatty liver disease, NASH CRN=Nonalcoholic Steatohepatitis Clinical Research Network, VCTE=Vibration-controlled transient elastography, VS=Validation set.
[a] Analysis performed by Pr Boursier and his team.
[b] 22% of missing data.
[c] 51% of missing data.

### Agile 4

#### Score construction

The parameters significantly contributing to the prediction of cirrhosis were LSM, AAR, PLT, sex and diabetes status (details on the predictors selected at each stage of the score construction presented in Supplementary Tables 8 and 10). Considering diabetes status: yes = 1, no = 0 and sex: male = 1, female = 0, this resulted in the following equation:
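$\text{Agile 4} = \dfrac{e^{\operatorname{logit}(p_{F=4})}}{1 + e^{\operatorname{logit}(p_{F=4})}}$

with $\operatorname{logit}(p_{F=4}) = 7.50139 - 15.42498 \times \dfrac{1}{\sqrt{\text{LSM}}} - 0.01378 \times \text{PLT} - 1.41149 \times \text{AAR}^{-1} - 0.53281 \times \text{Sex} + 0.41741 \times \text{Diabetes status}$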
As Agile 4 is the predicted probability of cirrhosis from the logistic regression model, it is bounded between 0 and 1 and can be interpreted in a probabilistic manner.
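A minimal sketch of the Agile 4 computation is given below, using the coefficients and the coding of sex and diabetes status stated above. Two points are assumptions of this sketch rather than statements of the text: the LSM term is read as 1/√LSM, and the function and argument names are invented for illustration. It is not intended for clinical use.

```python
import math


def agile4(lsm_kpa, platelets_g_per_l, ast, alt, male, diabetes):
    """Predicted probability of cirrhosis (F=4), bounded between 0 and 1.

    lsm_kpa:           liver stiffness by VCTE (kPa)
    platelets_g_per_l: platelet count (G/L)
    ast, alt:          aminotransferases in the same unit (U/L)
    male, diabetes:    booleans, coded 1 for male / diabetic and 0 otherwise
    """
    aar = ast / alt  # AST/ALT ratio
    logit = (
        7.50139
        - 15.42498 / math.sqrt(lsm_kpa)  # assumption: LSM enters as 1/sqrt(LSM)
        - 0.01378 * platelets_g_per_l
        - 1.41149 / aar                  # AAR to the power -1
        - 0.53281 * (1 if male else 0)
        + 0.41741 * (1 if diabetes else 0)
    )
    return math.exp(logit) / (1.0 + math.exp(logit))


# Hypothetical patient: LSM 12.1 kPa, platelets 200 G/L, AST = ALT, female, no diabetes.
print(round(agile4(12.1, 200, 40, 40, male=False, diabetes=False), 2))
```

Because the output is a probability, it can be compared directly against the rule-out and rule-in cut-offs reported in Table 2.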

#### Overall diagnostic performances

On the TS and the internal VS, the calibration line was close to the ideal calibration line, indicating an excellent goodness of fit of the predicted probability of cirrhosis (Supplementary Fig. 2). Furthermore, Agile 4 had an AUROC of 0.91 (95% CI=[0.89; 0.92]) in the TS and 0.89 (95% CI=[0.87; 0.92]) in the internal VS, significantly higher than the AUROC of LSM (DeLong test p < 0.0001) and FIB-4 (p < 0.0001) (Table 2 & Supplementary Fig. 3). Decision curves (Supplementary Fig. 4) also suggest that Agile 4 is a better option than FIB-4, LSM alone or even treating all patients as having cirrhosis, since it has the highest net benefit and the highest clinical value across the range of threshold probabilities [0.0; 0.5].
Table 2Diagnostic performances of FIB-4, LSM and Agile 3+ for the diagnosis of advanced fibrosis and of FIB-4, LSM and Agile 4 the diagnosis of cirrhosis in the training and internal validation sets.
Training setInternal VS
FIB-4LSMAgileFIB-4LSMAgile
F4 targetAUROC [95% CI]0.83 [0.80;0.85]0.86 [0.84;0.89]0.91 [0.89;0.92]0.82 [0.78;0.85]0.85 [0.81;0.88]0.89 [0.87;0.92]
Delong test p (vs Agile 4)< 0.0001< 0.0001NA< 0.0001< 0.0001NA
Rule out cut-off (≥85% Se)<1.39<12.1<0.251<1.39<12.1<0.251
% patients50%58%67%49%59%68%
Se [95% CI]/Sp[95% CI]0.85 [0.81;0.89]/0.60 [0.57;0.63]0.86 [0.82;0.90]

/0.71 [0.68;0.74]
0.85 [0.81;0.89]/0.82 [0.80;0.84]0.87 [0.82;0.92]

/0.61 [0.57;0.65]
0.79 [0.73;0.85]/0.71

[0.67;0.75]
0.79 [0.72;0.85]/0.83 [0.80;0.86]
NPV0.96
Due to the high prevalence of cirrhosis and AF in the TS and the internal VS, PPV and NPV for these datasets were adjusted on the prevalence of external validation, i.e. F4 = 13% and F≥3 = 37%.
0.97
Due to the high prevalence of cirrhosis and AF in the TS and the internal VS, PPV and NPV for these datasets were adjusted on the prevalence of external validation, i.e. F4 = 13% and F≥3 = 37%.
0.97
Due to the high prevalence of cirrhosis and AF in the TS and the internal VS, PPV and NPV for these datasets were adjusted on the prevalence of external validation, i.e. F4 = 13% and F≥3 = 37%.
0.97
Due to the high prevalence of cirrhosis and AF in the TS and the internal VS, PPV and NPV for these datasets were adjusted on the prevalence of external validation, i.e. F4 = 13% and F≥3 = 37%.
0.96
Due to the high prevalence of cirrhosis and AF in the TS and the internal VS, PPV and NPV for these datasets were adjusted on the prevalence of external validation, i.e. F4 = 13% and F≥3 = 37%.
0.96
Due to the high prevalence of cirrhosis and AF in the TS and the internal VS, PPV and NPV for these datasets were adjusted on the prevalence of external validation, i.e. F4 = 13% and F≥3 = 37%.
Indeterminate zone [85%Se; 95%Sp[
% patients39%29%17%40%28%16%
Rule in cut-off (≥95% Sp)≥3.25≥23.2≥0.565≥3.25≥23.2≥0.565
% patients11%14%17%11%12%16%
Se [95% CI]/Sp[95% CI]0.33 [0.28;0.38]/0.95 [0.93;0.96]0.43 [0.38;0.48]/0.95 [0.94;0.96]0.55 [0.50;0.60]/0.95 [0.94;0.96]0.30 [0.23;0.37]/0.95 [0.93;0.97]0.36 [0.29;0.43]/0.95 [0.93;0.97]0.53 [0.45;0.61]/0.96 [0.94;0.98]
PPV0.50
Due to the high prevalence of cirrhosis and AF in the TS and the internal VS, PPV and NPV for these datasets were adjusted on the prevalence of external validation, i.e. F4 = 13% and F≥3 = 37%.
0.56
Due to the high prevalence of cirrhosis and AF in the TS and the internal VS, PPV and NPV for these datasets were adjusted on the prevalence of external validation, i.e. F4 = 13% and F≥3 = 37%.
0.63
Due to the high prevalence of cirrhosis and AF in the TS and the internal VS, PPV and NPV for these datasets were adjusted on the prevalence of external validation, i.e. F4 = 13% and F≥3 = 37%.
0.48
Due to the high prevalence of cirrhosis and AF in the TS and the internal VS, PPV and NPV for these datasets were adjusted on the prevalence of external validation, i.e. F4 = 13% and F≥3 = 37%.
0.52
Due to the high prevalence of cirrhosis and AF in the TS and the internal VS, PPV and NPV for these datasets were adjusted on the prevalence of external validation, i.e. F4 = 13% and F≥3 = 37%.
0.64
Due to the high prevalence of cirrhosis and AF in the TS and the internal VS, PPV and NPV for these datasets were adjusted on the prevalence of external validation, i.e. F4 = 13% and F≥3 = 37%.
| F≥3 target | FIB-4 | LSM | Agile 3+ | FIB-4 | LSM | Agile 3+ |
|---|---|---|---|---|---|---|
| AUROC [95% CI] | 0.82 [0.80;0.84] | 0.86 [0.84;0.88] | 0.90 [0.88;0.91] | 0.84 [0.81;0.86] | 0.85 [0.82;0.88] | 0.90 [0.88;0.92] |
| Delong test p (vs Agile 3+) | < 0.0001 | < 0.0001 | NA | < 0.0001 | < 0.0001 | NA |
| Rule out cut-off (≥85% Se) | <1.12 | <9.2 | <0.451 | <1.12 | <9.2 | <0.451 |
| % patients (rule-out) | 37% | 40% | 44% | 36% | 41% | 42% |
| Se [95% CI]/Sp [95% CI] (rule-out) | 0.85 [0.82;0.88]/0.62 [0.58;0.66] | 0.85 [0.82;0.88]/0.69 [0.65;0.73] | 0.85 [0.82;0.88]/0.78 [0.75;0.81] | 0.84 [0.80;0.88]/0.61 [0.56;0.66] | 0.83 [0.79;0.87]/0.69 [0.64;0.74] | 0.87 [0.84;0.90]/0.76 [0.71;0.81] |
| NPVᵃ | 0.87 | 0.89 | 0.90 | 0.87 | 0.88 | 0.91 |
| % patients (indeterminate zone [85%Se; 90%Sp[) | 30% | 23% | 13% | 28% | 24% | 17% |
| Rule in cut-off (≥90% Sp) | ≥1.81 | ≥13.6 | ≥0.679 | ≥1.81 | ≥13.6 | ≥0.679 |
| % patients (rule-in) | 33% | 37% | 43% | 36% | 36% | 42% |
| Se [95% CI]/Sp [95% CI] (rule-in) | 0.53 [0.49;0.57]/0.90 [0.88;0.92] | 0.61 [0.58;0.64]/0.90 [0.88;0.92] | 0.71 [0.68;0.74]/0.90 [0.88;0.92] | 0.57 [0.52;0.62]/0.90 [0.87;0.93] | 0.57 [0.52;0.62]/0.90 [0.87;0.93] | 0.69 [0.64;0.74]/0.91 [0.88;0.94] |
| PPVᵃ | 0.76 | 0.78 | 0.81 | 0.77 | 0.77 | 0.81 |

AUROC=Area under the receiver operating characteristic curve, CI=Confidence interval, FIB-4=Fibrosis-4 index, Agile=Agile 3+ and Agile 4, LSM=Liver stiffness measurement, NAFLD=Nonalcoholic fatty liver disease, NPV=Negative predictive value, PPV=Positive predictive value, Se=Sensitivity, Sp=Specificity, VS=Validation set.
ᵃ Due to the high prevalence of cirrhosis and AF in the TS and the internal VS, PPV and NPV for these datasets were adjusted on the prevalence of external validation, i.e. F4 = 13% and F≥3 = 37%.
Calibration plots were satisfactory for both the NASH CRN and the French NAFLD cohorts (Supplementary Figs. 5 and 6). Although these calibration plots deviate slightly from the ideal calibration, most of them fall within the 95% confidence intervals (CI). Excellent discrimination (Table 3) of Agile 4 was observed in both the NASH CRN (AUROC = 0.93, 95% CI = [0.91;0.96]) and the French NAFLD cohorts (AUROC = 0.89, 95% CI = [0.86;0.92]). Moreover, the AUROC of Agile 4 differed significantly from that of LSM (p < 0.0001) in the NASH CRN cohort and from that of FIB-4 (p = 0.0028 and p < 0.0001) in both external VS. Decision curves in the external validation sets (Fig. 1 (A) & (B)) show that, in both cohorts and across the range of threshold probabilities [0.0; 0.5], Agile 4 has the highest net benefit and is therefore a better option than FIB-4 or even than treating all patients as having cirrhosis. In the NASH CRN cohort, Agile 4 has a higher net benefit than LSM for threshold probabilities between 0.20 and approximately 0.45. In the French NAFLD cohort, Agile 4 and LSM have similar net benefits.
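The net benefit plotted in a decision curve can be sketched as follows. This section does not spell out the formula, so the standard definition from decision curve analysis is assumed here: net benefit = TP/n − FP/n × pt/(1 − pt), where pt is the threshold probability. The function names and the toy data are illustrative only and do not come from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on patients whose predicted probability >= threshold.

    y_true: 0/1 labels (1 = target condition present on biopsy).
    y_prob: predicted probabilities, e.g. an Agile score.
    Assumes the standard decision-curve definition; 0 <= threshold < 1.
    """
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must be in [0, 1)")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the 'treat all patients as positive' strategy."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data, for illustration only.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2, 0.1]
for pt in (0.1, 0.25, 0.5):
    print(pt, net_benefit(labels, scores, pt), net_benefit_treat_all(labels, pt))
```

A decision curve is obtained by evaluating these functions over a grid of threshold probabilities; a model is preferred at a given threshold when its net benefit exceeds that of the alternatives, including the treat-all strategy.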
Table 3. Diagnostic performances of FIB-4, LSM and Agile 3+ for the diagnosis of advanced fibrosis and of FIB-4, LSM and Agile 4 for the diagnosis of cirrhosis in the external validation sets.

| | NASH CRN cohort: FIB-4 | NASH CRN cohort: LSM | NASH CRN cohort: Agile | French NAFLD cohortᵃ: FIB-4 | French NAFLD cohortᵃ: LSM | French NAFLD cohortᵃ: Agile |
|---|---|---|---|---|---|---|
| **F4 target** | | | | | | |
| AUROC [95% CI] | 0.83 [0.79;0.88] | 0.89 [0.86;0.93] | 0.93 [0.91;0.96] | 0.81 [0.77;0.85] | 0.88 [0.85;0.91] | 0.89 [0.86;0.92] |
| Delong test p (vs Agile 4) | < 0.0001 | 0.0028 | NA | < 0.0001 | 0.2363 | NA |
| Rule out cut-off (≥85% Se) | <1.39 | <12.1 | <0.251 | <1.39 | <12.1 | <0.251 |
| % patients (rule-out) | 57% | 70% | 77% | 49% | 72% | 81% |
| Se [95% CI]/Sp [95% CI] (rule-out) | 0.85 [0.77;0.93]/0.63 [0.59;0.67] | 0.88 [0.81;0.95]/0.79 [0.75;0.83] | 0.87 [0.79;0.95]/0.86 [0.83;0.89] | 0.85 [0.82;0.881]/0.54 [0.50;0.58] | 0.76 [0.73;0.79]/0.79 [0.76;0.82] | 0.71 [0.68;0.74]/0.88 [0.86;0.90] |
| NPV | 0.97 | 0.98 | 0.98 | 0.96 | 0.96 | 0.96 |
| % patients (indeterminate zone [85%Se; 95%Sp[) | 36% | 22% | 13% | 43% | 19% | 11% |
| Rule in cut-off (≥95% Sp) | ≥3.25 | ≥23.2 | ≥0.565 | ≥3.25 | ≥23.2 | ≥0.565 |
| % patients (rule-in) | 8% | 8% | 10% | 8% | 9% | 8% |
| Se [95% CI]/Sp [95% CI] (rule-in) | 0.35 [0.24;0.46]/0.96 [0.94;0.98] | 0.41 [0.30;0.52]/0.97 [0.96;0.98] | 0.55 [0.44;0.66]/0.97 [0.96;0.98] | 0.35 [0.25;0.45]/0.96 [0.92;1.00] | 0.44 [0.34;0.54]/0.96 [0.92;1.00] | 0.44 [0.33;0.55]/0.97 [0.93;1.00] |
| PPV | 0.58 | 0.69 | 0.72 | 0.55 | 0.62 | 0.68 |
| **F≥3 target** | | | | | | |
| AUROC [95% CI] | 0.78 [0.74;0.82] | 0.83 [0.80;0.87] | 0.86 [0.84;0.89] | 0.78 [0.76;0.81] | 0.84 [0.81;0.86] | 0.87 [0.85;0.89] |
| Delong test p (vs Agile 3+) | < 0.0001 | 0.0042 | NA | < 0.0001 | 0.0011 | NA |
| Rule out cut-off (≥85% Se) | <1.12 | <9.2 | <0.451 | <1.12 | <9.2 | <0.451 |
| % patients (rule-out) | 41% | 55% | 54% | 35% | 57% | 53% |
| Se [95% CI]/Sp [95% CI] (rule-out) | 0.86 [0.81;0.91]/0.56 [0.51;0.61] | 0.76 [0.70;0.82]/0.73 [0.68;0.78] | 0.82 [0.77;0.87]/0.75 [0.71;0.79] | 0.88 [0.85;0.91]/0.49 [0.44;0.54] | 0.75 [0.72;0.78]/0.77 [0.74;0.80] | 0.83 [0.80;0.86]/0.75 [0.71;0.79] |
| NPV | 0.88 | 0.84 | 0.88 | 0.87 | 0.83 | 0.87 |
| % patients (indeterminate zone [85%Se; 90%Sp[) | 31% | 20% | 16% | 32% | 20% | 18% |
| Rule in cut-off (≥90% Sp) | ≥1.81 | ≥13.6 | ≥0.679 | ≥1.81 | ≥13.6 | ≥0.679 |
| % patients (rule-in) | 28% | 25% | 30% | 33% | 23% | 29% |
| Se [95% CI]/Sp [95% CI] (rule-in) | 0.50 [0.43;0.57]/0.84 [0.80;0.88] | 0.53 [0.46;0.60]/0.91 [0.88;0.94] | 0.61 [0.54;0.68]/0.87 [0.34;0.90] | 0.56 [0.51;0.61]/0.82 [0.78;0.86] | 0.48 [0.42;0.54]/0.92 [0.89;0.95] | 0.61 [0.55;0.67]/0.90 [0.87;0.93] |
| PPV | 0.64 | 0.78 | 0.73 | 0.65 | 0.79 | 0.79 |

AUROC=Area under the receiver operating characteristic curve, CI=Confidence interval, FIB-4=Fibrosis-4 index, Agile=Agile 4 for the F4 target and Agile 3+ for the F≥3 target, LSM=Liver stiffness measurement, NAFLD=Nonalcoholic fatty liver disease, NASH CRN=Nonalcoholic Steatohepatitis Clinical Research Network, NPV=Negative predictive value, PPV=Positive predictive value, Se=Sensitivity, Sp=Specificity.
ᵃ Analysis performed by Pr Boursier and his team.
Diagnostic performances of Agile 4 in the TS and the internal VS in terms of sensitivity, specificity, adjusted PPV and NPV are represented in Fig. 2 (A) and Supplementary Fig. 7 respectively, for all possible cut-off values.

#### Dual cut-off approach

To minimize the number of patients in the indeterminate zone and to maximize the PPV in the rule-in zone, it was decided to select a rule-out cut-off that achieved sensitivity of ≥85% and a rule-in cut-off that achieved specificity of ≥95% for the diagnosis of cirrhosis (Table 2). The cut-off values of Agile 4 were 0.251 and 0.565 for rule-out and rule-in, respectively, with characteristics detailed in Table 2, Table 3 and Fig. 3.
Using this approach, no more than 17% of cases had an indeterminate result in the TS and the internal VS. In the TS and the internal VS, Agile 4 correctly ruled out a larger proportion of patients, with higher specificity, than FIB-4 and LSM. The same observation was made in both external VS.
Moreover, Agile 4 substantially reduced the number of cases with indeterminate results in all datasets compared with FIB-4 or LSM.
Finally, an improvement in the identification of patients with cirrhosis using Agile 4 was observed. The sensitivity in the rule-in zone was higher than that achieved with FIB-4 or LSM in the TS, the internal VS and the NASH CRN cohort. Moreover, the PPV for Agile 4 increased in all datasets.
The performances of high-specificity (99%) cut-off values for the diagnosis of cirrhosis are presented in Supplementary Materials & Methods (Supplementary Fig. 14, Supplementary Table 12).
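The dual cut-off rule described in this subsection can be expressed as a three-way classification. The sketch below is illustrative: the function and constant names are assumptions, while the Agile 4 cut-off values (rule-out below 0.251, rule-in at or above 0.565) are those reported in the text and tables.

```python
# Agile 4 cut-offs reported in this section: (rule-out, rule-in).
AGILE_4_CUTOFFS = (0.251, 0.565)


def classify(score, rule_out_cutoff, rule_in_cutoff):
    """Assign a score to the rule-out, indeterminate or rule-in zone.

    Scores strictly below the rule-out cut-off are ruled out, scores at or
    above the rule-in cut-off are ruled in, and everything in between falls
    in the indeterminate zone.
    """
    if not rule_out_cutoff < rule_in_cutoff:
        raise ValueError("rule-out cut-off must be below rule-in cut-off")
    if score < rule_out_cutoff:
        return "rule-out"
    if score >= rule_in_cutoff:
        return "rule-in"
    return "indeterminate"


for s in (0.10, 0.40, 0.80):
    print(s, classify(s, *AGILE_4_CUTOFFS))
```

Keeping the cut-offs as parameters means the same function applies to any score that uses a rule-out/rule-in pair.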

### Agile 3+

#### Score construction

The parameters contributing to the prediction of AF were quite similar to those of Agile 4: LSM, AAR, PLT, sex and diabetes status remained significant in Agile 3+ (details on the predictors selected at each stage of the score construction are presented in Supplementary Tables 9 and 11). In addition, age was selected during the construction of Agile 3+.
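The equation of Agile 3+ was:
$\text{Agile 3+} = \frac{e^{\text{logit}(p_{F\geq3})}}{1 + e^{\text{logit}(p_{F\geq3})}}$

with $\text{logit}(p_{F\geq3}) = -3.92368 + 2.29714 \times \ln(\text{LSM}) - 0.00902 \times \text{PLT} - 0.98633 \times \text{AAR}^{-1} + 1.08636 \times \text{Diabetes status} - 0.38581 \times \text{Sex} + 0.03018 \times \text{Age}$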
As with Agile 4, Agile 3+ is a predicted probability from the logistic regression model, which is bounded between 0 and 1 and can be interpreted in a probabilistic manner.
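The Agile 3+ equation can be sketched in code as follows. This is an illustration under stated assumptions, not the authors' software: the function name and the input conventions are not given in this section and are assumed here (LSM in kPa, platelet count in G/L, AAR computed as AST divided by ALT, diabetes status coded 1 = yes / 0 = no, sex coded 1 = male / 0 = female, age in years). The coding of the binary covariates in particular should be checked against the score's definition before any use.

```python
import math


def agile_3_plus(lsm_kpa, platelets, ast, alt, diabetes, male, age_years):
    """Predicted probability of advanced fibrosis (F>=3) from the Agile 3+ model.

    Assumed conventions (not stated in this section): LSM in kPa, platelets
    in G/L, diabetes and male coded 1 (yes) or 0 (no), age in years.
    """
    aar = ast / alt  # AST/ALT ratio
    logit = (
        -3.92368
        + 2.29714 * math.log(lsm_kpa)
        - 0.00902 * platelets
        - 0.98633 / aar  # AAR enters the model as AAR^-1
        + 1.08636 * diabetes
        - 0.38581 * male
        + 0.03018 * age_years
    )
    # Inverse logit: identical to e^logit / (1 + e^logit).
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical patient, for illustration only.
print(agile_3_plus(lsm_kpa=10.0, platelets=200, ast=40, alt=40,
                   diabetes=1, male=1, age_years=60))
```

Because the output is a probability between 0 and 1, it can be compared directly with the rule-out and rule-in cut-offs.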

#### Overall diagnostic performances

As for Agile 4, the calibration lines of Agile 3+ were close to the ideal calibration in all datasets (Supplementary Figs. 8–10), which indicates an excellent goodness of fit of the predicted probabilities of AF. Excellent discrimination of Agile 3+ was observed, with AUROCs around 0.9, significantly different from those of LSM and FIB-4 in the TS, the internal VS and the NASH CRN cohort (Table 2, Table 3 and Supplementary Fig. 11). Furthermore, decision curves (Fig. 1 (C)-(D) and Supplementary Fig. 12) suggest that, across the range of threshold probabilities [0.0; 0.5], Agile 3+ has the highest net benefit and is therefore a better option than FIB-4, LSM alone (except in the French NAFLD cohort) or even treating all patients as having AF. In the French NAFLD cohort (Fig. 1 (D)), Agile 3+ has the highest net benefit for threshold probabilities between 0.0 and approximately 0.2 and between approximately 0.3 and 0.5. Between 0.2 and 0.3, Agile 3+ and LSM have similar net benefits, both higher than that of FIB-4.
Diagnostic performances of Agile 3+ in the TS and the internal VS in terms of sensitivity, specificity, adjusted PPV and NPV are represented in Fig. 2 (B) and Supplementary Fig. 13, respectively, for all possible cut-off values.

#### Dual cut-off approach

A rule-out cut-off achieving a sensitivity of ≥85% and a rule-in cut-off achieving a specificity of ≥90% were selected for the diagnosis of F≥3 (Table 2). The resulting cut-off values of Agile 3+ were 0.451 and 0.679 for rule-out and rule-in, respectively, with characteristics detailed in Table 2, Table 3 and Fig. 4.
No more than 18% of cases had indeterminate results in all datasets with Agile 3+.
Moreover, in the TS and the internal VS, Agile 3+ correctly ruled out a larger proportion of patients, with higher specificity, than FIB-4 and LSM. In both external VS, however, this increase was confirmed only in comparison with FIB-4.
Finally, a small improvement in the identification of patients with AF was observed. The sensitivity in the rule-in zone was higher than those of FIB-4 and LSM in all datasets, and the PPV increased slightly in the TS and the internal VS. In both external VS, however, the PPV improved compared with FIB-4 but not compared with LSM, whose PPV was higher than or equal to that of Agile 3+.
The performances of FIB-4 and LSM using published cut-off values (McPherson et al., 2017; Papatheodoridi et al., 2020) vs Agile 3+ for the diagnosis of AF are presented in Supplementary Materials & Methods (Supplementary Table 13).

### Sensitivity analyses

Results of sensitivity analyses are presented in Supplementary Materials & Methods (Supplementary results, Supplementary Tables 14–17). The AUROCs remained above 0.80 regardless of whether patients were obese or non-obese, had steatosis or not, were diabetic or nondiabetic, or were measured with the M or XL probe. This suggests that these factors do not materially affect the performances of the models. Finally, the impact of the prevalence of AF and cirrhosis on the PPV and NPV for the optimal rule-out and rule-in cut-offs is presented in Fig. 5 for Agile 4 and Agile 3+. With increasing prevalence of AF and cirrhosis, the PPV tended to increase to a greater extent than the NPV decreased.
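The dependence of predictive values on prevalence can be illustrated with a short calculation. The exact adjustment procedure used in the study is not described in this section; the sketch below assumes the usual application of Bayes' theorem to sensitivity, specificity and prevalence, and the function names are illustrative.

```python
def ppv(se, sp, prevalence):
    """Positive predictive value at a given prevalence (Bayes' theorem)."""
    true_pos = se * prevalence
    false_pos = (1.0 - sp) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(se, sp, prevalence):
    """Negative predictive value at a given prevalence (Bayes' theorem)."""
    true_neg = sp * (1.0 - prevalence)
    false_neg = (1.0 - se) * prevalence
    return true_neg / (true_neg + false_neg)


# Agile 3+ values from the first Agile column of Table 2, at the
# external-validation prevalence of F>=3 (37%).
print(round(ppv(0.71, 0.90, 0.37), 2))  # rule-in cut-off: 0.81
print(round(npv(0.85, 0.78, 0.37), 2))  # rule-out cut-off: 0.9

# Effect of prevalence on both predictive values.
for prev in (0.10, 0.20, 0.37, 0.50):
    print(prev, round(ppv(0.71, 0.90, prev), 2), round(npv(0.85, 0.78, prev), 2))
```

With these inputs the calculation reproduces the adjusted PPV (0.81) and NPV (0.90) reported for Agile 3+ in Table 2, which suggests the table values are consistent with this formula.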

## Discussion

Identifying patients with cirrhosis is of great importance in order to commence periodic surveillance for HCC and oesophageal varices. The identification of patients with AF is also important, as these patients are at risk of disease progression towards clinical outcomes. They could be prioritized for existing interventions and, once available, for pharmacological therapies for NAFLD.
In this study, we propose two new FibroScan-based scores, Agile 4 and Agile 3+, combining LSM with routine biomarkers to identify the presence of cirrhosis or AF, respectively, in secondary/tertiary care liver clinics, in patients who would have received a LB for evaluation of NAFLD. By construction, these scores are the probabilities of cirrhosis (Agile 4) and AF (Agile 3+) and can therefore be interpreted as such.
As specified previously, the objectives of this work were to propose new scores and associated sets of rule-out/rule-in cut-offs selected to decrease the number of patients with indeterminate results (in between the two cut-off values) compared to LSM and FIB-4 and to increase the PPV in the rule-in zone without substantially degrading sensitivity. To do so, we tested several levels of sensitivity and specificity. The optimal combinations were rule-out with 85% sensitivity and rule-in with 95% specificity for Agile 4 to predict cirrhosis, and rule-out with 85% sensitivity and rule-in with 90% specificity for Agile 3+ to predict AF. Once set on the TS, those same cut-off values were tested in the different validation sets and their respective performances confirmed. Nevertheless, performances of both scores using classical rule-out and rule-in cut-off values with 90% sensitivity and 90% specificity, respectively, are presented in Supplementary Table 18.
This study has the following strengths. Firstly, the scores were derived on a large cohort of 1434 patients recruited in secondary/tertiary care liver clinics from North America, Europe and Asia. Secondly, the scores were validated in three other large cohorts: (i) an internal VS made from the remaining third of the initial global pool of patients not used for the TS, (ii) a large subset of patients from the NAFLD Adult Database 2 of the NASH CRN, conducted in 8 expert centers in the USA, and (iii) a large cohort of patients from three expert centers in France. This helped to limit overfitting. Moreover, the shrinkage factor used to determine the sample size was defined a priori at 0.9 (close to 1), high enough to minimize potential model overfitting. Thirdly, the scores were developed using only widely available routine biomarkers. By doing so, and by making the score formulas public and available through an app and a website, we aim to make the scores easily and readily accessible, without additional cost, at the same time as LSM by VCTE is obtained. We also compared, in all datasets, the performances of two scenarios: (i) Agile scores computed for all patients and (ii) patients first undergo LSM by VCTE, and Agile is then computed only for patients who are either ruled in or indeterminate with LSM (Supplementary Figs. 15–17). The results show that, compared to Agile scores alone, sequential use of LSM followed by Agile scores slightly increases the number of patients ruled out, slightly decreases the number of cases with indeterminate results and improves PPV.
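The sequential scenario can be sketched as a two-step triage. This is an illustrative reading of scenario (ii), not the authors' implementation: the function name is hypothetical, and it is an assumption that the LSM step uses the same rule-out cut-off as reported in the tables (below 12.1 kPa for cirrhosis). The Agile 4 cut-offs (0.251 and 0.565) are those reported in the Results.

```python
def sequential_triage(lsm_kpa, compute_agile, lsm_rule_out,
                      agile_rule_out, agile_rule_in):
    """Two-step triage: LSM first, Agile only if LSM does not rule out.

    compute_agile is a zero-argument callable returning the Agile score, so
    the score is only computed for patients who are not ruled out by LSM.
    """
    if lsm_kpa < lsm_rule_out:
        return "rule-out"  # ruled out by LSM alone; Agile is not computed
    score = compute_agile()
    if score < agile_rule_out:
        return "rule-out"
    if score >= agile_rule_in:
        return "rule-in"
    return "indeterminate"


# Hypothetical patients; cut-offs for the cirrhosis (F4) target.
print(sequential_triage(8.0, lambda: 0.90, 12.1, 0.251, 0.565))
print(sequential_triage(15.0, lambda: 0.30, 12.1, 0.251, 0.565))
print(sequential_triage(25.0, lambda: 0.70, 12.1, 0.251, 0.565))
```

In this sketch a patient ruled out by LSM is never reclassified by Agile, which is the defining difference from computing Agile for all patients.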
However, this study has some limitations. LSM by VCTE, to which access is limited across the globe, is needed to compute the scores. However, these scores are intended to be used in secondary/tertiary care liver clinics, where most of the current global FibroScan installed base (7,800+) is located. Moreover, the cost of the procedure is covered by public and/or private health care insurance in many countries. Another potential limitation is the higher prevalence of AF and cirrhosis in the TS and the internal VS compared to that expected in the intended use population and observed in the external VS. First, to avoid optimistic bias, predictive values reported for the TS and the internal VS were adjusted to the prevalence of the context of use population (namely, the prevalence of the external validation sets). Second, the impact of lower prevalence of the target conditions on the predictive values for the selected cut-off values (Fig. 5) was evaluated. With increasing prevalence of AF and cirrhosis, the PPV tended to increase to a greater extent than the NPV decreased. This means that the cut-off values proposed here would have to be adjusted, and the scores need further evaluation, in contexts of use with lower target prevalence. Notwithstanding, it should be noted that developing the scores on a training set with a high prevalence of the target conditions allowed more variability to be captured. Another limitation could be the selection and misclassification biases associated with the use of patients who underwent a liver biopsy. Therefore, the next step, to further assess the added value of these scores independently of liver biopsy, would be to investigate their capacity to predict clinical outcomes.
Another limitation is the use of LB as the reference standard. First, it is now well recognized that there is significant intra- and inter-observer variability in the assessment of fibrosis lesions. One could argue that all LB from the different cohorts should have been assessed centrally by several pathologists with a consensus. However, we believe that by using fibrosis stages assessed by different expert pathologists in the field of chronic liver diseases, the resulting scores should be more robust and independent of the pathologist reading, and thus more translatable to real-world practice. Moreover, the inter-observer agreement for fibrosis stage has been shown to be excellent (Kleiner et al., 2005; Bedossa, 2014). Second, biomarkers used in the scores may have been used to decide on performing the biopsy. However, since the scores are built using routine biomarkers, it is difficult to avoid this selection bias, and the fact that the criteria used by the investigators to request a LB were not homogeneous among the different cohorts may have decreased this potential bias. Third, no LB quality criterion was required for inclusion in this study. However, the comparisons of AUROCs of Agile 3+ and Agile 4 for patients with LB length > 15 mm vs LB length ≤ 15 mm, presented in Supplementary Table 19, show that performances were not significantly different between subgroups. Together, these data indicate that the performance metrics of the scores were not adversely impacted by the biopsy length and support the robustness of the models.
Finally, it has been shown for existing scores that the use of age as one of the markers, as is the case for Agile 3+, may warrant the use of age-adjusted cut-off values (McPherson et al., 2017). Similarly, the inclusion of co-morbidities such as diabetes can affect the performance of the scores when they are used in populations with a lower or higher prevalence of diabetes (such as in endocrinology) (Bril et al., 2020). Therefore, these points need to be further investigated.
In conclusion, by combining simple clinical parameters with routine laboratory biomarkers and LSM by VCTE, it is possible to improve the PPV and reduce the number of cases with indeterminate results for the identification of cirrhosis and AF in patients with NAFLD in secondary/tertiary care liver clinics where the prevalence is at least 13% and 37%, respectively. The use of these non-invasive scores would reduce the need for confirmatory LB, thereby improving patient care and reducing associated costs. Agile 4 could also be of interest for adjusting the pharmacological treatment regimen when cirrhosis is present. The potential serial use of Agile 3+ and Agile 4 to monitor disease progression, and their use to predict clinical outcomes, need to be investigated.

## Acknowledgements

The sponsor of the study (Echosens SA, Paris, France) had a role in study design, data collection, data analysis, data interpretation and writing of the report. The corresponding author and the funder had full access to all data in the study and had full responsibility for the decision to submit the publication.

## References

• Younossi Z.M., Stepanova M., Ong J., Trimble G., AlQahtani S., Younossi I., et al. Nonalcoholic Steatohepatitis Is the Most Rapidly Increasing Indication for Liver Transplantation in the United States. Clin Gastroenterol Hepatol. 2021; 19 (e5): 580-589. https://doi.org/10.1016/J.CGH.2020.05.064
• Swain M.G., Ramji A., Patel K., Sebastiani G., Shaheen A.A., Tam E., et al. Burden of nonalcoholic fatty liver disease in Canada, 2019-2030: a modelling study. CMAJ Open. 2020; 8: E429-E436. https://doi.org/10.9778/CMAJO.20190212
• Sanyal A.J., Van Natta M.L., Clark J., Neuschwander-Tetri B.A., Diehl A., Dasarathy S., et al. Prospective Study of Outcomes in Adults with Nonalcoholic Fatty Liver Disease. N Engl J Med. 2021; 385: 1559-1569. https://doi.org/10.1056/NEJMOA2029349
• Rockey D.C., Caldwell S.H., Goodman Z.D., Nelson R.C., Smith A.D. Liver biopsy. Hepatology. 2009; 49: 1017-1044. https://doi.org/10.1002/HEP.22742
• Davison B.A., Harrison S.A., Cotter G., Alkhouri N., Sanyal A., Edwards C., et al. Suboptimal reliability of liver biopsy evaluation has implications for randomized clinical trials. J Hepatol. 2020; 73: 1322-1332. https://doi.org/10.1016/j.jhep.2020.06.025
• Moons K.G.M., Altman D.G., Reitsma J.B., Ioannidis J.P.A., Steyerberg E.W., et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): Explanation and elaboration. Ann Intern Med. 2015; 162 (W1–73). https://doi.org/10.7326/M14-0698
• Afdhal N.H., Bacon B.R., Patel K., Lawitz E.J., Gordon S.C., Nelson D.R., et al. Accuracy of fibroscan, compared with histology, in analysis of liver fibrosis in patients with hepatitis B or C: A united states multicenter study. Clin Gastroenterol Hepatol. 2015; 13 (e3): 772-779. https://doi.org/10.1016/j.cgh.2014.12.014
• Wong V.W.S., Irles M., Wong G.L.H., Shili S., Chan A.W.H., Merrouche W., et al. Unified interpretation of liver stiffness measurement by M and XL probes in non-alcoholic fatty liver disease. Gut. 2019; 68: 2057-2064. https://doi.org/10.1136/gutjnl-2018-317334
• Kleiner D.E., Brunt E.M., Van Natta M., Behling C., Contos M.J., Cummings O.W., et al. Design and validation of a histological scoring system for nonalcoholic fatty liver disease. Hepatology. 2005; 41: 1313-1321. https://doi.org/10.1002/hep.20701
• Riley R.D., Ensor J., Snell K.I.E., Harrell F.E., Martin G.P., Reitsma J.B., et al. Calculating the sample size required for developing a clinical prediction model. BMJ. 2020; 368. https://doi.org/10.1136/bmj.m441
• Steyerberg E.W. Clinical Prediction Models. Cham: Springer International Publishing; 2019. https://doi.org/10.1007/978-3-030-16399-0
• Buse A. The Likelihood Ratio, Wald, and Lagrange Multiplier Tests: An Expository Note. Am Stat. 1982; 36: 153. https://doi.org/10.2307/2683166
• Sauerbrei W., Royston P. Building Multivariable Prognostic and Diagnostic Models: Transformation of the Predictors by Using Fractional Polynomials. J R Stat Soc Ser A (Statistics in Society). 1999; 162: 71-94
• DeLong E.R., DeLong D.M., Clarke-Pearson D.L. Comparing the Areas under Two or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach. Biometrics. 1988; 44: 837. https://doi.org/10.2307/2531595
• Vickers A.J., Elkin E.B. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making. 2006; 26: 565. https://doi.org/10.1177/0272989X06295361
• Majumdar A., Campos S., Gurusamy K., Pinzani M., Tsochatzis E.A. Defining the Minimum Acceptable Diagnostic Accuracy of Noninvasive Fibrosis Testing in Cirrhosis: A Decision Analytic Modeling Study. Hepatology. 2020; 71: 627-642. https://doi.org/10.1002/HEP.30846
• Vickers A.J., Van Calster B., Steyerberg E.W. Net benefit approaches to the evaluation of prediction models, molecular markers, and diagnostic tests. BMJ. 2016; 352. https://doi.org/10.1136/BMJ.I6
• McPherson S., Hardy T., Dufour J.F., Petta S., Romero-Gomez M., Allison M., et al. Age as a Confounding Factor for the Accurate Non-Invasive Diagnosis of Advanced NAFLD Fibrosis. Am J Gastroenterol. 2017; 112: 740-751. https://doi.org/10.1038/ajg.2016.453
• Papatheodoridi M., Hiriart J.B., Lupsor-Platon M., Bronte F., Boursier J., Elshaarawy O., et al. Refining the Baveno VI elastography criteria for the definition of compensated advanced chronic liver disease. J Hepatol. 2020. https://doi.org/10.1016/j.jhep.2020.11.050
• R: The R Project for Statistical Computing. n.d. https://www.r-project.org/ (accessed September 30, 2020).
• Turck N., Vutskits L., Sanchez-Pena P., Robin X., Hainard A., Gex-Fabry M., et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011; 8: 12-77
• Friedman J., Hastie T., Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010; 33: 1-22. https://doi.org/10.18637/jss.v033.i01
• Gareth Ambler AB. mfp: Multivariable Fractional Polynomials. 2021.
• Siddiqui M.S., Vuppalanchi R., Van Natta M.L., Hallinan E., Kowdley K.V., Abdelmalek M., et al. Vibration-Controlled Transient Elastography to Assess Fibrosis and Steatosis in Patients With Nonalcoholic Fatty Liver Disease. Clin Gastroenterol Hepatol. 2019; 17 (e2): 156-163. https://doi.org/10.1016/j.cgh.2018.04.043
• Eddowes P.J., Sasso M., Allison M., Tsochatzis E., Anstee Q.M., Sheridan D., et al. Accuracy of FibroScan Controlled Attenuation Parameter and Liver Stiffness Measurement in Assessing Steatosis and Fibrosis in Patients With Nonalcoholic Fatty Liver Disease. Gastroenterology. 2019; 156: 1717-1730. https://doi.org/10.1053/j.gastro.2019.01.042
• Wong V.W.-S., Vergniol J., Wong G.L.-H., Foucher J., Chan H.L.-Y., Le Bail B., et al. Diagnosis of fibrosis and cirrhosis using liver stiffness measurement in nonalcoholic fatty liver disease. Hepatology. 2010; 51: 454-462. https://doi.org/10.1002/hep.23312
• Bedossa P. Utility and appropriateness of the fatty liver inhibition of progression (FLIP) algorithm and steatosis, activity, and fibrosis (SAF) score in the evaluation of biopsies of nonalcoholic fatty liver disease. Hepatology. 2014; 60: 565-575. https://doi.org/10.1002/HEP.27173
• Bril F., McPhaul M.J., Caulfield M.P., Clark V.C., Soldevilla-Pico C., Firpi-Morell R.J., et al. Performance of plasma biomarkers and diagnostic panels for nonalcoholic steatohepatitis and advanced fibrosis in patients with type 2 diabetes. Diabetes Care. 2020; 43: 290-297. https://doi.org/10.2337/dc19-1071