Translational & Clinical Research Institute, The Medical School, Newcastle University, 4th Floor, William Leech Building, Framlington Place, Newcastle upon Tyne, NE2 4HH, United Kingdom. Tel.: +44 (0) 191 208 7012
Translational & Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom; Newcastle NIHR Biomedical Research Centre, Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, United Kingdom
Department of Pathophysiology and Transplantation, University of Milan, Translational Medicine - Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico, Milan, Italy
Translational & Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom; Department of Pathology, Aretaieio Hospital, National & Kapodistrian University of Athens, Greece
Translational & Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, United Kingdom; Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, Australia
Liver Unit, Department of Medicine, Cambridge Biomedical Research Centre, Cambridge University NHS Foundation Trust, United Kingdom; Department of Biochemistry and Wellcome Trust/MRC Institute of Metabolic Science, MRC Metabolic Diseases Unit, Metabolic Research Laboratories, University of Cambridge, UK
Division of Gastroenterology and Center for Autoimmune Liver Diseases, Department of Medicine and Surgery, University of Milano - Bicocca, Monza, Italy; European Reference Network on Hepatological Diseases (ERN RARE-LIVER), San Gerardo Hospital, Monza, Italy
Department of Pathophysiology and Transplantation, University of Milan, Translational Medicine - Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico, Milan, Italy
Sezione di Gastroenterologia, Dipartimento Promozione della Salute, Materno-Infantile, di Medicina Interna e Specialistica di Eccellenza “G. D'Alessandro”, Università di Palermo, Palermo, Italy
Sorbonne Université, Assistance Publique-Hôpitaux de Paris, Hôpital Pitié-Salpêtrière, Institute of Cardiometabolism and Nutrition (ICAN), Paris, France
Department of Pathophysiology and Transplantation, University of Milan, Translational Medicine - Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico, Milan, Italy
Corresponding authors. Addresses: Translational & Clinical Research Institute, The Medical School, Newcastle University, 4th Floor, William Leech Building, Framlington Place, Newcastle upon Tyne, NE2 4HH, United Kingdom. Tel.: +44 (0) 191 208 7031
• Genome-wide association study involved 1,483 biopsied NAFLD cases and 17,781 controls.
• Main analysis shows genome-wide significance for PNPLA3, TM6SF2, HSD17B13 and GCKR.
• Sub-analyses show significance near LEPR for NASH and near PYGO1 for steatosis.
• Except for GCKR, the genome-wide significant signals were replicated.
Background & Aims
Genetic factors associated with non-alcoholic fatty liver disease (NAFLD) remain incompletely understood. To date, most genome-wide association studies (GWASs) have adopted radiologically assessed hepatic triglyceride content as the reference phenotype and so cannot address steatohepatitis or fibrosis. We describe a GWAS encompassing the full spectrum of histologically characterised NAFLD.
Methods
The GWAS involved 1,483 European NAFLD cases and 17,781 genetically matched controls. A replication cohort of 559 NAFLD cases and 945 controls was genotyped to confirm signals showing genome-wide or close to genome-wide significance.
Results
Case-control analysis identified signals showing p values ≤5 × 10−8 at 4 locations (chromosome [chr] 2 GCKR/C2ORF16; chr4 HSD17B13; chr19 TM6SF2; chr22 PNPLA3) together with 2 other signals with p <1 × 10−7 (chr1 near LEPR and chr8 near IDO2/TC1). Case-only analysis of quantitative traits showed that the PNPLA3 signal (rs738409) had genome-wide significance for steatosis, fibrosis and NAFLD activity score and a new signal (PYGO1 rs62021874) had close to genome-wide significance for steatosis (p = 8.2 × 10−8). Subgroup case-control analysis for NASH confirmed the PNPLA3 signal. The chr1 LEPR single nucleotide polymorphism also showed genome-wide significance for this phenotype. Considering the subgroup with advanced fibrosis (≥F3), the signals on chr2, chr19 and chr22 maintained their genome-wide significance. Except for GCKR/C2ORF16, the genome-wide significant signals were replicated.
Conclusions
This study confirms PNPLA3 as a risk factor for the full histological spectrum of NAFLD at genome-wide significance levels, with important contributions from TM6SF2 and HSD17B13. PYGO1 is a novel steatosis modifier, suggesting that Wnt signalling pathways may be relevant in NAFLD pathogenesis.
Lay summary
Non-alcoholic fatty liver disease is a common disease where excessive fat accumulates in the liver and may result in cirrhosis. To understand who is at risk of developing this disease and suffering liver damage, we undertook a genetic study to compare the genetic profiles of people suffering from fatty liver disease with genetic profiles seen in the general population. We found that particular sequences in 4 different areas of the human genome were seen at different frequencies in the fatty liver disease cases. These sequences may help predict an individual's risk of developing advanced disease. Some genes where these sequences are located may also be good targets for future drug treatments.
It has come to our attention that there is a typographical error in Table 5 of our manuscript. In line 6 of column 2, "rs11852624" should read "rs11858624". We apologise for any inconvenience caused.
In this article we used data from the Understanding Society study as part of our control group. While we did explain that the data was provided by Understanding Society in the paper, we did not include the statement they required for use of these data which is the following:
Non-alcoholic fatty liver disease (NAFLD) represents a spectrum of progressive liver disease characterised by increased hepatic triglyceride content (HTGC) in the absence of excess alcohol consumption.
NAFLD encompasses steatosis (non-alcoholic fatty liver [NAFL]), steatohepatitis (non-alcoholic steatohepatitis [NASH]), fibrosis and ultimately cirrhosis. It is strongly associated with features of the metabolic syndrome (obesity, type 2 diabetes mellitus [T2DM] and dyslipidaemia).
Although NAFLD is common, affecting approximately 25% of the global adult population, only a minority of patients with NAFL develop NASH, progress to significant fibrosis or experience associated morbidity.
NAFLD is best considered a complex trait where disease phenotype results from environmental exposures acting on a susceptible polygenic background comprising multiple independent modifiers.
Genome-wide association studies (GWASs) have contributed greatly to our understanding of the genetic contribution to NAFLD pathogenesis and variability of prognosis.
and more recently, a non-synonymous SNP in TM6SF2 (transmembrane 6 superfamily member 2) (rs58542926), originally ascribed to the neighbouring NCAN gene,
Both genetic associations have been replicated in further studies where they have been associated not only with steatosis, but also with clinically relevant factors including grade of steatohepatitis and stage of hepatic fibrosis/cirrhosis.
A number of other associations, with LYPLAL1, GCKR, and PPP1R3B, have been reported by GWAS comprising relatively few histologically characterised cases and are currently less robustly replicated.
in a general patient population and then demonstrated that this polymorphism was associated with NAFLD. Two further studies broadly confirmed this association.
One GWAS has assessed a large number of histologically characterised patients, reporting associations with both PNPLA3 and with chromosome 19 close to TM6SF2.
These patients, however, were recruited from bariatric surgery programmes, which involve dietary restriction before surgery and wedge biopsy collection, both of which may affect liver histology; in addition, such patients tend to be younger and to have a higher average BMI than NAFLD cases more generally.
The current study aims to identify genetic modifiers of steatohepatitis and fibrosis, attaining genome-wide levels of statistical significance by using a large internationally derived cohort of patients (with histologically characterised NAFLD and representing all stages of the disease). We now report the largest histology-based NAFLD GWAS to date in a cohort of 1,483 European patients exhibiting the full spectrum of biopsy-proven NAFLD.
Materials and methods
NAFLD cases
For the main GWAS study, patients were recruited from clinics at several leading European tertiary liver centres (see supplementary methods). Additional cases for replication were recruited at Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico, Milan, Italy. The study had the necessary ethical approvals from the relevant national/institutional review boards (see supplementary methods) and all participants provided informed consent. All cases were unrelated patients who had undergone a liver biopsy as part of the routine diagnostic workup for presumed NAFLD, having originally been identified either by abnormal biochemical tests (ALT and/or gamma-glutamyltransferase) and/or an ultrasonographically detected bright liver, associated with features of the metabolic syndrome, or by abnormal biochemical tests (ALT and/or gamma-glutamyltransferase) together with the macroscopic appearance of a steatotic liver at the time of bariatric surgery. Full details of inclusion/exclusion criteria are provided in the supplementary methods.
Controls
We used general population samples with existing genome-wide genotype data as study controls. For the GWAS, we selected European ancestry controls (n = 17,781) from multiple sources as described in the supplementary methods. To replicate GWAS associations, we used an Italian control cohort (n = 945) consisting of previously described controls together with some newly collected individuals. Any individuals found to match the Hypergenes controls already used in our discovery GWAS were excluded.
Histology
Liver biopsy specimens (at least 1.6 cm length and ~1 mm diameter) were formalin-fixed and paraffin-embedded. Tissue sections (5 μm-thick) were routinely stained with haematoxylin and eosin and trichrome stain to visualise collagen. All cases were recruited at tertiary centres where liver biopsies were routinely assessed according to accepted criteria by experienced liver pathologists and scored using the well validated NIDDK NASH-CRN system.
To ensure optimum data quality, biopsies were retrieved from archival storage where possible (78% of cases) and scored centrally by an expert liver pathologist from the FLIP/EPoS central pathology team (DT, ADB, PB), as described in detail previously.
Where archival samples were unavailable for central reading, the local liver pathologist's scores were used. To maximise insights into the specific pathophysiological processes that occur as NAFLD progresses, 6 phenotypes of interest were studied: degree of steatosis (S0-3); degree of ballooning (B0-2); degree of lobular inflammation (I0-3); severity of NASH activity, calculated both as ‘disease activity’ (hepatocyte ballooning B0-2 plus lobular inflammation I0-3, range 0-5) and as an overall NAFLD activity score, ‘NAS’, combining all 3 parameters (NAS 0-8); and stage of fibrosis (F0-4).
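The composite scores above are simple sums of the component grades. As an illustration only, the following sketch encodes that arithmetic with range checks; the class and field names are hypothetical and this is not the study's scoring software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HistologyScores:
    """Hypothetical container for NASH-CRN component grades of one biopsy."""
    steatosis: int             # S0-3
    ballooning: int            # B0-2
    lobular_inflammation: int  # I0-3
    fibrosis: int              # F0-4 (staged separately, not part of NAS)

    def __post_init__(self) -> None:
        # Reject grades outside the ranges stated in the text.
        limits = {"steatosis": 3, "ballooning": 2,
                  "lobular_inflammation": 3, "fibrosis": 4}
        for name, upper in limits.items():
            value = getattr(self, name)
            if not 0 <= value <= upper:
                raise ValueError(f"{name}={value} outside 0-{upper}")

    @property
    def disease_activity(self) -> int:
        """'Disease activity' = ballooning + lobular inflammation (0-5)."""
        return self.ballooning + self.lobular_inflammation

    @property
    def nas(self) -> int:
        """NAFLD activity score = steatosis + ballooning + inflammation (0-8)."""
        return self.steatosis + self.disease_activity


biopsy = HistologyScores(steatosis=2, ballooning=1, lobular_inflammation=2, fibrosis=3)
print(biopsy.disease_activity, biopsy.nas)  # 3 5
```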
Genotyping
DNA was prepared from blood samples collected with EDTA as described previously.
GWAS genotyping was carried out in 2 phases. For phase I, genotyping was performed initially using the Illumina OmniExpress BeadChip by the Edinburgh Clinical Research Centre. To obtain data for additional exomic SNPs, further genotyping of these samples was performed using the Illumina HumanCoreExome BeadChip (Aros, Denmark). Genome-wide genotyping of the phase II cases was performed using the Illumina OmniExpressExome BeadChip by the Edinburgh Clinical Research Centre. A total of 721,078 markers shared across the batches passed quality control (see supplementary methods). SNP imputation was performed as described in detail in the supplementary methods.
The top associated SNPs were further confirmed in replication cases using TaqMan® SNP genotyping assays (ThermoFisher Scientific, Waltham, MA) in accordance with the manufacturer's recommendations. If an assay could not be designed for the SNP showing the strongest signal for the region, a suitable proxy SNP was chosen (https://ldlink.nci.nih.gov/?tab=home).
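Proxy SNPs were chosen with LDlink. To illustrate the quantity such a choice usually rests on, here is a minimal sketch of the pairwise linkage disequilibrium statistic r² computed from phased haplotypes, plus a helper that picks the candidate in strongest LD with the target. It assumes phased haplotypes coded 0/1; the function names and data are hypothetical, and the code does not call or reproduce LDlink.

```python
from typing import Dict, Sequence, Tuple


def ld_r2(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """Squared correlation r^2 between two biallelic SNPs.

    Each argument lists the allele (coded 0 or 1) on each phased
    haplotype, in the same haplotype order for both SNPs.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("need equal-length, non-empty haplotype vectors")
    p_a = sum(hap_a) / n                                   # allele-1 frequency, SNP A
    p_b = sum(hap_b) / n                                   # allele-1 frequency, SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                                   # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("a monomorphic SNP has undefined r^2")
    return d * d / denom


def best_proxy(target: Sequence[int],
               candidates: Dict[str, Sequence[int]]) -> Tuple[str, float]:
    """Pick the candidate SNP with the highest r^2 to the target SNP."""
    scored = {name: ld_r2(target, hap) for name, hap in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical phased haplotypes for 8 chromosomes (not study data).
target_snp = [1, 1, 0, 0, 1, 0, 0, 0]
candidates = {
    "proxy_x": [1, 1, 0, 0, 1, 0, 0, 0],   # identical pattern: complete LD
    "proxy_y": [0, 1, 1, 0, 0, 1, 0, 0],   # weakly correlated
}
print(best_proxy(target_snp, candidates))  # ('proxy_x', 1.0)
```

An r² of 1 corresponds to "complete linkage disequilibrium", the relationship later reported between the PYGO1 variants rs62021874 and rs11858624.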
RNA sequencing and in vitro studies
RNA sequencing
RNA sequencing data from 206 liver biopsies from patients with NAFLD, as described elsewhere (Govaere et al., submitted), were used to further investigate the functional significance of HSD17B13 variants.
Bioluminescent retinol dehydrogenase assays for HSD17B13
Retinol (75 μM; Sigma-Aldrich, St. Louis, Missouri, USA) was incubated with recombinant HSD17B13 (TP313132; Origene, Maryland, USA) for 1 h at room temperature in the presence of 0.5 mM NAD in 200 mM Tris-HCl, pH 7.5. As a control, the known HSD17B13 substrate β-estradiol (75 μM) was incubated in parallel assays. NADH production was measured using the Bioluminescent NAD/NADH-Glo™ Assay (Promega, Wisconsin, USA) according to the manufacturer's guidelines.
Statistical analysis
We used principal component analysis (PCA) of the genome-wide genotype data to investigate the ancestry of the cases and controls; this showed the expected north/south variation commonly seen across Europe but, importantly, suggested adequate matching between cases and controls (Fig S1A and Fig S1B). Case/control analysis and quantitative trait analysis of GWAS data were performed as described in detail in the supplementary methods, using a linear mixed modelling approach incorporating the top 5 principal components as covariates to adjust for any population stratification. Examination of the resulting genome-wide QQ plots and genomic control inflation factors (λ) (see Results) indicated that this adjustment adequately corrected for any population differences.
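The genomic control inflation factor λ is conventionally defined as the median of the observed 1-df association chi-square statistics divided by the median expected under the null (about 0.455); values near 1 suggest little residual stratification. A minimal standard-library sketch, assuming two-sided p values as input; the function name is hypothetical and this is not the FaST-LMM pipeline used in the study.

```python
import statistics
from statistics import NormalDist

# Median of a chi-square distribution with 1 df: P(Z^2 <= m) = 0.5
# implies sqrt(m) = Phi^-1(0.75), so m is approximately 0.4549.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Genomic control lambda from two-sided p values of 1-df tests."""
    std_normal = NormalDist()
    chi2 = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"invalid p value: {p}")
        # Two-sided p -> z; the sign is irrelevant once squared.
        z = std_normal.inv_cdf(p / 2)
        chi2.append(z * z)
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


# Under the null, p values are uniform and lambda should be close to 1.
n = 1000
null_p = [(i + 0.5) / n for i in range(n)]
print(round(genomic_inflation(null_p), 3))
```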
Significance of findings in the replication cohort was assessed by calculation of odds ratios, 95% confidence intervals and p values by univariate analysis and multiple logistic regression using PLINK.
Clinical details of the NAFLD cases included in the main GWAS are summarised in Table 1. The replication cohort details are shown in Table S1. All cases in both cohorts were of white European ethnicity. The percentage with advanced fibrosis (stage F3 or F4) was similar in both cohorts (p >0.05), but other parameters, including age, BMI, T2DM, sex and prevalence of NASH, differed.
The overall NAFLD case-control analysis is presented as a Manhattan plot (Fig 1). PCA scattergrams for cases and controls are shown in Fig S1 and the QQ plot of the association results in Fig S2. As summarised in Table 2, 4 different regions (on chromosomes 2, 4, 19 and 22) passed conventional genome-wide significance (p <5 × 10−8) with 2 other regions (on chromosomes 1 and 8) showing p values <1 × 10−7 (for LocusZoom plots see Fig S3). Data presented in Fig. 1 were obtained from imputation analysis. Primary case-control analysis without imputation showed similar signals in chromosomes 2, 4, 19 and 22 only but no additional signals at p <1 × 10−7 (Fig. S4 and Table S2). Correction of the imputed data for sex in addition to the first 5 principal components used in the main analyses did not result in large changes in p value (Table S3). Together, these results point to PNPLA3, TM6SF2, HSD17B13 and the GCKR/C2ORF16 region being the major risk factors for disease susceptibility with borderline signals for chromosome 1 near LEPR and for chromosome 8 adjacent to IDO2 and TC1(C8orf4). In view of the well-established strong association of PNPLA3 rs738409 with NAFLD, additional analysis using a model conditioning on this SNP was performed. This analysis gave broadly similar findings to those summarised in Table 2 with no new signals (data not shown).
Fig. 1. Manhattan plot from imputed GWAS case-control analysis.
Included 1,483 NAFLD cases and 17,781 controls. Threshold for genome-wide significance was taken to be 5 × 10−8. The first 5 principal components were included as covariates. Genome-wide significant signals are indicated by blue arrows with those showing p in the range 1 × 10−7 to 5 × 10−8 shown by grey arrows. GWAS, genome-wide association study; NAFLD, non-alcoholic fatty liver disease.
22
G
PNPLA3
1.45E−49
1.827 (1.687–1.979)
7,412,561 imputed SNPs included; total number of cases and controls = 19,264. ORs were obtained from logistic regression in PLINK and confidence intervals were calculated from back-transformation of FaST-LMM p-values and PLINK ORs.
OR, odds ratio; SNP, single nucleotide polymorphism.
∗ Denotes validated SNP following imputation. The first 5 principal components were included as covariates.
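The Table 2 footnote states that confidence intervals were back-transformed from FaST-LMM p values and PLINK ORs, without giving the formula. One common approach, assumed here, treats the p value as coming from a Wald test on log(OR), so that SE = |log(OR)| / |z|. The sketch below implements that assumption with a hypothetical function name; for the PNPLA3 rows it agrees with the reported intervals to about three decimals, but it is only an approximation and need not reproduce every reported interval.

```python
import math
from statistics import NormalDist


def ci_from_or_and_p(odds_ratio, p_value, level=0.95):
    """Approximate CI for an OR from its two-sided p value.

    Assumes a Wald-type test on log(OR): |z| = |log(OR)| / SE.
    """
    if odds_ratio <= 0 or odds_ratio == 1:
        raise ValueError("OR must be positive and != 1 to recover the SE")
    if not 0 < p_value < 1:
        raise ValueError("p value must lie in (0, 1)")
    std_normal = NormalDist()
    # Use the lower tail (p/2) so that tiny p values do not round 1 - p/2 to 1.
    z = abs(std_normal.inv_cdf(p_value / 2))
    log_or = math.log(odds_ratio)
    se = abs(log_or) / z
    z_crit = std_normal.inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se)


# PNPLA3 rs738409 in Table 2: OR 1.827, p = 1.45E-49 (reported CI 1.687-1.979).
print(ci_from_or_and_p(1.827, 1.45e-49))
```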
Case-only analyses assessing the relevance of genotype to steatosis grade, NASH activity (assessed as the predefined ‘disease activity’ score and NAS) and fibrosis stage were also performed using the imputed data. Results of these analyses are shown in Fig. 2 with the most significant signals summarised in Table 3 (for QQ and LocusZoom plots see Figs. S5 and S6). The primary data without imputation are summarised in Fig. S7 and Table S4. For steatosis, NAS and fibrosis as quantitative traits, signals with p <10−10 were detected for PNPLA3 rs738409 and other SNPs in this region of chromosome 22. For steatosis, a signal with p = 8.2 × 10−8 on chromosome 15 (rs62021874 in PYGO1) was also detected (Table 3). This variant is in complete linkage disequilibrium with a missense variant, rs11858624, which also showed a signal close to significance (p = 1.7 × 10−7). No signals reached conventional genome-wide significance (p <5 × 10−8) for disease activity score alone or when ballooning or inflammation were considered as individual traits (Fig. S8). The effect of correcting the imputed data for clinical covariates was also assessed for each trait (Table S5), giving results very similar to those obtained originally.
Fig. 2. Manhattan plots from imputed GWAS analysis on the basis of quantitative traits.
Included 1,483 NAFLD cases. Threshold for genome-wide significance was taken to be 5 × 10−8 but signals showing p <1 × 10−7 are also indicated. Panel A shows data for steatosis, B for fibrosis, C for disease activity score and D for NAS score. The first 5 principal components were included as covariates. Genome-wide significant signals are indicated by blue arrows with those showing p in the range 1 × 10−7 to 5 × 10−8 shown by grey arrows. GWAS, genome-wide association study; NAFLD, non-alcoholic fatty liver disease; NAS, NAFLD activity score.
Results for 7,900,223 imputed SNPs. First 5 principal components were included as covariates. ORs were obtained from logistic regression in PLINK and confidence intervals were calculated from back-transformation of FaST-LMM p-values and PLINK ORs.
To further assess the relevance of genotype to particular NAFLD phenotypes, the contribution to NAFLD progression of the 4 major genetic risk factors identified in the case-control GWAS was assessed by calculating a combined genetic risk score based on summing the allele count (with no weighting by effect size) for PNPLA3 rs738409, TM6SF2 rs58542926, GCKR rs1260326 and HSD17B13 rs9992651 and relating the resulting score to grade of steatosis, NAS and fibrosis stage (Fig. S9). Trend tests by linear regression showed that there was a statistically significant relationship between the value of the semi-quantitative steatosis/NAS/fibrosis scores and the value of the genetic risk score for all 3 phenotypes, with the most significant relationship (p = 4.68 × 10−13) detected for fibrosis stage (Fig S9). Those with a risk score of 2 (n = 216) had a mean fibrosis score of 1.27 (SE 0.08) compared with 1.94 (SE 0.09) for a risk score of 5 (n = 260).
Additional subgroup case-control analysis
Since both steatohepatitis and advanced fibrosis are clinically important phenotypes in NAFLD, additional case-control analyses were undertaken restricted to cases with NASH (n = 836) and to cases with fibrosis stage F3 or F4 (n = 386). The findings for both phenotypes are summarised in Fig. 3 and Table 4 (for QQ and LocusZoom plots see Fig. S10 and S11). For NASH, signals showing p values of <5 × 10−8 were detected for chromosome 1 (LEPR) and chromosome 22 (PNPLA3) (Table 4). For LEPR rs12077210, the p value of 4.4 × 10−9 was lower for NASH than for NAFLD overall (Table 2). A second novel chromosome 1 signal (rs80084600, p = 7.1 × 10−8), located in an intergenic region downstream of phospholipase A2 group IVA (PLA2G4A), was also detected. The SNPs on chromosomes 2, 4 and 19 that were significant in the main case-control analysis showed p values in the region of 2 × 10−7 and so came close to significance for NASH. For fibrosis stages F3 and F4, signals on chromosomes 2, 19 and 22 showed p values of <5 × 10−8, but the chromosome 1, 4 and 8 signals detected in the main case-control analysis showed p values >1 × 10−7. For HSD17B13 rs9992651 (chromosome 4), the p value was 1.16 × 10−5.
Fig. 3. Manhattan plots from imputed GWAS case-control analysis of NASH and severe fibrosis (F3/F4).
Threshold for genome-wide significance was taken to be 5 × 10−8. The first 5 principal components were included as covariates. Panel A. NASH analysis. 836 cases and 17,781 controls. Panel B. F3/F4 analysis. 386 cases and 17,781 controls. Genome-wide significant signals are indicated by blue arrows with those showing p in the range 1 × 10−7 to 5 × 10−8 shown by grey arrows. GWAS, genome-wide association study; NASH, non-alcoholic steatohepatitis.
Table 4. Summary of top findings from case-control analysis for NAFLD cases with NASH or with fibrosis scores F3 and F4 only.

| SNP | Chromosome | Gene | p value (no clinical covariates) | OR (95% CI) |
|---|---|---|---|---|
| NASH | | | | |
| rs12077210 | 1 | LEPR | 4.42E−09 | 1.671 (1.390–2.008) |
| rs80084600 | 1 | – | 7.08E−08 | 1.977 (1.543–2.533) |
| rs1260326 | 2 | GCKR | 3.78E−07 | 1.302 (1.176–1.442) |
| rs9992651 | 4 | HSD17B13 | 2.92E−07 | 0.718 (0.633–0.815) |
| rs13118664 | 4 | HSD17B13 | 2.37E−07 | 0.716 (0.631–0.813) |
| rs58542926 | 19 | TM6SF2 | 1.90E−07 | 1.606 (1.344–1.919) |
| rs8107974 | 19 | SUGP1 | 1.36E−07 | 1.609 (1.348–1.920) |
| rs738409 | 22 | PNPLA3 | 2.58E−44 | 2.053 (1.856–2.271) |
| Fibrosis F3/F4 | | | | |
| rs1260326 | 2 | GCKR | 4.07E−10 | 1.678 (1.427–1.974) |
| rs56255430 | 19 | – | 2.11E−10 | 1.863 (1.538–2.257) |
| rs738409 | 22 | PNPLA3 | 5.66E−31 | 2.374 (2.051–2.748) |
N = 18,167 (Cases = 386, Controls = 17,781), covariate model includes first 5 principal components. ORs were obtained from logistic regression in PLINK and confidence intervals were calculated from back-transformation of FaST-LMM p-values and PLINK ORs.
Replication of GWAS signals and investigation of additional possible NAFLD risk factors
A replication cohort of 559 Italian NAFLD cases was assembled at a different centre from the discovery cohort. Allele frequencies for selected SNPs in these cases were compared with those for Italian controls. Findings for 8 separate loci giving signals with p <1 × 10−7 in either the main GWAS or the quantitative trait studies are summarised in Table 5. The PNPLA3, TM6SF2 and HSD17B13 signals seen in the main GWAS replicated (p <0.05), but we found only borderline effects or no significant association for 4 other loci. However, the PYGO1 signal, which was associated with steatosis in the quantitative trait analysis, showed a significant association in the same protective direction as observed for steatosis. The GCKR/C2ORF16 signal did not replicate either in the main replication cohort (Table 5) or in a subgroup of replication cases (n = 134) with fibrosis stage 3 or 4. Due to the relatively low number of NASH cases in the replication cohort, we did not seek to replicate the novel rs80084600 signal seen for this phenotype. Multiple logistic regression analysis with adjustment for PNPLA3 rs738409 and TM6SF2 rs58542926 (Table 5) generated similar findings to the univariate analysis, apart from small decreases in p values for the HSD17B13 and PYGO1 signals.
Table 5. Genotype frequencies in replication cohort.

| Gene | SNP | Case frequency | Control frequency | Univariate odds ratio (95% CI) | Univariate p value | Odds ratio (95% CI), adjusted for PNPLA3 rs738409 and TM6SF2 rs58542926 | p value, adjusted |
|---|---|---|---|---|---|---|---|
| LEPR | rs12077210 | 0.05877 | 0.05983 | 0.98 (0.71–1.35) | 0.91 | 0.96 (0.69–1.34) | 0.81 |
| GCKR | rs1260326 | 0.5407 | 0.5305 | 1.04 (0.90–1.21) | 0.59 | 1.08 (0.92–1.27) | 0.36 |
| C2ORF16 | rs1919127 | 0.382 | 0.3566 | 1.12 (0.96–1.30) | 0.16 | 1.1 (0.94–1.29) | 0.25 |
| HSD17B13 | rs72613567 | 0.2101 | 0.2462 | 0.81 (0.68–0.97) | 0.025 | 0.78 (0.64–0.95) | 0.013 |
| IDO2/TC1(C8orf4) | rs79137099 | 0.03789 | 0.03891 | 0.97 (0.66–1.44) | 0.89 | 1.05 (0.6–1.59) | 0.83 |
| PYGO1 | rs11858624 | 0.05144 | 0.0709 | 0.71 (0.52–0.98) | 0.035 | 0.67 (0.48–0.96) | 0.027 |
| TM6SF2 | rs58542926 | 0.08813 | 0.05027 | 1.83 (1.36–2.45) | 4.63E−05 | n.a. | n.a. |
| PNPLA3 | rs738409 | 0.4436 | 0.2754 | 2.10 (1.80–2.45) | 6.60E−21 | n.a. | n.a. |
Significance of findings was assessed by calculation of odds ratios, 95% confidence intervals and p values by univariate analysis (chi-square test) and multiple logistic regression using PLINK.
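The univariate columns of Table 5 can be approximated from the reported frequencies with a standard allelic 2×2 analysis: odds ratio, Woolf (log-scale) 95% CI and a Pearson chi-square test with 1 df. The sketch assumes the frequencies are effect-allele frequencies and that all 559 cases and 945 controls were genotyped (actual call counts are not reported); under those assumptions it recovers the univariate ORs for PNPLA3, TM6SF2 and HSD17B13 to the two decimals shown. The function name is hypothetical, and the adjusted columns would require multiple logistic regression on individual-level genotypes, which is not sketched.

```python
import math
from statistics import NormalDist


def allelic_test(case_freq, control_freq, n_cases, n_controls):
    """Allelic OR, Woolf 95% CI and 1-df chi-square p from allele frequencies.

    Assumes every subject was genotyped, i.e. 2 * n alleles per group.
    """
    a = case_freq * 2 * n_cases              # effect alleles in cases
    b = (1 - case_freq) * 2 * n_cases        # other alleles in cases
    c = control_freq * 2 * n_controls        # effect alleles in controls
    d = (1 - control_freq) * 2 * n_controls  # other alleles in controls
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf SE of log(OR)
    z_crit = NormalDist().inv_cdf(0.975)
    ci = (math.exp(math.log(odds_ratio) - z_crit * se),
          math.exp(math.log(odds_ratio) + z_crit * se))
    # Pearson chi-square for the 2x2 allele table, no continuity correction.
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))       # upper tail of chi-square(1)
    return odds_ratio, ci, p


# PNPLA3 rs738409 in Table 5 (reported: 2.10 (1.80-2.45), p = 6.60E-21).
print(allelic_test(0.4436, 0.2754, 559, 945))
```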
Results for selected variants reported recently by others as risk factors for NAFLD but which had not shown p values of <1 × 10−7 in the current GWAS were also extracted from the main case-control analysis. Only rs2642438 in MARC1 (mitochondrial amidoxime-reducing component 1) and rs28929474 in AAT (alpha1-antitrypsin) showed p values <0.05 (Table S6). For rs2642438, the p value was 6 × 10−6 with a protective odds ratio of 0.816, in line with that reported previously.
eQTL analysis and studies on expression of GWAS signals in liver biopsies from different NAFLD stages
While the signals seen for NAFLD relating to PNPLA3, TM6SF2 and GCKR are already well-established risk factors for this disease from population studies, evidence for functional significance for the other signals is limited. The relationship of rs9992651 and rs72613567 in HSD17B13 with gene expression was evaluated by sequencing RNA samples from liver biopsies. Three different HSD17B13 transcripts were detected (Fig. S12), including a full-length transcript with all 7 exons, a variant with exon 2 deleted and a variant without exon 6. Based on genotype for rs9992651 from the RNA sequencing data, the variant without exon 6 was generally not detectable in homozygotes for the reference G allele but was expressed at a higher level in homozygotes for the minor A allele and also heterozygotes. The ability of recombinant HSD17B13 to oxidise retinol
Other loci showing associations in the case-control studies including rs12077210 in LEPR (intronic), rs139648192 on chromosome 8 and rs80084600 on chromosome 1 could not be investigated by RNA sequencing due to their locations. The borderline significant rs11858624 in PYGO1 (Table 3) is a missense variant (P299H). Analysis with data obtained from GTEx (https://gtexportal.org/home/) indicated no difference in RNA expression between rs11858624 homozygous wild-types and heterozygotes in liver tissue (Fig. S14).
Discussion
This study is the largest GWAS to date on histologically characterised NAFLD enrolled in a hepatology setting that addresses the full disease spectrum from steatosis to cirrhosis. This contrasts with the only previous GWAS involving more than 1,000 histologically characterised cases, which was in a predominantly female bariatric cohort with extreme obesity but relatively mild NAFLD.
The findings for GCKR are in line with several candidate gene studies on NAFLD; however, this is the first GWAS reporting this combination of 4 genes as NAFLD risk modifiers.
HSD17B13 has been reported to be relevant to NAFLD with several variants associated with decreased risk.
The current study found a protective effect against NAFLD generally, with the strongest effect related to the SNPs rs9992651 and rs13118664. These SNPs are in non-coding regions of HSD17B13 but are in strong linkage disequilibrium with rs72613567, which is associated with a single base-pair insertion that has been suggested to be of functional significance in relation to RNA splicing.
The current study confirms that an HSD17B13 isoform lacking exon 6 is associated with rs9992651 and a protective effect against NAFLD, consistent with a report showing a similar splicing pattern with the SNPs rs6834314 and rs72613567. We also show that the HSD17B13 gene product possesses retinol dehydrogenase activity. Retinol metabolism is a complex multistep process involving a number of different enzymes.
While it remains unclear whether loss of HSD17B13 retinol dehydrogenase activity can explain the protective effect of the variant, it is likely that enzyme activity in the reverse direction involving retinal reduction to retinol could also be impaired since these enzymes operate in both oxidising and reducing directions.
Thus, increased levels of retinal and the biologically active retinoic acid isomers could occur in those carrying HSD17B13 variants. This effect might protect against NAFLD development, in line with recent evidence that 13-cis and all-trans retinoic acid are found at significantly decreased levels in human livers with NAFLD.
A clear trend towards a protective effect against advanced hepatic fibrosis was observed, although this did not reach genome-wide significance levels (p value approx. 10−5). Given that the strength of association with NASH was stronger (p values approx. 2 × 10−7), it may be that the protective effect of HSD17B13 is more relevant to development of steatohepatitis than progression of fibrosis.
The GCKR signal in both the main GWAS and advanced fibrosis-only analysis identified rs1260326 as the most significant SNP within this region, with T-variant carriage increasing NAFLD risk. This common missense variant has been studied widely both as a risk factor for T2DM and for NAFLD. An upstream SNP, rs780094, in strong linkage disequilibrium with rs1260326, has also been shown to be a NAFLD risk factor in candidate gene studies.
The relationship between both SNPs and susceptibility to NAFLD and T2DM is complex. Rs1260326 is well established to have a protective effect against T2DM, probably due to the GCKR variant showing weaker interaction with glucokinase compared with the wild-type.
presumably due to increased glucose metabolism via glycolysis. The inability to replicate the GCKR association was slightly surprising but may reflect the overall lower severity of NAFLD in the replication cohort. There is a relatively large number of reports of a significantly increased risk for GCKR variants in NAFLD generally, especially for paediatric cases.
A further interesting finding relates to a signal on chromosome 15 (rs11858624) that was close to genome-wide significance for steatosis and was validated in the replication study. The gene involved is PYGO1, which encodes a transcription factor that contributes to the Wnt signalling pathway.
The exact role of PYGO1 in Wnt signalling remains unclear, though its homologue PYGO2 appears to contribute to several physiological pathways, with increased adiposity and impaired glucose tolerance reported in mice lacking this protein.
Signals on chromosomes 1 and 8 were detected in the case-control analysis; however, these just failed to meet genome-wide significance and did not replicate. The chromosome 1 SNP was genome-wide significant in the NASH-only case-control analysis and lies in the region encoding LEPROT and LEPR; both genes share the same promoter and first 2 exons but encode separate proteins. This association is notable given that db/db mice, which carry a spontaneous loss-of-function mutation in the OB-Rb leptin receptor, have been widely used to model NAFLD.
There are also some previous reports from candidate gene studies that LEPR variants are risk factors for NAFLD but the current variant lies considerably upstream of these previously studied variants.
The signal on chromosome 8 relates to a region between IDO2 and TC1. Of potential relevance to NAFLD, both genes have roles in modulating inflammation: IDO2 is inducible by lipopolysaccharide and contributes to immune function,
while TC1 modulates NF-κB signalling. Further investigation of these variants is needed. The subgroup analysis by NASH grade showed a second novel chromosome 1 signal, separate from LEPR. The p value for NASH, though not genome-wide significant at 7 × 10⁻⁸, was considerably lower than that seen for this variant in the main case-control study (0.0049). The variant lies in an intergenic region downstream of PLA2G4A, which shows elevated expression in adipose tissue in obesity and may contribute to T2DM susceptibility.
The most significant associations in this study were obtained for NAFLD in the binary case-control design. The quantitative trait analyses showed a clear association of PNPLA3 rs738409 with steatosis, NAS and fibrosis, which is generally in line with previous reports in NAFLD and alcohol-related liver disease.
However, there were no significant associations of any genotype with disease activity when considered separately from steatosis. The failure to see more specific associations for TM6SF2 and HSD17B13 with other histological traits similar to those reported previously in candidate gene studies may reflect the complex nature of the histological disease phenotype
and also limited statistical power. In contrast to quantification of HTGC by imaging techniques, which provides a highly reproducible quantitative measure of a single biochemical entity, the histological scoring systems used to evaluate steatohepatitis and fibrosis provide only non-linear, semi-quantitative or categorical assessments of disease and are subject to intra- and inter-observer variation. Indeed, clear diagnostic consensus regarding the presence or absence of steatohepatitis among pathologists is not always feasible.
Thus, conducting a histology-based GWAS, whilst addressing the most clinically relevant phenotypic characteristics, is technically more challenging. We addressed this challenge by using expert liver pathologists to provide histological diagnosis and scoring. The reduced statistical power due to the limited number of cases in particular histological categories may restrict the variants that attain the genome-wide significance threshold to only the most strongly associated, such as the PNPLA3 variant. Despite these limitations, disease severity correlated with a genetic risk score based on the most significant case-control GWAS signals: statistically significant associations were found between the risk score and increasing degree of steatosis, grade of steatohepatitis and fibrosis stage. This suggests that a risk score approach may be of prognostic value, although further studies are needed.
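This section does not specify how the genetic risk score was constructed. One common construction, assumed here purely for illustration, weights each individual's risk-allele count (0, 1 or 2) at each SNP by the natural log of that SNP's odds ratio and sums across SNPs. The sketch below computes such a score and averages it by fibrosis stage; the SNP labels, odds ratios and genotypes are all hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical per-SNP odds ratios for the risk allele (NOT the study's estimates).
RISK_ALLELE_OR = {"snp1": 2.0, "snp2": 1.5, "snp3": 1.2}


def genetic_risk_score(dosages: dict[str, int]) -> float:
    """Weighted score: sum over SNPs of ln(OR) x risk-allele count (0, 1 or 2)."""
    return sum(math.log(RISK_ALLELE_OR[snp]) * dosages[snp]
               for snp in RISK_ALLELE_OR)


# Hypothetical individuals: (risk-allele dosages, fibrosis stage 0-4).
cohort = [
    ({"snp1": 0, "snp2": 0, "snp3": 1}, 0),
    ({"snp1": 1, "snp2": 0, "snp3": 1}, 1),
    ({"snp1": 1, "snp2": 1, "snp3": 2}, 2),
    ({"snp1": 2, "snp2": 1, "snp3": 1}, 3),
    ({"snp1": 2, "snp2": 2, "snp3": 2}, 4),
]

# Group scores by histological category.
by_stage = defaultdict(list)
for dosages, stage in cohort:
    by_stage[stage].append(genetic_risk_score(dosages))

for stage in sorted(by_stage):
    print(f"stage {stage}: mean score = {mean(by_stage[stage]):.3f}")
```

An unweighted alternative simply counts risk alleles; either form can then be tested for a trend across ordered histological categories.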
Despite a fairly extensive supporting literature, we and others have not found MBOAT7 to be a risk factor for NAFLD. Notably, no NAFLD focussed GWAS to date has reported a significant association with MBOAT7. Other signals for NAFLD reported by others previously including in PPP1R3B,
also failed to show genome-wide significance in the case-control analysis. This is not surprising in the case of AAT, as patients known to have this condition were specifically excluded from the cohort, substantially limiting the minor allele frequency. However, the gene MARC1, in which a non-synonymous variant has been reported to protect against both “all cause” cirrhosis and fatty liver disease,
showed a similar protective effect against NAFLD with a low p value, though this did not attain genome-wide significance. This gene encodes the mitochondrial amidoxime-reducing component enzyme, which can reduce trimethylamine N-oxide (TMAO), a product of trimethylamine oxidation. Elevated plasma TMAO has been suggested as a risk factor for cardiovascular disease and T2DM, so it could also be relevant to NAFLD.
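The point above about AAT, that excluding known cases limits the minor allele frequency, matters because power to detect an association falls steeply as allele frequency drops. The rough sketch below uses a normal approximation to the allelic (allele-count) test with a pooled standard error; the sample sizes, odds ratio and allele frequencies are hypothetical rather than the study's, and `allelic_test_power` is an illustrative name.

```python
from statistics import NormalDist


def allelic_test_power(p_control: float, odds_ratio: float,
                       n_cases: int, n_controls: int,
                       alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided allelic test (normal approximation).

    p_control:  minor allele frequency in controls
    odds_ratio: per-allele odds ratio for the minor allele
    """
    # Minor allele frequency in cases implied by the odds ratio.
    odds_case = odds_ratio * p_control / (1 - p_control)
    p_case = odds_case / (1 + odds_case)
    # Each individual contributes two alleles.
    a_case, a_ctrl = 2 * n_cases, 2 * n_controls
    p_bar = (p_case * a_case + p_control * a_ctrl) / (a_case + a_ctrl)
    se = (p_bar * (1 - p_bar) * (1 / a_case + 1 / a_ctrl)) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Ignores the negligible contribution of the opposite tail.
    return NormalDist().cdf(abs(p_case - p_control) / se - z_crit)


for maf in (0.05, 0.02, 0.005):  # hypothetical control allele frequencies
    power = allelic_test_power(maf, odds_ratio=1.5, n_cases=1500, n_controls=5000)
    print(f"MAF {maf}: power = {power:.4f}")
```

With the effect size and sample size held fixed, lowering the allele frequency alone is enough to push power towards zero at a genome-wide threshold.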
Our population controls cannot therefore be considered to be entirely free of NAFLD, and there is no way of investigating this further. Our use of large numbers of controls with genetic matching helps mitigate the risk that this will lead to an underestimate of genuine genetic risk factors but does not eliminate it entirely. To mitigate this further, we undertook some “case only” studies, which included a small group of patients with biochemical evidence of NAFLD but liver biopsies showing steatosis below the threshold of the standard disease definition. It is generally accepted that histological interpretation of liver biopsies is subject to some inter-observer variation, even amongst experienced hepatopathologists.
Such variation is therefore inherent to any histopathological phenotype. However, all data used in the analysis were generated by highly experienced liver pathologists based in tertiary centres and, to mitigate this issue further, the majority of liver biopsies were scored by a member of the project's central pathology team. Finally, our replication cohort was not perfectly matched with our discovery cohort in terms of disease severity and factors such as sex, T2DM and BMI. This is due, at least in part, to the replication cohort being drawn from a single centre in Southern Europe, where NAFLD risk factors such as diet may differ from those further north in the continent, resulting in lower obesity rates within the NAFLD population.
We were unfortunately not able to identify another suitable European replication cohort involving patients who had undergone liver biopsy following referral to a hepatology clinic.
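The concern raised above, that population controls not screened for NAFLD could lead to underestimation of genuine genetic risk factors, can be illustrated with allele frequencies. In the sketch below, a hypothetical fraction of controls actually has NAFLD and is assumed to carry the risk allele at the case frequency; the observed allelic odds ratio then shrinks towards 1. All frequencies and contamination fractions are hypothetical, and the function names are illustrative.

```python
def allelic_odds_ratio(p_case: float, p_control: float) -> float:
    """Allelic odds ratio from risk-allele frequencies in cases and controls."""
    return (p_case / (1 - p_case)) / (p_control / (1 - p_control))


def observed_odds_ratio(p_case: float, p_true_control: float,
                        contamination: float) -> float:
    """Odds ratio seen when a fraction of 'controls' are unrecognised cases.

    Assumes unrecognised cases carry the risk allele at the case frequency.
    """
    p_observed_control = ((1 - contamination) * p_true_control
                          + contamination * p_case)
    return allelic_odds_ratio(p_case, p_observed_control)


# Hypothetical risk-allele frequencies: 0.35 in cases, 0.22 in NAFLD-free controls.
for fraction in (0.0, 0.1, 0.25):
    or_obs = observed_odds_ratio(0.35, 0.22, fraction)
    print(f"{fraction:.0%} of controls with NAFLD: observed OR = {or_obs:.2f}")
```

In this simple model, adding more controls of the same composition improves precision but leaves the attenuation itself unchanged.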
In conclusion, this relatively large GWAS of histologically characterised NAFLD cases has confirmed previously reported associations and provided evidence for 4 novel signals. Much larger meta-analyses may help to establish the relevance of these novel signals.
Abbreviations
ALT, alanine aminotransferase; GWAS, genome-wide association study; HTGC, hepatic triglyceride content; NAFLD, non-alcoholic fatty liver disease; NAS, NAFLD activity score; NASH, non-alcoholic steatohepatitis; OR, odds ratio; PCA, principal component analysis; SAF, steatosis, activity, and fibrosis; SNP, single nucleotide polymorphism; T2DM, type 2 diabetes mellitus; TMAO, trimethylamine N-oxide.
Financial support
This study has been supported by the EPoS (Elucidating Pathways of Steatohepatitis) consortium funded by the Horizon 2020 Framework Program of the European Union under Grant Agreement 634413, the FLIP consortium (European Union FP7 grant agreement 241762) and the Newcastle NIHR Biomedical Research Centre.
Authors' contributions
Study concept and design: QMA, CPD, AKD; acquisition of data: QMA, CPD, LV, MM, DT, ADB, PB, OG, JP, YL-L, GPA, MA, HY-J, MV, J-FD, PI, DP, ME, SK, SF, SP, EB, KC, VR, JMS; analysis and interpretation of data: HJC, RD, QMA, SC, AKD; drafting of the manuscript: AKD, RD, HJC, QMA; critical revision of the manuscript for important intellectual content: all; statistical analysis: RD, HJC; obtained funding: QMA, CPD, AKD; administrative, technical, or material support: OG, JP, YL-L; study supervision: QMA, HJC, CPD, LV, AKD.
Conflict of interest
Quentin Anstee reports grants from European Commission during the conduct of the study; other from Acuitas Medical, grants, personal fees and other from Allergan/Tobira, other from E3Bio, other from Eli Lilly & Company Ltd, other from Galmed, grants, personal fees and other from Genfit SA, personal fees and other from Gilead, other from Grunthal, other from Imperial Innovations, grants and other from Intercept Pharma Europe Ltd, other from Inventiva, other from Janssen, personal fees from Kenes, other from MedImmune, other from NewGene, grants and other from Pfizer Ltd, other from Raptor Pharma, grants from GlaxoSmithKline, grants and other from Novartis Pharma AG, grants from AbbVie, personal fees from BMS, grants from GSK, other from NGMBio, other from Madrigal, other from Servier, outside the submitted work; Dina Tiniakos reports consultation fees from Intercept Pharmaceuticals Inc, Allergan, Cirius Therapeutics and an educational grant from Histoindex Pte Ltd; Guruprasad P. Aithal reports institutional consultancy income outside the scope of this study from GSK and Pfizer; Michael Allison reports consultancy/advisory with MedImmune/Astra Zeneca, E3Bio, honoraria from Intercept, Grant support from GSK, Takeda; Jean-Francois Dufour reports advisory committees with AbbVie, Bayer, BMS, Falk, Genfit, Genkyotex, Gilead Science, HepaRegenix, Intercept, Lilly, Merck, Novartis and speaking and teaching with Bayer, BMS, Intercept, Genfit, Gilead Science, Novartis; Pietro Invernizzi reports grants from Intercept, Gilead and Bruschettini; Mattias Ekstedt reports personal fees from AbbVie, AstraZeneca, Albireo, Diapharma, Gilead and non-financial support from Echosens (through LITMUS IMI project); Karine Clement has no personal honoraria but has consultancy and scientific collaboration activity for LNC therapeutics, Confotherapeutics and Danone Research; Jörn M. Schattenberg reports grants from Gilead and Boehringer Ingelheim and fees from Gilead, Boehringer Ingelheim, Galmed, Genfit, Intercept, Novartis, Pfizer and AbbVie outside the submitted work. All other authors report no conflicts of interest.
Please refer to the accompanying ICMJE disclosure forms for further details.
Acknowledgements
We are grateful to Julian Leathart for technical help, Lee Murphy and colleagues (Edinburgh CRF) for their assistance with GWAS provision, Elsbeth Henderson, the liver theme of the Newcastle NIHR Biomedical Research Centre (BRC), the gastrointestinal and liver disorder theme of Nottingham NIHR BRC (reference no BRC-1215-20003) and the Assistance Publique Hôpitaux de Paris for help with patient recruitment, Kristy Wonders for study management, Anna Fracanzani for helpful discussions, Michael Lowe for contributing to statistical genetics analysis and Daniele Cusi (Hypergenes) for provision of control data.
Quentin M. Anstee, Simon Cockell, Heather J. Cordell, Ann K. Daly, Rebecca Darlay, Christopher P. Day, Olivier Govaere, Katherine Johnson, Yang-Lin Liu, Fiona Oakley, Jeremy Palmer, Helen Reeves, Dina Tiniakos, Kristy Wonders