Alphafetoprotein: an obituary
The use of alphafetoprotein (AFP) serum concentration as a diagnostic test for hepatocellular carcinoma (HCC) has a long and venerable history. This use arose from the finding that some HCCs secreted massive amounts of AFP. In some instances the serum concentration exceeded 1 mg/ml (reviewed in [1]). This was in an era before the availability of ultrasonography and CT scanning. When a patient presented with jaundice or a mass in the right upper quadrant, a positive AFP test, particularly when the concentration was more than about 500 ng/ml, was diagnostic, and obviated the need for further invasive investigations, such as hepatic angiography or diagnostic laparotomy.
The advent of sophisticated abdominal imaging techniques directly influenced the usefulness of AFP as a diagnostic test in two ways. First, it was recognized that a significant proportion of HCCs identified by imaging did not secrete diagnostic levels of AFP. Second, the use of imaging and the emergence of programs for early detection of HCC meant that AFP levels in the range of mg/ml were now seldom seen. These changes have made the use of AFP controversial, whether AFP is used as a diagnostic test or for HCC screening. In this issue of the Journal, Dr Trevisani and colleagues [2] address the issue of the value of AFP as a diagnostic test for HCC.
There are several measures which can be used to gauge the efficiency of a test such as the AFP. The simplest measures are the sensitivity, specificity, and positive and negative predictive values. To remind readers, the sensitivity is a measure of the proportion of positive tests in patients with the disease, whereas specificity measures the proportion of negative tests in patients without the disease. These measurements are independent of the prevalence of the disease and, for a given test, are inversely related to each other: as sensitivity increases, specificity decreases and vice versa. However, the proportion of positive or negative results that are correct also depends on the prevalence of the disease the test purports to detect. If a disease is infrequent in a population, even a test with a high degree of specificity will produce a high proportion of false positives among its positive results. The effect of disease prevalence on test performance is captured by the positive and negative predictive values. The positive predictive value (PPV) is a measurement of the frequency with which a positive result is a true positive. Conversely, the negative predictive value is a measure of the proportion of all negative results that are true negatives. Although sensitivity and specificity are inversely related, the trade-off between them differs between tests and between diseases. This variation can be expressed by the Youden index, which is [sensitivity+specificity]-1. Thus a test which is highly specific and highly sensitive will have a high Youden index. One study [3], for example, found that the Youden index for an AFP >20 ng/ml in HCC was only about 0.45, which is low.
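The definitions above can be made concrete with a short sketch. The Python below is an illustration only, not part of any of the analyses discussed: the function name `diagnostic_metrics` and the 2x2 counts are hypothetical, chosen simply to show how each measure is derived from the numbers of true and false positives and negatives.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    youden: float


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Metrics:
    """Compute the basic performance measures from a 2x2 table.

    Assumes every row and column of the table is non-empty.
    """
    sensitivity = tp / (tp + fn)   # positive tests among patients with the disease
    specificity = tn / (tn + fp)   # negative tests among patients without it
    ppv = tp / (tp + fp)           # positive results that are true positives
    npv = tn / (tn + fn)           # negative results that are true negatives
    youden = sensitivity + specificity - 1
    return Metrics(sensitivity, specificity, ppv, npv, youden)


# Hypothetical counts: 100 patients with HCC and 100 matched controls.
m = diagnostic_metrics(tp=60, fp=10, fn=40, tn=90)
print(m)
```

Because the hypothetical table contains equal numbers of diseased and non-diseased subjects, the PPV and NPV it yields apply only to a population in which the disease prevalence is 50%.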
Finally, the test being evaluated could provide its output as a dichotomous variable (positive or negative), or as a continuous variable (range of values), in which case it is important to determine the test value which provides least error, i.e. the fewest false negatives and false positives. This is achieved by using the receiver operating characteristic (ROC) curve. This is a graph of sensitivity vs. 1-specificity, plotted over the whole range of values of the test.
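As a minimal sketch of how the points of such a curve are obtained, the Python below computes sensitivity and 1-specificity at a series of cut-offs and picks the cut-off with the largest Youden index. The AFP values, the cut-offs and the function names `roc_points` and `best_cutoff` are all invented for illustration and are not drawn from any study.

```python
def roc_points(diseased, healthy, cutoffs):
    """Return (cutoff, sensitivity, 1 - specificity) for each cutoff.

    A result counts as positive when the value is strictly above the cutoff.
    """
    points = []
    for c in cutoffs:
        sens = sum(v > c for v in diseased) / len(diseased)
        fpr = sum(v > c for v in healthy) / len(healthy)
        points.append((c, sens, fpr))
    return points


def best_cutoff(points):
    """Pick the point with the largest Youden index (sensitivity - FPR)."""
    return max(points, key=lambda p: p[1] - p[2])


# Hypothetical AFP values in ng/ml, invented for illustration only.
hcc = [5, 12, 22, 25, 40, 90, 150, 400, 1200, 5000]
controls = [2, 3, 4, 5, 6, 8, 9, 12, 15, 30]

points = roc_points(hcc, controls, cutoffs=[5, 10, 20, 50, 100, 200])
for cutoff, sens, fpr in points:
    print(f"cutoff {cutoff:>4}  sensitivity {sens:.2f}  1-specificity {fpr:.2f}")

print("cutoff with highest Youden index:", best_cutoff(points)[0])
```

When the diseased and control groups are of equal size, as here, the cut-off with the highest Youden index is also the one with the fewest total classification errors; with unequal groups the two criteria can differ.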
Dr Trevisani and his colleagues have studied patients with HCC and cirrhosis due to chronic viral hepatitis to determine the performance characteristics of alphafetoprotein as a diagnostic/screening test for HCC. Although this is a retrospective study it has three major strengths. First, the study cohort was matched with a similar cohort with equivalent severity of liver disease, but who did not develop HCC. This is crucial to the proper evaluation of AFP as a diagnostic test. Second, the performance characteristics of AFP were studied over the whole range of values obtained, and third, the results were analyzed using proper tools, such as those described above, both for their experimental populations, and for hypothetical populations more representative of those seen in most liver clinics. Their unequivocal conclusion is that AFP is not a good diagnostic test to detect HCC.
Dr Trevisani and colleagues [2] have constructed an ROC curve for AFP as a diagnostic test in their population. From their data, the point at which fewest classification errors occurred was an AFP of 16–20 ng/ml. However, at this level the sensitivity was only 0.6, although the specificity was about 0.9. This means that if 16–20 ng/ml is used as the cut-off for diagnosis, 40% of all HCCs in a similar population (e.g. cirrhotic patients) will not be detected. This is in a population with known HCC. In a population not known to have HCC (e.g. a surveillance population) the performance characteristics of AFP will likely be worse. Clearly then, AFP is not an adequate diagnostic test. As their Fig. 1 shows, if a higher cut-off is used a progressively smaller proportion of HCCs will be detected. Conversely, reducing the cut-off means that more HCCs would be identified, but at the cost of a progressive increase in the false-positive rate.
This analysis was performed in a cohort of patients in whom the prevalence of HCC was artificially set at 50% (equal sizes of experimental and control populations). Data are also provided for lower HCC prevalence rates, more like those seen in most liver clinics. We should note, however, that in following patients with chronic liver disease we are more interested in incidence rates than prevalence rates. In clinics the incidence of HCC in cirrhotic patients may range from about 1–5%/year. Since Dr Trevisani's study was retrospective we cannot accurately infer how well AFP will perform when the HCC incidence is in the range usually seen in clinics, but we can conclude that the performance characteristics are likely to be similar or worse. Furthermore, in cohorts undergoing surveillance for HCC the incidence of HCC may be even lower than 1–5%, depending on the criteria for entry into surveillance. For example, in adult non-cirrhotic hepatitis B carriers infected at birth the incidence of HCC is usually less than 1% [4], [5]. Trevisani et al. [2] also show that the underlying cause of the cirrhosis, whether hepatitis B or hepatitis C, does not influence the performance characteristics of AFP.
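The dependence of the predictive values on how common the disease is can be illustrated with a short calculation. The sketch below takes the sensitivity of 0.6 and specificity of about 0.9 quoted above and applies them at three prevalences. It is not a reproduction of the published data: the function name `predictive_values` is hypothetical, and the lower figures treat an annual incidence of 1–5% as though it were a prevalence, which is a simplification.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence              # true positives per person tested
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    return tp / (tp + fp), tn / (tn + fn)


# Sensitivity and specificity quoted in the text for the best cut-off.
SENS, SPEC = 0.6, 0.9

# 50% is the artificial study prevalence; 5% and 1% are assumed for illustration.
for prevalence in (0.50, 0.05, 0.01):
    ppv, npv = predictive_values(SENS, SPEC, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

Under these assumptions the PPV falls from roughly 86% at a prevalence of 50% to 24% at 5% and to under 6% at 1%, even though sensitivity and specificity are unchanged.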
If AFP is such a poor diagnostic test, does it have any role at all in the management of patients with chronic liver disease? There remain two possible circumstances in which one might consider the use of AFP. The first is as a surveillance test in asymptomatic patients with chronic liver disease, and the second is as a confirmatory test in patients who are already suspected of having HCC by virtue of a mass found on ultrasonography. The value of AFP as a surveillance test has been evaluated previously by ourselves and many others [6], [7], [8], [9]. Most investigators have documented that the sensitivity, specificity and PPV are similar to those found by Dr Trevisani.
The case for the use of AFP as a surveillance tool has been made by McMahon et al. [10] based on data from their ongoing surveillance program in the Alaskan Native population. This population does not have access to routine ultrasonography. In their cohort of hepatitis B carriers they found that the sensitivity and specificity of AFP as a surveillance test were 94.1 and 99.9%, respectively [11]. The PPV was 5%. AFP surveillance is currently performed using a dried spot of blood on a filter paper [12]. All positives are referred to Anchorage for investigation. In the latest report from this cohort the survival of patients in the screened cohort who developed HCC was compared to a historical cohort, which was not screened [12]. Survival in the screened cohort was better than in the historical cohort. The authors have taken these results to be evidence that AFP screening is useful. Unfortunately, design flaws in this study preclude accepting this notion. First, comparison with a historical cohort is always dubious, because of improvements in therapy over time. Second, lead-time bias can account for all the apparent improved survival. Essentially, the authors were comparing survival of a cohort in whom the disease was discovered early with a cohort in which the disease was discovered at a later stage. The difference in timing of diagnosis alone means that there will be an apparent prolongation of survival, even if the intervention is completely ineffective.
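The arithmetic of lead-time bias can be shown with a deliberately simple sketch. All of the times below are hypothetical and the intervention is assumed to have no effect at all, so that death occurs at the same point in the course of the disease in both cohorts; the function name `survival_from_diagnosis` is likewise invented for illustration.

```python
def survival_from_diagnosis(death_time, diagnosis_time):
    """Survival as it is usually reported: time from diagnosis to death."""
    return death_time - diagnosis_time


# Hypothetical timeline in months from tumour onset, for illustration only.
# Treatment is assumed to be completely ineffective, so death occurs at the
# same time whether or not the tumour was found by screening.
DEATH = 36
DIAGNOSED_BY_SCREENING = 12   # found early, while asymptomatic
DIAGNOSED_BY_SYMPTOMS = 24    # found later, at clinical presentation

screened = survival_from_diagnosis(DEATH, DIAGNOSED_BY_SCREENING)
unscreened = survival_from_diagnosis(DEATH, DIAGNOSED_BY_SYMPTOMS)
lead_time = DIAGNOSED_BY_SYMPTOMS - DIAGNOSED_BY_SCREENING

print(f"screened cohort survival:   {screened} months")
print(f"unscreened cohort survival: {unscreened} months")
print(f"apparent gain: {screened - unscreened} months (lead time: {lead_time} months)")
```

In this sketch the entire difference in reported survival equals the lead time, although no patient lives a day longer.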
Our experience of AFP as a surveillance tool [6] and that of others [7], [8], [9] using ultrasound in addition to AFP has been different. In at least three studies in which it was clear that the AFP was being used as a screening test rather than a diagnostic test, the sensitivity of AFP was between 39 and 64%, with a specificity of 76–91% and the PPV was between 9 and 32% [6], [7], [8].
Over the last 10 years of our program we have identified more than 70 HCCs in a cohort of hepatitis B carriers. In only one instance were we alerted to HCC by a progressively rising AFP while the ultrasound and CT scan were negative. This was 10 years ago, before spiral CT scanning, and before the current generation of ultrasound equipment. The diagnosis was made by angiography. Since then, AFP testing has not identified any cases that did not also have a mass on ultrasound (unpublished data).
Even the combined use of AFP and ultrasound (US) is dubious. Yang et al. [9] compared the use of AFP and US alone and in combination in a large screening study, involving nearly 20 000 subjects. This study has been running for about 5 years now. Adding the two tests together paradoxically increases the false-positive rate from about 3% with US alone to 7.5% [13]. The gain in sensitivity is about 8%. The PPV is worst when both tests are used (3.0%) compared to US (6.6%) or AFP (3.3%) used alone. This is also the most expensive regimen, expressed as cost/tumour found. Whether the increase in sensitivity is worth the deterioration which occurs in all other aspects of testing is not clear.
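Why combining two tests raises the false-positive rate can be seen from a short sketch. It assumes that a subject is called positive when either test is positive and that the two tests err independently, which may well not hold in practice. The sensitivities, false-positive rates and prevalence below are hypothetical and are not the figures from the screening study; the function names are also invented for illustration.

```python
def either_positive(rate_a, rate_b):
    """Rate at which at least one of two independent tests is positive."""
    return 1 - (1 - rate_a) * (1 - rate_b)


def ppv(sensitivity, false_positive_rate, prevalence):
    """Proportion of positive results that are true positives."""
    tp = sensitivity * prevalence
    fp = false_positive_rate * (1 - prevalence)
    return tp / (tp + fp)


# Hypothetical figures, not taken from the screening study.
US_SENS, US_FPR = 0.80, 0.03
AFP_SENS, AFP_FPR = 0.60, 0.05
PREVALENCE = 0.005

combined_sens = either_positive(US_SENS, AFP_SENS)
combined_fpr = either_positive(US_FPR, AFP_FPR)
ppv_us = ppv(US_SENS, US_FPR, PREVALENCE)
ppv_combined = ppv(combined_sens, combined_fpr, PREVALENCE)

print(f"US alone: sensitivity {US_SENS:.2f}, FPR {US_FPR:.3f}, PPV {ppv_us:.3f}")
print(f"US + AFP: sensitivity {combined_sens:.2f}, FPR {combined_fpr:.3f}, "
      f"PPV {ppv_combined:.3f}")
```

With these assumed inputs the combination gains sensitivity, but the false-positive rate more than doubles and the PPV roughly halves, which is the same direction of change as that reported above.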
There are two circumstances where AFP might be elevated due to HCC, yet imaging might be negative. This can occur when the lesion is too small to be detected on US or CT, or because the lesion is diffusely infiltrating, with no clear margins between normal liver and tumour. If the lesion is too small to be seen on imaging, it is also too small to be treated. If the lesion is diffusely infiltrating it is not suitable for curative therapy. When you also consider that there is no definite evidence that outcome is different if the lesion is treated when it is e.g. 1 cm in diameter vs. 2 cm, the value of early detection by AFP becomes harder to see.
Therefore, I conclude that the use of AFP as a surveillance test can no longer be justified, and it should be dropped from surveillance protocols, except where ultrasonography is either not available, or of such poor quality that lesions less than e.g. 2 cm will not be detected.
Trevisani et al. [2] suggest that in cirrhotic patients in whom a liver mass has been discovered an AFP of more than about 100 ng/ml is highly specific for HCC. They appropriately caution that their study did not have the correct controls to be certain of this conclusion. However, if confirmed, these results indicate that the correct use of AFP is as a confirmatory test. Under these circumstances AFP could be a very useful test, since imaging techniques don't always distinguish between cirrhotic macronodules, dysplastic nodules and HCC.
In summary, I believe that the time has come to bid a fond adieu to AFP as a test for HCC diagnosis and particularly for HCC surveillance. AFP will hopefully still be with us, but in another guise, as a confirmatory test in patients in whom HCC is already suspected.
References
- Alpha-fetoprotein. In: Read AE editors. Modern trends in gastroenterology. London: Butterworths; 1975;p. 91
- Serum α-fetoprotein for diagnosis of hepatocellular carcinoma in patients with chronic liver disease: influence of HBsAg and anti-HCV status. J Hepatol. 2001;34:570–575
- Simultaneous measurements of serum alpha-fetoprotein and protein induced by vitamin K absence for detecting hepatocellular carcinoma. Am J Gastroenterol. 2000;95:1036–1040
- Hepatocellular carcinoma and hepatitis B virus: a prospective study of 22 700 men in Taiwan. Lancet. 1981;2:1129–1133
- Early detection of hepatocellular carcinoma in patients with chronic type B hepatitis. Gastroenterology. 1986;90:263–266
- Screening for hepatocellular carcinoma in chronic carriers of Hepatitis B virus: incidence and prevalence of hepatocellular carcinoma in a North American urban population. Hepatology. 1995;22:432–438
- Prospective study of screening for hepatocellular carcinoma in Caucasian patients with cirrhosis. J Hepatol. 1994;20:65–71
- Prospective study of alpha-fetoprotein in cirrhotic patients monitored for development of hepatocellular carcinoma. Hepatology. 1994;19:61–66
- Prospective study of early detection for primary liver cancer. J Cancer Res Clin Oncol. 1997;123:357–360
- Hepatitis B-related sequelae. Prospective study of 1400 hepatitis B surface antigen-positive Alaska native carriers. Arch Intern Med. 1990;150:1051–1054
- The Alaska Native HCC Screening Program: a population-based screening program for hepatocellular carcinoma. In: Tabor E, Di Bisceglie AM, Purcell RH editor. Etiology, pathology and treatment of hepatocellular carcinoma in North America. Houston: Gulf Publishing Company; 1990;p. 231–242
- Screening for hepatocellular carcinoma in Alaska natives infected with chronic hepatitis B: a 16-year population-based study. Hepatology. 2000;32:842–846
- Combined alpha fetoprotein testing and ultrasonography as a screening test for primary liver cancer. J Med Screen. 1999;6:108–110
PII: S0168-8278(01)00025-3
© 2001 European Association for the Study of the Liver. Published by Elsevier Inc. All rights reserved.
