Hokum-Balderdash Assay: Testing screening tests

Update (March 2016): A more comprehensive contingency table which includes detailed definitions can be viewed here.

-------------

How reliable are home pregnancy test (HPT) kits? Given 100 women who've just conceived, 75 of them will test positive. And given 100 who aren't pregnant, 75 will correctly test negative [see note 1]. Assume the chance of conception when coitus is performed on some randomly chosen day of the month is 5% [see note 2]. If the HPT result comes out positive, should the woman panic and faint if it's an unwanted pregnancy? Or if she's been trying to be with child for years, should she immediately broadcast the news on Facebook and Twitter, and email every friend and kin she has?

Before finding out the answer, let's introduce some important terms. The base rate is the prevalence rate or frequency of a disease, condition or phenomenon in the population. Sensitivity is the rate/frequency/probability a medical screening test will result in a positive finding given that the person being tested has the condition or disease which the test is for. In probability notation sensitivity can be denoted as P(test is positive | person has the condition). That's read as "the probability that the test comes out positive given the person has the condition." The vertical bar is read as "given." Specificity is the rate/frequency/probability a medical screening test will result in a negative finding given that the person being tested does not have the condition or disease which the test is for. In other words, specificity is P(test is negative | person doesn't have the condition).

So for HPT the base rate = 0.05, sensitivity = 75/100 = 0.75, and specificity = 75/100 = 0.75.

What we're interested in finding out is how reliable the test is, i.e., how accurate it is when the test comes out positive and when it comes out negative. In essence we want to know P(being pregnant | test is positive) and P(not being pregnant | test is negative). These are two separate values and they needn't be the same. If both values are close to 1 (e.g., > 0.90) then we can say that HPT is reliable.

Notice that when we're talking of the reliability of the test when it results in a positive finding, we are looking at P(being pregnant | test is positive). This is not the same as the sensitivity of HPT, which is P(test is positive | person is pregnant). The test's sensitivity measures how accurate it is when the subjects being tested are already known to have the condition (e.g., are definitely known to be pregnant before the test is even administered). Quite obviously, women who buy HPT kits do so to find out whether or not they're pregnant. Likewise, the reliability of the test when the results come back negative is P(not pregnant | test is negative). Contrast this with the specificity of HPT, which is P(test is negative | not pregnant). So the reliability indicators we're interested in are the converse of sensitivity and specificity: the same two events, with the "given" side swapped.

To help us derive all the numbers, we shall enlist the help of a spreadsheet. Take a look at this 2x2 contingency table (it would be best if you open a new tab or window on your browser for the spreadsheet so you can easily switch between this text and the tables). Although I can simply give you the equations for P(being pregnant | test is positive) and P(not pregnant | test is negative), using a table is far more illuminating and easier to understand. But if you would rather have the formulas, I've included them in the spreadsheet--look for the rows "true positive" and "true negative."

Although there are clearly more than two rows and two columns in the table it's described as 2x2 because the two variables have two states/levels each. In the case of HPT the variables are pregnancy and HPT test result. Their states are: pregnant, not pregnant, test result positive, test result negative.

The equations to compute the values of the cells are provided. The values for cells which don't have explicit equations can easily be derived after the other cells have been filled. Keep in mind that except for the last table, which uses a user-provided sample size, the cells of the contingency tables all contain probabilities. Therefore, they will only have values from 0 to 1, inclusive.
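
For readers who would rather see the cell arithmetic outside a spreadsheet, here is a minimal Python sketch of the four inner cells of a 2x2 table of probabilities. The function name `contingency_table` and the dictionary layout are my own assumptions for illustration; they are not taken from the spreadsheet, whose exact formulas may be arranged differently.

```python
def contingency_table(base_rate, sensitivity, specificity):
    """Return the four joint probabilities, keyed by (condition, test result).

    Hypothetical helper: each cell is the share of the whole population
    that falls into that combination of true state and test result.
    """
    no_condition = 1.0 - base_rate
    return {
        ("condition", "positive"): base_rate * sensitivity,
        ("condition", "negative"): base_rate * (1.0 - sensitivity),
        ("no condition", "positive"): no_condition * (1.0 - specificity),
        ("no condition", "negative"): no_condition * specificity,
    }


# HPT inputs from the text: base rate 0.05, sensitivity 0.75, specificity 0.75.
table = contingency_table(base_rate=0.05, sensitivity=0.75, specificity=0.75)
for (condition, result), probability in table.items():
    print(f"P({condition} & test {result}) = {probability:.4f}")

# Column totals: how often the test comes out positive or negative overall.
p_positive = table[("condition", "positive")] + table[("no condition", "positive")]
p_negative = table[("condition", "negative")] + table[("no condition", "negative")]
print(f"P(test positive) = {p_positive:.4f}")
print(f"P(test negative) = {p_negative:.4f}")
```

The four inner cells always add up to 1, which is a quick sanity check on any set of inputs.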

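Now scroll down to the row labeled "EXAMPLE." I've already plugged in the base rate, sensitivity and specificity for HPT kits (see yellow-colored cells). Focus your attention on the purple cells. As you can see, when HPT says a woman is pregnant, there's only a 13.64% chance that it's true. Thus the false positive rate = 86.36%. (In this post, "true positive rate" means P(pregnant | test is positive), which textbooks call the positive predictive value, and "false positive rate" means its complement, P(not pregnant | test is positive). Elsewhere "false positive rate" often means 1 - specificity, which is a different number.) Out of a hundred times the test comes out positive, 86 of them will be false alarms. Not very comforting at all.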

But look at HPT's true negative rate--P(not pregnant | test negative). It's 98.28%. When the test tells a woman she hasn't conceived, then it will be wrong only 2 out of 100 times. Now that's reliable.
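
The same four percentages can be reproduced without the spreadsheet. The sketch below is one way to do it in Python, assuming only the three inputs given in the text; the function name `reliability` and its labels are hypothetical, chosen for this illustration.

```python
def reliability(base_rate, sensitivity, specificity):
    """Return the four 'given the test result' probabilities (hypothetical helper)."""
    tp = base_rate * sensitivity              # has condition, tests positive
    fn = base_rate * (1 - sensitivity)        # has condition, tests negative
    fp = (1 - base_rate) * (1 - specificity)  # no condition, tests positive
    tn = (1 - base_rate) * specificity        # no condition, tests negative
    return {
        "P(condition | positive)": tp / (tp + fp),
        "P(no condition | positive)": fp / (tp + fp),
        "P(no condition | negative)": tn / (tn + fn),
        "P(condition | negative)": fn / (tn + fn),
    }


hpt = reliability(base_rate=0.05, sensitivity=0.75, specificity=0.75)
for label, value in hpt.items():
    print(f"{label} = {value:.2%}")
# P(condition | positive) = 13.64%
# P(no condition | positive) = 86.36%
# P(no condition | negative) = 98.28%
# P(condition | negative) = 1.72%
```

Each pair is divided by its own column total (all positives, or all negatives), which is why each pair adds up to 100%.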

Before moving on, I suggest creating a new Google spreadsheet so you can play around with the inputs and see how the probabilities change. Go to https://docs.google.com/ (and sign in if you need to). Click on the "Create new" button in the upper left-hand corner of the window. A drop-down menu will appear. Click on "Spreadsheet." Go to the contingency table and click on "View" on the menu bar on top. Click "Show all formulas" on the drop-down menu. Press CTRL-A to select the entire table and then CTRL-C to copy it. Go back to your new blank spreadsheet and press CTRL-V to paste. You will have to manually merge cells wherever the text is too long to fit in one cell (and so wraps down). Do this by selecting the cells in the row that you want to join and then clicking on the merge icon (it's the square one with left- and right-facing arrows). To "unmerge," select the merged cells and click the icon again.

Now you can change the values of the yellow colored cells--base rate, sensitivity, specificity, sample size--and watch how the values in the tables below them change. Try inputting a value of 1 for specificity and watch how false positives are completely eliminated no matter what the sensitivity is.
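
If you want to check the specificity = 1 claim without a spreadsheet, a small Python sketch follows. The function name `share_of_false_positives` and the particular sensitivities tried are my own choices for illustration. It assumes sensitivity is greater than 0, since otherwise the test never comes out positive and the ratio is undefined.

```python
def share_of_false_positives(base_rate, sensitivity, specificity):
    """P(no condition | test positive). Hypothetical helper; needs sensitivity > 0."""
    tp = base_rate * sensitivity
    fp = (1 - base_rate) * (1 - specificity)
    return fp / (tp + fp)


# With specificity fixed at 1, nobody without the condition tests positive,
# so every positive result is a true one whatever the sensitivity.
shares = []
for sensitivity in (0.10, 0.50, 0.75, 0.99):
    share = share_of_false_positives(0.05, sensitivity, specificity=1.0)
    shares.append(share)
    print(f"sensitivity {sensitivity:.2f}: false positive share = {share:.2%}")
```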

One of the reasons HPT kits (or, for that matter, other tests) are prone to false positives (i.e., the test result is positive but wrong--it should actually be negative) is that the base rate is rather low. And as we decrease the base rate--keeping sensitivity and specificity constant--the false positive rate goes up. Try tweaking the base rate on your spreadsheet and see what happens.
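
The effect of the base rate can also be seen by sweeping it in a few lines of Python. This is only a sketch: the function name `false_positive_share` and the list of base rates are arbitrary choices of mine, while sensitivity and specificity are held at the HPT values of 0.75.

```python
def false_positive_share(base_rate, sensitivity=0.75, specificity=0.75):
    """P(no condition | test positive) for a given base rate (hypothetical helper)."""
    tp = base_rate * sensitivity
    fp = (1 - base_rate) * (1 - specificity)
    return fp / (tp + fp)


# Arbitrary illustrative base rates, from common to rare.
base_rates = [0.50, 0.20, 0.05, 0.01, 0.001]
shares = [false_positive_share(rate) for rate in base_rates]

for rate, share in zip(base_rates, shares):
    print(f"base rate {rate:>6.3f}: {share:.2%} of positive results are wrong")
```

The rarer the condition, the more the positives are dominated by the many unaffected people who happen to test positive.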

If a low base rate decreases the true positive rate, then a higher base rate should increase it, right? Remember we assumed that the base rate, and thus P(getting pregnant), is 0.05? Well, that assumes the woman doesn't know her ovulation cycle--she doesn't know the time of the month she's fertile. Now suppose she does. Suppose she knows exactly the six-day ovulation period when she can conceive [see note 2]. If she then has coitus during this period, the chance of her conceiving jumps to approximately 20% (it's actually between 10% and 33% depending on the day). Given that the base rate has increased, the reliability of a positive HPT result should increase as well. If we change the base rate to 0.20, the false positive rate drops to 57.14%. If the test comes out positive, it's nearly a coin toss whether one is really pregnant. On the other hand, look at how the false negative rate has been affected. It's increased from 1.72% to 7.69%. Moral: You can't have your cake and eat it too.
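
The trade-off between the two kinds of error can be laid side by side. The following Python sketch compares the two base rates discussed here; the function name `error_shares` is a hypothetical helper of mine, and sensitivity and specificity stay at 0.75.

```python
def error_shares(base_rate, sensitivity=0.75, specificity=0.75):
    """Return (share of positives that are wrong, share of negatives that are wrong)."""
    tp = base_rate * sensitivity
    fn = base_rate * (1 - sensitivity)
    fp = (1 - base_rate) * (1 - specificity)
    tn = (1 - base_rate) * specificity
    return fp / (tp + fp), fn / (tn + fn)


for base_rate in (0.05, 0.20):
    wrong_positive, wrong_negative = error_shares(base_rate)
    print(
        f"base rate {base_rate:.2f}: "
        f"wrong positives {wrong_positive:.2%}, wrong negatives {wrong_negative:.2%}"
    )
# base rate 0.05: wrong positives 86.36%, wrong negatives 1.72%
# base rate 0.20: wrong positives 57.14%, wrong negatives 7.69%
```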

Let's move on and use another real life example. Fecal occult blood test (FOBT) is a kit which can be used at home to screen for, among other conditions, possible colorectal cancer (CRC). As with HPT, FOBT kits from different manufacturers vary in their sensitivity and specificity. Let's take an "average" FOBT which has a sensitivity = 65% and specificity = 95%. How often (or rarely) does CRC occur in the population? Given the prevalence in 2007 we can estimate it to be 0.37% [see note 3].

Plugging those values in our spreadsheet we get the following:

True positive rate = 4.61%
False positive rate = 95.39%

True negative rate = 99.86%
False negative rate = 0.14%

When FOBT comes out negative we can be almost certain that we are CRC-free. But when it comes out positive, further testing is necessary to confirm or refute the initial screening. The low base rate means this test is prone to false positives.
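
It can help to see these percentages as head counts. The Python sketch below applies the FOBT inputs above to a made-up group of 100,000 people; that sample size is purely my assumption for illustration, as are the variable names.

```python
sample_size = 100_000   # hypothetical number of people screened (assumption)
base_rate = 0.0037      # CRC prevalence estimate from the text
sensitivity = 0.65
specificity = 0.95

with_crc = sample_size * base_rate
without_crc = sample_size - with_crc

true_positives = with_crc * sensitivity        # have CRC, test positive
false_negatives = with_crc - true_positives    # have CRC, test negative
true_negatives = without_crc * specificity     # no CRC, test negative
false_positives = without_crc - true_negatives # no CRC, test positive

positives = true_positives + false_positives
negatives = true_negatives + false_negatives

print(f"People with CRC:        {with_crc:,.1f}")
print(f"Positive results:       {positives:,.1f} ({true_positives:,.1f} correct)")
print(f"Negative results:       {negatives:,.1f} ({true_negatives:,.1f} correct)")
print(f"P(CRC | positive)     = {true_positives / positives:.2%}")
print(f"P(no CRC | negative)  = {true_negatives / negatives:.2%}")
```

Roughly 5,000 of the positive results come from people without CRC, against a few hundred from people with it, which is where the 4.61% figure comes from.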

----

Notes:

1. Home pregnancy test sensitivity and specificity values: "The results we have suggest that for every four women who use such a test and are pregnant, one will get a negative test result. It also suggests that for every four women who are not pregnant, one will have a positive test result." In other words, P(test negative | pregnant) and P(test positive | not pregnant) are both 1/4. P(test positive | pregnant), i.e., sensitivity, and P(test negative | not pregnant), i.e., specificity of the test, are the complements of these values: 1 - 1/4 = 0.75.

It is important to note that different brands of HPT kits have varying sensitivities and specificities.

2. Pregnancy base rate: During the 6-day ovulation period the probability of becoming pregnant is between 10% and 33%. Computing a simplistic average we get approximately 20% for P(pregnancy | coitus during the 6-day ovulation period); the calculation below uses 0.25, slightly above that average, which gives the 5% base rate used in the main text. According to the study there is no conception if coitus is outside this 6-day period. Therefore P(pregnancy | coitus outside the 6-day ovulation period) = 0. Since the ovulation period is 6 days and there are 30 days/month, the probability that any randomly picked day is within the 6-day ovulation period = 6/30 = 1/5 = 0.2. We need to find the probability of getting pregnant regardless of when coitus is performed, that is, we need to determine P(pregnancy).

Let:
p = pregnancy
p' = no pregnancy
v = coitus during 6-day ovulation period
v' = coitus outside the 6-day ovulation period

P(v) = 0.2
P(v') = 1 - P(v) = 0.8
P(p|v) = 0.25
P(p|v') = 0

P(p & v) = P(v) * P(p|v) = 0.2 * 0.25 = 0.05
P(p & v') = P(v') * P(p|v') = 0.8 * 0 = 0
P(p) = P(p & v) + P(p & v') = 0.05 + 0 = 0.05

I actually used a tree diagram (with ovulation period as the first two branches) and a 2x2 table as in the spreadsheet to aid in determining the probabilities. The above equations are a distillation of that graphical and tabular process.
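
The tree-diagram calculation in this note can be written as a short Python sketch: weight each branch's chance of pregnancy by the chance of being on that branch, then add the branches. The dictionary names are my own; the numbers are the ones used in the equations of this note.

```python
# Probability that a randomly chosen day falls in each branch of the tree.
p_branch = {
    "inside 6-day window": 6 / 30,
    "outside 6-day window": 1 - 6 / 30,
}

# Probability of pregnancy given each branch, as used in this note.
p_pregnancy_given_branch = {
    "inside 6-day window": 0.25,
    "outside 6-day window": 0.0,
}

# Law of total probability: sum of P(branch) * P(pregnancy | branch).
p_pregnancy = sum(
    p_branch[name] * p_pregnancy_given_branch[name] for name in p_branch
)
print(f"P(pregnancy) = {p_pregnancy:.2f}")
# P(pregnancy) = 0.05
```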

3. Prevalence of colorectal cancer: "On January 1, 2007, in the United States there were approximately 1,112,493 men and women alive who had a history of cancer of the colon and rectum -- 540,636 men and 571,857 women. This includes any person alive on January 1, 2007 who had been diagnosed with cancer of the colon and rectum at any point prior to January 1, 2007 and includes persons with active disease and those who are cured of their disease." In 2007 the population of the US was 302.2 million. Therefore we can compute an estimate of the base rate for CRC: 1.112 million / 302.2 million = 0.0037, i.e., 0.37%.
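
As a quick check of that division, here is the same arithmetic as a Python sketch, using only the figures quoted in this note. The variable names are mine.

```python
men = 540_636
women = 571_857
people_with_crc_history = men + women   # 1,112,493 per the quoted source
us_population_2007 = 302.2e6            # 302.2 million, as stated in this note

prevalence = people_with_crc_history / us_population_2007
print(f"{people_with_crc_history:,} / {us_population_2007:,.0f} = {prevalence:.4f}")
print(f"Base rate for CRC: about {prevalence:.2%}")
```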

Thursday, January 20, 2011
