Humans, myself certainly included, have terrible instincts for probability and statistics. Evolution has gifted us with programs to deal with specific situations rather than a general aptitude for determining the actual likelihood that X will occur given Y.

The human brain apparently has two natural states when it comes to disease avoidance—You look healthy fellow human; I want a hug! and Gaaaaaaahhhhh UNCLEAN!!!!!!! avoid at all costs!!!!

We can see this on display as people react to the novel coronavirus. A large number of people have suddenly become even more germaphobic than I am, to the point that they wear masks and walk wide semicircles around the other people they pass on a distinctly uncrowded sidewalk on a hot, sunny, breezy day in southern California.

This is no way to run a society, and highly impractical and ineffective for the practice of medicine, which is why they spend the first two years of medical school beating proper statistical analysis into you and, if you are good and devoted, you spend the rest of your career desperately trying not to backslide.

In that context this paper out of Taiwan looks valuable. Early in the epidemic they did contact tracing on all the known infections and did a good study of the kinds of social contact that resulted in other people getting infected. This paper is unlikely to be reproduced because the circumstances in Taiwan at the time were close to unique and ideal. Taiwan did a great job, early on, of quarantine and contact tracing, so they have pretty complete data. Because infection in their society overall was well contained, they have much higher confidence in the source of the infections for the people the original patients supposedly gave it to. That is, if patient Y had contact with original patient X, there is a high probability that they got it from X rather than some unknown patient Z. Not perfect, but probably better than we will ever see again.

https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/2765641

What did the paper show?

They started with the initial 100 confirmed cases in Taiwan between January 15th and March 18th 2020 and followed all contacts until 14 days after their last exposure to the index case (i.e. the patient from that initial 100 they had contact with). In the paper their criterion for a contact is someone the index patient was within 2 meters of for 15 minutes or more without PPE, or if it was in a medical setting, without PPE appropriate to the encounter. They traced 2761 such contacts and found that only 22 got COVID-19 themselves. Of these, the overwhelming majority were family members. The next largest group was healthcare workers and third was “other”, which I read as people they encountered outside, on transportation, in stores, at the office etc. Of the 1836 such contacts they traced (not healthcare workers or family), a total of 1 person developed COVID-19. Remember, that is after being exposed to an index patient without protection, within 2 meters for at least 15 minutes.
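To put those counts side by side, here is a minimal sketch in Python that turns them into attack rates. It uses only the numbers quoted above; the "family + healthcare" row is my own subtraction (2761 minus 1836 contacts, 22 minus 1 cases), not a figure taken from the paper, and the variable names are just for illustration.

```python
# Counts as quoted above from the Taiwan contact-tracing study.
total_contacts = 2761
total_cases = 22
other_contacts = 1836  # contacts who were neither family nor healthcare workers
other_cases = 1


def attack_rate(cases, contacts):
    """Fraction of traced contacts who went on to develop COVID-19."""
    return cases / contacts


overall_rate = attack_rate(total_cases, total_contacts)
other_rate = attack_rate(other_cases, other_contacts)
# Assumption: everyone not in the "other" group is family or healthcare,
# so this combined group is derived by subtraction.
close_rate = attack_rate(total_cases - other_cases,
                         total_contacts - other_contacts)

print(f"all contacts:        {overall_rate:.2%}")
print(f"family + healthcare: {close_rate:.2%}")
print(f"everyone else:       {other_rate:.3%}")
```

Under that subtraction, the family and healthcare contacts come out roughly forty times more likely to be infected than everyone else.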

Of course this makes complete sense. Despite our instincts telling us any exposure is an unacceptable risk, in reality the chance of transmission in any case depends on a few factors including proximity and duration of the encounter. Obviously one participant must be infectious and the other must be a COVID-19 virgin. Once that is established, how much spewiness (I’m sure there is a real medical term for this but I am too lazy to look it up and I like mine) does the patient possess, i.e. how much virus-containing sputum is he spewing forth per unit time? Someone with higher virus counts per spit globule is more spewy. Someone coughing is more spewy than someone just talking etc. It really is no wonder that casual infrequent encounters with strangers transmit less, because they just don’t last as long and tend not to occur in environments (like the home or a hospital) that help spread.

The article also notes that people are probably infectious a little before symptom onset though they don’t make a statement about how much before. They find that the time between serial transmissions from one patient to another, and then that new patient to another, is about 5 days. Most transmission from a sick patient to anyone else occurs within 5 days of symptom onset, very little after.

What does this mean in terms of risk? Here is a back of the envelope calculation of the chance of a grocery store checkout person getting COVID-19 from a customer. Please feel free to write in with criticism of my assumptions and any stupidity in my calculations.

Assume a person who has an active case (let’s assume such activity lasts a week on average) buys groceries while not wearing a mask, and is checked out by a clerk who has never had COVID-19 and is not wearing a mask. The average time they spend together is probably a lot less than 15 minutes, so let’s simplify and model customers as 15 minute time slot customers. In reality that could be two or three different people but we will smoosh them together for our estimate. A clerk could have 4 such encounters per hour and it’s an 8 hour workday, and just to be generous let’s assume (s)he works 7 days a week. At my grocery store I see 4 lines open at a time and I presume this is a little more at peak and a little less at slow times. A grocery store, I contend, is mostly, overwhelmingly repeat customers and the average customer probably shops once per week. That may not be true but making everything a week simplifies this calculation. I’m trying to err on the side of making the situation seem worse.

So 4 [15min encounters/hour]x8[hours per day]x4[clerks per shift]x7[days per week] means the store has about 896 15 minute composite customers who could at any time have COVID-19. Next let’s assume that ultimately about 20% of those customers will have COVID-19 infection and be contagious at some time during the pandemic. I base this assumption on typical ranges reported for past pandemics. I know we supposedly need 60-70% of people to get it for herd immunity, but herd immunity does not, historically, seem required for a pandemic to quiet down. If you don’t like that assumption please redo the calculation with your preferred number. Note, if your assumption is that 100% get it, then the whole thing is irrelevant because presumably our hypothetical grocery worker would be part of that too.

So now we have 179 such 15 minute customers, each with the potential to infect our clerk. But wait, they shop once a week and are transmitting pretty much for a week, and they are spread across four checkout lines, so the number the average clerk will check out while they are infectious is a quarter of that, or 44.8 potential 15 minute customers.
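If you want to redo this with your own numbers, the counting so far can be sketched as a small Python function. The function and parameter names are mine, invented for illustration, and the defaults are just the assumptions stated above (4 slots per hour, 8 hours, 4 clerks, 7 days, 20% eventually infected). It also bakes in the same simplification I made: each infectious customer shops exactly once while contagious.

```python
def infectious_encounters_per_clerk(
    slots_per_hour=4,        # 15 minute composite customers per hour
    hours_per_day=8,
    clerks=4,                # checkout lines open at a time
    days_per_week=7,
    fraction_infected=0.20,  # share of customers who are ever contagious
):
    """Return (store slots, infectious slots, infectious slots per clerk)."""
    store_slots = slots_per_hour * hours_per_day * clerks * days_per_week
    infectious_slots = store_slots * fraction_infected
    # Each infectious customer goes through one of the open lines,
    # so a single clerk sees only their share of them.
    per_clerk = infectious_slots / clerks
    return store_slots, infectious_slots, per_clerk


store_slots, infectious_slots, per_clerk = infectious_encounters_per_clerk()
print(store_slots, round(infectious_slots, 1), round(per_clerk, 1))
```

Calling it with, say, `fraction_infected=0.5` shows how quickly the clerk's exposure count grows if you think my 20% is too optimistic.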

Based on the findings in the linked study, each such potentially infectious encounter has 1835 chances out of 1836 to result in no infection in our clerk. That’s a pretty low risk per encounter, but if you take enough bites at that apple you could get unlucky; it only takes one bad bite. So what if, over the span of the pandemic, our clerk takes 44.8 bites? Let’s round that up to 45. The odds are (1835/1836) to the 45th power that (s)he would emerge unscathed. That is .9758 (assuming I don’t have a floating point error in my HP35 calculator simulator), so the clerk has a 2.4% chance of getting COVID-19 from a customer if no one wears a mask.
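For anyone who would rather not trust my calculator simulator, the same arithmetic is a few lines of Python. This is only a sketch of the model above: it assumes every encounter is independent and carries the same 1-in-1836 risk, which is my simplification and not something the study claims.

```python
# Per-encounter chance of NOT being infected, from the "other" contacts.
p_no_infection = 1835 / 1836
encounters = 45  # 44.8 rounded up

# Assumption: encounters are independent, so the chances multiply.
p_unscathed = p_no_infection ** encounters
p_infected = 1 - p_unscathed

print(f"chance of emerging unscathed: {p_unscathed:.4f}")
print(f"chance of being infected:     {p_infected:.1%}")  # about 2.4%
```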

Bear in mind that this is in a model in which 20% of everyone gets it anyway, so the clerk’s chance of getting it in this model is already 20% without the 2.4% chance from these inconsiderate customers who come in without masks when they are clearly sick (or will be sick in a couple days to be fair to some of them).

Now what if we take some simple measures like not allowing symptomatic people in the store? Now instead of 7 days in which a 15 minuter could infect a clerk, we are down to maybe 2. In that case you would multiply the 44.8 by 2/7 and the likelihood of getting infected is now 0.7% in this model. Put up some plexiglass between the customer and clerk and, well, now you no longer fit the definition of close contact in the study. There is still some risk but it is negligible in my estimation.
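Here is the same comparison as a rough Python sketch, with and without symptomatic customers being kept out. The helper name is hypothetical, and the 2-out-of-7 infectious days is my guess from above, not a number from the study.

```python
def infection_risk(encounters, p_per_encounter=1 / 1836):
    """Chance of at least one infection over independent encounters."""
    return 1 - (1 - p_per_encounter) ** encounters


baseline_encounters = 44.8
# Assumption: screening out symptomatic shoppers leaves about 2 of the
# 7 infectious days during which a customer can still reach the clerk.
screened_encounters = baseline_encounters * 2 / 7

baseline = infection_risk(baseline_encounters)
screened = infection_risk(screened_encounters)

print(f"no screening:   {baseline:.1%}")
print(f"with screening: {screened:.1%}")
```

Swap in your own guess for the pre-symptomatic window and the second number moves in proportion, since at these small probabilities the risk is close to linear in the number of encounters.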

Now compare the activities you think must be forbidden, like walking on a hiking trail or going to a beach with your family or even eating in a restaurant. Compare them to how much danger you figured a checkout clerk at the grocery store would be in before we required them and us to wear masks. Then think about the risk again. There is more in the study so please read it. I would love to hear criticism of my back of the envelope calculation. I swear I didn’t put much thought into it so you stand a good chance of making me look like a fool.

The point I am trying to make is that if you get it, it will most likely be from someone you spend a lot of time with, especially in closed spaces. Yes, there are some especially bad event types, like those where a bunch of strangers get together in a closed space and sing and shout for hours and eat and drink together for hours more. But really, this is analogous to all the people who think a serial killer is a realistic danger in their lives. In reality, if you do get murdered, it will most likely be by someone you know well, like a husband or your butler.