Large-Scale PCR Testing with Compressive Sensing

One of the most bewildering facts about "the pandemic" is that no widespread testing is available. Regardless of the country, people complain about being denied a test because they were deemed "unlikely" to carry the virus, even when they showed clear symptoms. The reasons given were that they "were not in Wuhan recently" or that they "did not travel recently", even weeks after the outbreak. Months later it was discovered that the virus had probably spread undetected much earlier than believed. Widespread testing is still not available.

Various reasons have been given for the unavailability of widespread tests: the laboratories have limited capacity, and there is a shortage of the required chemicals. There is or was some truth to that - the chemicals were imported from China, and neither the supply of chemicals nor the laboratory capacities were ever designed to cope with the current situation. Months later, however, there is still no sign of widespread testing becoming available. Certainly, the number of patients in intensive care has fallen significantly, but there is still no certainty regarding the fraction of the population that was infected or the current number of undetected cases. The window is closing fast: there is still no scientific consensus regarding how long infected people remain immune. Some suggest that the immunity might be weak and vanish after a short time. If we do not test a large fraction of the population for immunity, we may never know how large the fraction of infected people was. Not testing for the virus itself leaves us blind with respect to a possible "second wave."

Only very few countries have performed tests on large numbers of people, and they are not the countries you would suspect: one of them is China, which tested roughly ten million people within two weeks [1, 2, 3]. This was done using classical "pooling", where samples from multiple participants are mixed and tested together. If the test is positive, at least one of the participants is likely positive; if it is negative, all of the participants are likely negative. Of course, there is a probability of false-positive and false-negative results, which is generally larger than when every person is tested individually.
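
To make the mechanics concrete, here is a minimal Python sketch of such a two-stage ("Dorfman") pooling scheme. The pool size, the 5% prevalence and the function name are illustrative assumptions, and the test is assumed to be error-free:

```python
import numpy as np

rng = np.random.default_rng(0)

def dorfman_pooling(status, pool_size):
    """Two-stage pooling: test each pool once; if a pool is positive,
    retest every member individually. Returns the total number of tests
    and the inferred status of each participant (ideal, error-free test)."""
    n_tests = 0
    inferred = np.zeros_like(status)
    for start in range(0, len(status), pool_size):
        pool = status[start:start + pool_size]
        n_tests += 1                      # one test for the mixed pool
        if pool.any():                    # pool positive: retest members
            n_tests += len(pool)
            inferred[start:start + len(pool)] = pool
    return n_tests, inferred

# 1000 participants, roughly 5% of whom are infected
status = rng.random(1000) < 0.05
tests, inferred = dorfman_pooling(status, pool_size=5)
print(f"{tests} tests for {len(status)} participants")
assert (inferred == status).all()
```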

Classical pooling also has other problems: it is only efficient as long as the fraction of positive participants is small. As that fraction grows, the pool size has to be reduced significantly. Some figures regarding efficiency can be found in [4].
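
For an ideal, error-free test, the expected number of tests per participant under two-stage pooling with pool size s and prevalence p works out to 1/s + 1 - (1-p)^s: every participant shares one pool test, and all s members are retested whenever the pool is positive. A small sketch evaluating this (the prevalence values are arbitrary examples):

```python
def tests_per_person(p, s):
    """Expected tests per participant for two-stage pooling with an
    ideal test: one pool test shared by s people, plus s individual
    retests whenever the pool contains at least one positive."""
    return 1.0 / s + 1.0 - (1.0 - p) ** s

for p in (0.01, 0.05, 0.10, 0.30):
    best = min(range(2, 51), key=lambda s: tests_per_person(p, s))
    print(f"prevalence {p:>4.0%}: optimal pool size {best:2d}, "
          f"{tests_per_person(p, best):.2f} tests/person")
```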

In the signal processing community, the problem posed by large-scale PCR testing is well known as a "Compressive Sensing" (CS) problem: the goal is to find the subset of a sparse vector's entries that are nonzero. Another way to formulate the question: what is the most efficient way to measure a sparse signal? The aim is always to obtain an almost exact representation of the signal with respect to some norm (e.g. the l2-norm or the l0-"norm"), while being as "lazy" as possible when taking measurements.
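
In CS notation, the pooled measurements are modelled as y = Ax, where x is the sparse vector of viral loads, the mixing matrix A records which samples go into which pool, and y holds the test readouts. Below is a toy sketch of this setup with a simple greedy recovery algorithm (orthogonal matching pursuit); the problem sizes, the random pooling matrix and the idealized noise-free linear measurements are assumptions made for illustration - real PCR readouts are noisy and not exactly linear:

```python
import numpy as np

rng = np.random.default_rng(1)

n, m, k = 200, 60, 5          # participants, pooled tests, infected
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.uniform(0.5, 1.5, size=k)
A = (rng.random((m, n)) < 0.1).astype(float)  # each test pools ~10% of samples
y = A @ x                     # idealized noise-free, linear readouts

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily add the column most
    correlated with the residual, then re-fit by least squares on the
    support selected so far."""
    residual, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(A.T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return x_hat

x_hat = omp(A, y, k)
print("true positives:     ", sorted(np.flatnonzero(x)))
print("recovered positives:", sorted(np.flatnonzero(x_hat > 0.1)))
```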

It was therefore bewildering for me not to find a single paper about the application of CS in the medical/biological field (have I missed any? Please contact me). Certainly, classical pooling is a tempting algorithm because it is simple and easy to understand, while most algorithms employed by the CS community require a background in statistical signal processing. Compared to modern CS recovery algorithms, however, classical pooling is extremely inefficient. Information theory tells us that the smallest number of linear measurements required to reconstruct a signal is limited by the Rényi information dimension d(x), which, in simple terms, is the fraction of non-zero coefficients in a sparse vector. Classical pooling is far away from reaching this limit: at a fraction of 10% positive participants, the optimal pool size is already down to about four, and even then the number of tests that need to be performed is larger than 50% of the number of participants; above roughly 30% positive participants, pooling stops paying off altogether. Information theory suggests that about 10% as many tests as participants should be sufficient in this scenario, however. While even modern CS algorithms only approach this limit under special circumstances, they generally come much closer.
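
To put numbers on that gap, the following snippet compares the cost of optimally-sized two-stage pooling against the information-theoretic proportion d(x) ≈ p of tests per participant (the prevalence values are again arbitrary examples):

```python
def tests_per_person(p, s):   # as in the earlier pooling sketch
    return 1.0 / s + 1.0 - (1.0 - p) ** s

for p in (0.01, 0.05, 0.10, 0.20):
    pooling = min(tests_per_person(p, s) for s in range(2, 51))
    print(f"prevalence {p:>4.0%}: optimal pooling {pooling:.2f} "
          f"vs. information limit ~{p:.2f} tests/person")
```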

Since my research as a predoc student focused on compressive sensing, it was immediately obvious to me that the large-scale PCR testing challenge is in fact a CS problem. I have been working on a CS recovery algorithm specialized towards the large-scale PCR testing problem. If you are interested, an introductory presentation and some simulation results can be found in this pdf. You can also contact me at sbirgmeier@21er.org if you have any questions.


[1] https://www.abc.net.au/news/2020-05-26/china-coronavirus-wuhan-testing-millions-in-10-days/12283202

[2] https://www.nytimes.com/2020/05/14/world/asia/coronavirus-testing-china-wuhan.html

[3] https://www.bbc.com/news/world-asia-china-52651651

[4] https://www.csh.ac.at/pooling-corona-samples-boosts-test-efficiency/
