Parameter estimation, the quantum way: theory and application of quantum estimation
A tutorial for the virtual workshop Machine Learning for Quantum 2021
This is a bit awkward. You are watching a recorded video. Some of you are watching it “live” at the workshop while I sleep — since it is the middle of the night here in Sydney. Perhaps, though, this video will be posted and sit on the internet somewhere in perpetuity. In that case, know that I’m going to assume the same thing about you that I’m assuming about the workshop participants — namely, that you know a little about quantum physics or computation and have at least heard about quantum metrology.
Parameter estimation is a fundamental task in metrology, the science of measurement. The ultimate limits of precision in this task are determined by physics, and the ultimate physics theory is quantum. In this tutorial, I will review the task of parameter estimation from both a classical and quantum perspective, demonstrating the meaning of “quantum advantage”. I will give you the conceptual tools you need to navigate the ever-growing literature on this topic. To give you a sense of what’s to come, a rough outline is as follows.
First, we need to go over the context and caveats since I won’t be able to cover everything. Next up is a brief history of quantum metrology because if you don’t know where you are going, you won’t know where you’ve been. No. Wait. If you don’t know where you are. No. That’s not it either. You know what, never mind. With the intro stuff out of the way, I’ll cover the decision theory framework and how it serves to define what we mean by accuracy. Lastly, I’ll cover a bit about quantum circuits and how they capture the experimental designs we use to do the measurements needed to estimate the parameters of interest.
Basically, there are two things I want you to take away from this talk: decision theory and quantum circuits. That’s not going to make much sense at the moment, but it will by the end. So, remember that: decision theory and quantum circuits.
Context and caveats
The theme of this session of the workshop is the following question: how can a quantum sensor optimally extract information about its environment?
As you might expect, the answer is it depends. It depends on what the sensor is made of, on what control you have over it, on what the environment is, and on how your sensor is coupled to it. There is no generic answer to this question. Many of the talks and posters you will see in this session provide solutions relative to some assumptions that narrow the problem down to a tractable one. My intention here is not to give you any of these answers but to give you a mental framework to understand these results and the questions they are answering.
Some caveats are in order since I will not be able to touch upon everything in one hour. First, I will not mention any particular quantum technologies. This is partly to appear unbiased, but mostly because I am not an expert in the physics of these devices. Second, there will be a slight bias toward the statistical approach since that is my area of expertise, and it also happens to be the correct one ;)
But, first, a little history.
History: the first quantum advantage
Metrology is the scientific study of measurement and sits at the intersection of many disciplines — most notably physics, engineering, and statistics. It’s the job of the physicist to determine what should be measured. It’s the job of the engineer to decide how to measure it. And it’s the job of the statistician to determine what data tell us about the quantity of interest. The latter is, of course, necessary because every measurement has an error. When the physics is simple (which requires a lot of hard work to get to) and when the engineering is precise (also requiring hard work), the statistics is relatively trivial — a first-year textbook will suffice.
With quantum phenomena, the physics is hard, the engineering is hard, and so we should expect the statistics to follow suit. Indeed, this was already anticipated in the 1970s. For better or worse, those that took the first steps were mathematicians looking to generalise not the simple metrology schemes of the day but the kind of statistics that only academic statisticians might find in research journals. Helstrom, Belavkin, and Holevo invented the field of quantum estimation theory, and I guarantee there are still threads in this early work that would yield new and interesting results in mathematical statistics.
Alas, these results — as stated — are of not much use to the physicist in the lab (assuming they haven’t earned a degree specialising in functional analysis). So as physics and engineering caught up, the standard methods evolved into something all its own and the early work was either forgotten, ignored, or simply out of sight. At first, quantum physics was seen as a nuisance. Standard practices would eventually succumb to things like tunnelling, uncertainty limits, and shot-noise. But, where most saw quantum effects as a problem, Carl Caves saw them as the solution. And, in 1981, Caves proposed the first quantum advantage in the most unlikely of places: merging black holes billions of light-years away.
Prior to the 1980s, interferometry was in the realm of classical optics, the most famous example being the 1887 Michelson–Morley experiment which failed to find the aether. In a long tradition of using larger and larger experiments in physics, it was suggested that kilometre-long interferometers could detect waves in space-time generated by astronomical events like merging black holes. As you may recall, the LIGO collaboration did indeed detect gravitational waves for the first time in 2015 (announced in 2016) using interferometers with 4 km long arms. No exotic new quantum technology was used at the time.
In 1981, though, Caves suggested using non-classical light — called “squeezed” light — in part of the interferometer that was ignored. Indeed, LIGO started reporting detection events using squeezed light in 2019. The “quantum advantage” was a whopping 50% more detections — and we will get back to that later in the tutorial. In the ’80s it was also shown that the ultimate precision limits in interferometry allowed by quantum physics were below lower bounds using classical resources alone.
During the ’90s, entanglement was all the rage, and in the dazzling flurry of results were proofs that entanglement was necessary to achieve the quantum advantage. Quantum metrology, riding along in the wake of the quantum computing wave, became a popular research topic within the quantum information community and was touted as the most likely first “spin-off” technology on the way to quantum computers. The hangover quickly set in when it was shown that the quantum advantage was extremely fragile, as in lose-a-single-photon-and-you-are-toast fragile.
After about 2008, information theorists got their hands on the problem. Many proposals now mitigate noise using error suppression (such as dynamical decoupling and other standard control theory techniques) and error correction (using mostly existing error-correcting codes). The scope of the problem also exploded, as did the range of data processing techniques used to solve it. More recently, these have included the use of machine learning.
Much of this, of course, was in parallel with technological developments in the engineering of quantum devices. These are the so-called quantum sensors. In broadly painted strokes, quantum sensing and quantum metrology are the same thing. However, in practice, be aware that the former is used more often in the context of experiment or engineering, while the latter is used more frequently in theory.
Well, there you have it, the incomplete history of quantum metrology — or sensing, whatever.
Framework: deciding on decision theory
Before we get started on physics, I want to get the statistics out of the way. It is essential because the question at the outset — how can a quantum sensor optimally extract information about its environment — is vague. And when you read papers in this field — or any field in physics for that matter — you will run up against the same unanswered question. What is the exact problem, and how do I tell if it’s been solved?
I want to introduce you to decision theory because it is a powerful way to frame the problems studied in this field. There are two things I want you to keep in your mind after this lecture, and this is one of them.
While physics is colloquially considered to have the succinct and audacious goal of understanding how the entire universe behaves, it is much simpler in practice. All physics worthy of discussion starts with the same two things: a mathematical model of some physical situation and a set of mathematical rules (laws) which it obeys. Questions posed of the physical situation are then mapped to questions of the model, where the logical consistency of mathematics allows one to ask well-posed problems with unique solutions. Of course, this is the ideal case. Completely unconstrained, though, we should always be striving for the ideal.
In quantum sensing, the models and laws are quantum or semi-quantum. Semi-quantum usually means the most exciting parts of the physical situation are modelled with quantum physics, while other features are more conveniently modelled with classical physics than with a fully quantum treatment. The most ubiquitous example of semi-quantum modelling is noise. It is so common that it often goes unstated. If it is not apparent, assume noise is modelled classically.
Models are always formulas that contain variables and constants. Think about, for example, Newton’s Universal Law of Gravitation or the Schrodinger Equation. From a computational perspective, these are just functions, or black boxes, which take values for some quantities and output others. If the physical situation includes measurement, the black box takes values of the relevant quantities as input and outputs predictions for the measurement outcomes. In every real-world measurement scenario, the modelling includes noise or uncertainty, and the predictions are probabilistic. Hence, we are dealing with statistics.
The relevant research questions asked of the model are always inductive in the laboratory. This contrasts with textbook quantum physics, where the problems are deductive. Solving equations, running simulations, and predicting the outcome of experiments are all deductive problems — a sequence of logical steps is followed to arrive at a unique conclusion. Solving the Schrodinger equation knowing the starting state and Hamiltonian and then predicting the probability of the measurement outcomes is a deterministic process arriving at a unique answer — deductive reasoning. The opposite — starting with measurement outcomes and asking what the starting state or Hamiltonian was — is inductive reasoning. This is the category that metrology falls under.
A powerful way to deal with this problem is statistical decision theory. In the context of metrology, it goes like this. There is some quantity x which you’d like to estimate. If you guess that x is y, that’s wrong — not good. In reality, maybe you lose money or time or some other valuable quantity. Whatever it may be, the penalty for being wrong is abstractly quantified by loss. The loss L(x,y) is the penalty for mistaking x for y. The most common example when x is a single real number is squared error: L(x,y) = (x-y)². There are many other examples, including ones unique to quantum information theory, such as the fidelity between two quantum states. Not all loss functions are equal, and a procedure called “optimal” may only be so for the chosen loss function. Thus, you should always know what loss function is being used.
The “procedure” to produce an estimate in metrology is broken into two distinct parts. Chronologically, the first is the actual physical setup used to perform the measurement. The second is the data processing methods used to map the data to an estimate. At the moment, we are talking about the latter only. We’ll get to the former next, but note that machine learning can aid in the design of either — or both at the same time!
When we talk about an estimate y, we really mean an estimator, which is a function that maps every possible data set to a guess. So y is y(d), where d is a data set. When designing an estimator, we want a low loss on average. Given the true value for the unknown quantity x, the probability of a data set d is Pr(d|x), which, viewed as a function of x, is called the likelihood function. Since you may have heard this term in the context of maximum likelihood, I’ll note that one particular estimator is the one that chooses y to make Pr(d|y) as large as possible: the maximum likelihood estimator. Like all estimators, that estimator incurs some loss, and the loss averaged over data sets is called the risk.
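To make the estimator and its risk concrete, here is a minimal sketch in Python. It assumes a toy model that is not part of the talk: a single qubit whose measurement gives outcome 0 with probability cos²(x/2), with x restricted to [0, π], repeated n times. The function names are made up for illustration.

```python
import numpy as np

def prob_zero(x):
    """Assumed toy likelihood for one shot: Pr(0|x) = cos^2(x/2)."""
    return np.cos(x / 2.0) ** 2

def mle(k, n):
    """Maximum likelihood estimate of x in [0, pi] given k zeros in n shots."""
    return np.arccos(2.0 * k / n - 1.0)

def squared_error(x, y):
    """Loss L(x, y) = (x - y)^2."""
    return (x - y) ** 2

def risk(estimator, x, n, trials=20000, seed=0):
    """Monte Carlo estimate of the risk: the loss averaged over data sets drawn from Pr(d|x)."""
    rng = np.random.default_rng(seed)
    # Each trial is one simulated data set, summarised by its number of zeros.
    k = rng.binomial(n, prob_zero(x), size=trials)
    return float(np.mean(squared_error(x, estimator(k, n))))

if __name__ == "__main__":
    print(risk(mle, x=np.pi / 2, n=100))
```

Swapping `mle` for any other function of `(k, n)` gives the risk of a different estimator under the same model and loss.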
For squared error loss, the risk is the familiar mean squared error quantity. In some miraculous cases, and sometimes only asymptotically, the risk is independent of the value of the unknown parameter x. In general, it is not. This is a problem, especially in simulations where the researcher must choose a value for the ostensibly “unknown” parameter. In practice, this is done “randomly”, and the choice of distribution is often implicit or never mentioned. Explicitly, this additional average produces the Bayes risk.
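Written out, and using symbols that are my own choice here (R for the risk, r for the Bayes risk, and π(x) for the distribution the unknown parameter is drawn from), the two averages are:

```latex
R(x) = \sum_{d} \Pr(d \mid x)\, L\bigl(x, y(d)\bigr),
\qquad
r = \int \pi(x)\, R(x)\, \mathrm{d}x .
```

For squared error loss, R(x) is the mean squared error at a fixed x, and r is that quantity averaged once more over the choice of x.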
In most papers on metrology, especially those containing numerical simulations and ones inventing new techniques, you will find a figure which is meant to demonstrate the performance of some estimator. It will look something like this.
The vertical axis is always the risk since many simulations averaging over randomly generated measurement results will have been performed. Different lines will correspond to the performance of different estimators. Lower loss is obviously better and so the newly invented method will be the lowest curve. The line itself is most likely the Bayes risk since the simulations will also have been repeated over many different choices of unknown parameters. Typically this will be a uniform distribution, but if you don’t know, ask! — especially if you are a referee. The horizontal axis is the resource considered. Typical choices are the number of measurements, number of qubits, input power, or time. If you are lucky, some theoretical calculations will also be present — either the performance of the optimal estimator or a lower bound will be displayed.
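A rough numerical version of such a figure can be sketched as follows. The setup is an assumption made purely for illustration: the unknown parameter is an outcome probability p with a uniform prior on [0, 1], the loss is squared error, the resource is the number of measurements n, and the two estimators compared are the maximum likelihood estimator and the posterior mean.

```python
from math import comb

def risk(estimator, p, n):
    """Exact risk under squared error: sum over k of Pr(k|p) * (estimator(k, n) - p)^2."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) * (estimator(k, n) - p) ** 2
        for k in range(n + 1)
    )

def bayes_risk(estimator, n, grid=200):
    """Risk averaged over a uniform prior on p, using a midpoint rule with `grid` points."""
    points = [(i + 0.5) / grid for i in range(grid)]
    return sum(risk(estimator, p, n) for p in points) / grid

def mle(k, n):
    """Maximum likelihood estimator of p: the observed frequency."""
    return k / n

def posterior_mean(k, n):
    """Bayes estimator for squared error under a uniform prior."""
    return (k + 1) / (n + 2)

if __name__ == "__main__":
    print("  n  MLE       posterior mean")
    for n in (1, 10, 100):
        print(f"{n:3d}  {bayes_risk(mle, n):.6f}  {bayes_risk(posterior_mean, n):.6f}")
```

Each printed column is one "line" of the figure, with n on the horizontal axis. Changing the prior, the loss, or the resource would change the table, which is why those choices need to be stated.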
Goal: the quest for the ultimate bound
This is a great segue to one of the most common themes in metrology: lower bounds. In particular, you will hear a lot about three lower bounds: the standard quantum limit, the Heisenberg limit and the Quantum Cramer-Rao bound. These can be incredibly confusing because they are never introduced in generality, and it’s not obvious what parts of the particular context in which they are being discussed are necessary. Moreover, the language in which they are often discussed implies that these things are the ultimate bounds on accuracy independent of any of the choices made by the researchers, even implicitly.
Recall from our discussion of decision theory that many ingredients need to be specified before we can talk about accuracy. These questions must be answered:
1) What is the domain of the unknown parameter?
Is it a single real number? Is it bounded? Is it a vector? A matrix? The problem of estimating an unbounded real number is much different from estimating a probability distribution, for example.
2) How is the loss measured?
Is it squared error? Absolute error? Relative error? Fidelity? The behaviour of an estimator might be very different for one loss function than it is for another.
3) What is the resource being considered?
Is it the number of measurements? Time? Energy? The number of qubits? A theoretical calculation in the limit of a large number of measurements is not going to apply to many qubits, for example.
4) What is the prior distribution?
Is it uniform? Gaussian? Bounded? The prior distribution can have such a massive effect on performance that estimators are often tailored for specific priors. These are called Bayes estimators, by the way, and are my personal favourite.
Classic metrology, where the notions of standard quantum limit and Heisenberg limit were introduced, can be seen as an instance of a decision-theoretic problem with the following answers to our questions:
1) The parameter of interest is encoded as a phase, a real number between 0 and 2π.
2) The loss function is squared error. Sometimes the square root of the risk is taken to arrive at the root mean squared error.
3) The resources being counted are the number of qubits used. Way, way back, metrology was the domain of optics, so the number of qubits would mean the number of photons there.
4) Many discussions will not commit to a choice of prior, but if one is used, it is almost certainly the uniform one on the interval [0,2π).
Given these choices, one can apply standard decision-theoretic techniques to arrive at a lower bound on the risk. In quantum information science, one is often interested in comparing quantum to classical. Here, the distinction lies in the constraints on the resources, the number of qubits. Quantum refers to the ability to do anything allowed by quantum mechanics to the qubits — most notably entangling them. Classical, on the other hand, means that one must prepare and measure the qubits one at a time — no entanglement. The answer in the latter case is called the standard quantum limit, while in the former, it is called the Heisenberg limit. Usually, these are written as follows.
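In the conventional notation, with N denoting the number of qubits (a symbol assumed here) and Δϕ the root mean squared error, the standard quantum limit and the Heisenberg limit are commonly stated, up to convention-dependent constant factors, as:

```latex
\Delta\phi_{\mathrm{SQL}} \geq \frac{1}{\sqrt{N}},
\qquad
\Delta\phi_{\mathrm{HL}} \geq \frac{1}{N} .
```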
The symbol ϕ is used because the unknown parameter is often encoded in a phase, and this notation is traditional. Root mean squared error is used probably as a historical quirk since standard error analysis is so popular for simple statistical problems. An additional assumption going into this calculation is that the estimator used is unbiased — a detail I don’t want to get into that can nevertheless elicit strong emotions. Much of the literature in early quantum metrology was focussed on ways of achieving accuracy beyond the standard quantum limit.
Once any of the assumptions above are relaxed, the bounds become meaningless. They are, however, derived from a slightly more general bound. For any parameter, the mean squared error is bounded below by the inverse of a quantity called the Fisher information, which can be computed by averaging the squared derivative of the logarithm of the likelihood function. In decision theory, this is called the Cramer-Rao lower bound, and it still applies to any unbiased estimator.
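For a single real parameter x and an unbiased estimator y(d), the standard form of this bound can be written as follows. The symbols F for the Fisher information and M for the number of independent repetitions are my notation, not the talk's.

```latex
\mathbb{E}\bigl[(y(d) - x)^2\bigr] \geq \frac{1}{F(x)},
\qquad
F(x) = \sum_{d} \Pr(d \mid x) \left( \frac{\partial}{\partial x} \log \Pr(d \mid x) \right)^{2} .
```

Fisher information adds over independent repetitions of the same experiment, so M repetitions give a bound of 1/(M F(x)).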
The bound becomes more complicated when the parameter of interest is not a single real number. In the quantum case, when a parameter is encoded in a quantum state, the Fisher information must be replaced by the Quantum Fisher information, which is a function of a positive semi-definite matrix. Beyond a single parameter, it becomes more difficult to achieve this bound — and especially so in quantum physics! Consider a simple example where one wants to estimate position and momentum simultaneously. Clearly, the uncertainty principle forbids achieving optimal accuracy for both quantities. Nevertheless, maximising the Fisher information is a commonly used heuristic which is defended by appealing to the Cramer-Rao bound. Outside of this context, Fisher information has no operational meaning.
This brings us to one of the major themes in modern quantum metrology — namely, creating numerical optimisation algorithms to reduce loss as much as possible. In the literature, you will find statistical methods, optimisation methods, and machine learning methods, all aimed at reducing risk. But, in order to reduce risk, you need to know the likelihood function — that is, you need to know the model. You need to know the experimental setup, or at least an idealisation of one.
Strategies: how we are going to get there
So far, we have been discussing data processing alone. An advantage we have in quantum physics — at least from the comfort of our own chair — is the ability to design the experiment. Quantum information theory possesses a powerful tool to visualise computational and information-theoretic protocols, namely the quantum circuit.
This is the quantum circuit of the basic quantum metrology protocol. It shows a state being prepared, then a dynamical process encodes a parameter onto the state, and finally, a measurement is made. In the original interferometric protocol, the process which encodes the parameter is the application of a relative phase. But a quantum circuit is general enough to capture anything modelled by standard quantum mechanics. This picture succinctly captures the common scenario where a single quantum system is prepared and measured and then prepared again. Since the measurement decorrelates the past quantum state from the projected one, we can imagine a long sequence of prepare and measure experiments as a single experiment with many uncorrelated systems running in parallel. Either scenario is where the standard quantum limit would apply. That is, the root mean squared error of estimating x is bounded below by one over the square root of the number of measurements or preparations.
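The prepare, encode, measure structure can be sketched for a single qubit in a few lines of Python. The specific gates are an assumption chosen for illustration (a Hadamard to prepare, a relative phase to encode x, and a Hadamard before a computational-basis measurement); the point is only that the circuit is the model that produces the likelihood Pr(d|x).

```python
import numpy as np

# Hadamard gate, used for both preparation and the final basis change.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def phase_gate(x):
    """Encodes the parameter x as a relative phase between |0> and |1>."""
    return np.array([[1, 0], [0, np.exp(1j * x)]])

def outcome_probabilities(x):
    """Run prepare -> encode -> measure and return [Pr(0|x), Pr(1|x)]."""
    state = np.array([1, 0], dtype=complex)  # start in |0>
    state = H @ state                        # prepare (|0> + |1>)/sqrt(2)
    state = phase_gate(x) @ state            # encode the parameter
    state = H @ state                        # rotate into the measurement basis
    return np.abs(state) ** 2                # Born rule

if __name__ == "__main__":
    for x in (0.0, np.pi / 2, np.pi):
        print(x, outcome_probabilities(x))
```

For this circuit, Pr(0|x) works out to cos²(x/2), so repeating it n times gives a binomial likelihood that any of the estimators discussed earlier can be applied to.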
As mentioned earlier, entanglement can save the day — but how? Well, in the circuit model, it’s easy to visualise.
Before the operation which encodes the parameter acts, we can apply some entangling gates. As the picture suggests, we require simultaneous access to all the qubits for this to work. And, if we are going to throw in a large multiqubit operation, we might as well consider the possibility of doing another one after the parameters are encoded.
If the single parameter is encoded as a phase, and we are interested in mean squared error, then the Heisenberg limit applies to this situation. Of course, in any of the situations depicted in these circuits, the Cramer-Rao bound is a lower bound on the mean squared error of any unbiased estimator. In unconstrained and idealised situations, the Cramer-Rao bound coincides with the Heisenberg limit.
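The gap between the two limits can be checked numerically in a small, idealised case. The sketch below assumes a noiseless phase encoding exp(-i x H) with generator H = (1/2) Σ Z_i acting on n qubits, and uses the pure-state expression for the quantum Fisher information, 4 Var(H). It compares an unentangled product probe with a GHZ probe; all function names are illustrative.

```python
import numpy as np

def generator_diagonal(n):
    """Diagonal of H = (1/2) * sum_i Z_i in the computational basis of n qubits."""
    ones = np.array([bin(b).count("1") for b in range(2**n)])
    return (n - 2 * ones) / 2.0

def qfi_pure(state, n):
    """Quantum Fisher information 4 * Var(H) for a pure probe state."""
    h = generator_diagonal(n)
    weights = np.abs(state) ** 2
    mean = np.sum(weights * h)
    return float(4.0 * (np.sum(weights * h**2) - mean**2))

def product_state(n):
    """|+> on every qubit: prepared one at a time, no entanglement."""
    return np.ones(2**n) / np.sqrt(2**n)

def ghz_state(n):
    """(|0...0> + |1...1>)/sqrt(2): an entangled probe."""
    state = np.zeros(2**n)
    state[0] = state[-1] = 1 / np.sqrt(2)
    return state

if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, qfi_pure(product_state(n), n), qfi_pure(ghz_state(n), n))
```

Under these assumptions the product probe has quantum Fisher information n and the GHZ probe n², so the corresponding Cramer-Rao bounds on the root mean squared error scale as 1/√n and 1/n respectively.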
As mentioned briefly before, noise is a huge problem for quantum metrology, as it is for most quantum technologies. A recent suggestion made by several research groups was to use quantum error correction in the same way as we do to rescue quantum computation from noise. In the case of concatenated codes, we could draw a picture like this.
Much like the idea of reaching the Heisenberg limit, this is “blue sky” research. We are a long way from reaching this limit technologically, but that doesn’t mean quantum metrology is useless.
Today’s practical metrology problems involve constraints that make the optimisation problem far more difficult. But quantum resources can still offer enhancement over the state of the art. Let’s return to black holes for an example.
This is real data from the LIGO project. The black line is the measured noise (power spectrum) of the native device, which detected the first gravitational waves. The green line is the measured noise using squeezing, which requires a small (relatively speaking) amount of entanglement. This might not look impressive, but the improvement produces a 50% increase in gravitational wave detections! But, before we get too excited, look at that grey line. That line is where the noise would be if all known additional sources of noise were eliminated. The Heisenberg limit is nowhere near visible on this graph. It’s just not relevant.
Quantum metrology that is practically useful in the near term need not consider the “ultimate bounds” on accuracy. Examples of such studies focusing on experimental designs that require little-to-no entanglement are those with active feedback.
Here the picture is implicitly showing feedback on future measurements. However, the feedback could affect the design of other parts of the protocol. In some simple parameter estimation models, schemes based on feedback can achieve the same performance as that achievable with entanglement. As the problems become more complicated, more sophisticated numerical techniques — such as machine learning — are required to achieve optimal accuracy. Such complications include the estimation of multiple parameters, noise, and control errors. All of these are relevant to devices being manufactured today.
Summary
I wanted you to take away two things from this talk: decision theory and quantum circuits. But you were also introduced to a few extra players.
The (quantum) Fisher information is often used as a surrogate for other figures of merit and only has a meaning in the context of the Cramer-Rao lower bound.
The Quantum Cramer-Rao bound is a lower bound on the mean squared error. It is independent of measurement and generally not achievable. When a measurement is specified, the model becomes classical, and the relevant lower bound is the classical Cramer-Rao bound.
The standard quantum limit is a Cramer-Rao bound when no entanglement is allowed. The Heisenberg limit is a Cramer-Rao bound when entanglement is allowed.
Now you can understand these concepts based on the framework of quantum circuits as models and statistical decision theory as the method to analyse them. In fact, I’ll go a step further and claim that you, or I, don’t really understand what is going on unless we can draw the circuit and specify the four ingredients in decision theory: domain, loss, resources, and prior.
Now you can annoy the rest of the speakers in this session by asking them the following: what is the loss function, what is the estimator, what is the prior distribution, and what does the circuit for this model look like? Tell them Chris sent you.