Stephen Burch's Birding & Dragonfly Website
Covid-19 total deaths and fitted curves for different countries
Last updated: 2 August (pm)
For the current Covid-19 pandemic, death statistics by country are widely available, with the Worldometers website being one of the most convenient sources.
Many plots have also appeared in the media, including in the Guardian, the FT online, and elsewhere, comparing death rates between countries from the day of first death (credited to Johns Hopkins University).
I thought I'd see if I could reproduce these for myself, using a different selection of countries. This was easy enough to do using Excel and the Worldometers data.
As politicians are keen to highlight, it is difficult to make valid comparisons between countries, but it seems that an approximate way of doing this is to compare the death rates per head of population, which I show in the graph below. These are the figures for total deaths to date divided by the population for each country.
The curves above show that the totals are comparatively small, as a percentage of each country's population, all being below 0.067% (which corresponds to about 1 in 1500).
As a percentage of population, the total deaths in the UK now exceed those in all other countries shown here, and there is no immediate indication of them levelling off, unlike in several other countries (notably Spain, France and Italy). Only Belgium, with a far smaller population, has a larger percentage figure than the UK. Belgium, which uses different counting criteria from the UK DHSC, has recorded under 10,000 total deaths, but as a percentage of its population this equates to 0.08%.
In Spain, which previously had the highest rate, daily deaths have been very low for a considerable time, yet we hear of a recent spike in new cases in Catalonia.
The USA now has by far the highest number of Covid-19 deaths, but when expressed as a percentage of its (large) population, the total currently amounts to just over 0.045% and is continuing to rise appreciably (more on this below). Sweden, which hasn't had a strict lockdown, is currently showing a total of 0.057%, and its deaths are continuing to increase faster than in several other countries.
I've recently added the figures from India and Brazil to this plot. For India, with its huge population and comparatively low death toll to date, I have multiplied the number by ten to show the trend, which is increasing at a high rate with recent signs of acceleration. For Brazil, there is also a rapid current increase, and the numbers are shown without multiplication.
To put these figures into perspective, in the UK the percentage of the population dying each year is just under 1% or 620,000 (2018 figure) in its population of 67M. Current estimates of the total number of deaths related to Covid-19 in the UK amount to nearly 10% of the total annual number of deaths (i.e. around 60,000 out of the 620,000 total).
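The per-head comparison and the annual-deaths perspective above are simple ratios. As a minimal sketch, using only the figures quoted in the text (the function names are illustrative and are not taken from the original spreadsheet):

```python
def percent_of_population(deaths, population):
    """Deaths expressed as a percentage of the population."""
    return 100.0 * deaths / population


def one_in_n(percent):
    """Convert a percentage into the equivalent '1 in N' figure."""
    return 100.0 / percent


# UK annual deaths (2018 figure) as a percentage of a population of 67M
uk_annual_percent = percent_of_population(620_000, 67_000_000)

# Estimated Covid-19 related deaths as a share of annual UK deaths
covid_share_percent = 100.0 * 60_000 / 620_000

print(f"UK annual deaths: {uk_annual_percent:.3f}% of population")
print(f"Covid-19 share of annual deaths: {covid_share_percent:.1f}%")
print(f"0.067% is about 1 in {one_in_n(0.067):.0f}")
```

These come out at roughly 0.93%, 9.7% and 1 in 1,493, consistent with "just under 1%", "nearly 10%" and "about 1 in 1500".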
In the early stages of the pandemic, many sources were showing plots of log(total deaths) against linear time, because the expected initial exponential growth phase then appears as a straight line. However, although we have seen many of these plots in the media, forward projections of these figures are much rarer.
To project forward, an assumption is needed about how the death rates will change with time. There is a whole science of pandemic modelling about which I know very little. What I have seen involves a complex approach based on a large number of parameters and multiple differential equations. As I have no idea what assumptions are used in these models and how they work in practice, I've looked at a much simpler approach based entirely on the available data to date for death rates. I have ignored all the information on number of cases on the basis that these are entirely dependent on the amount of testing done, which at present is very limited in the UK at least.
Basis of approach
My approach is as follows. It seems to me that a plausible formula for the total deaths to a particular date is:
where t is the time since first death in days. For small t this curve is linear, which is equivalent to the initial expected exponential increase. For large t, this curve eventually saturates at a value = D, where the total number of deaths in the pandemic = exp(D). The unknown parameters D and a are found by least squares fitting to the reported total deaths to date.
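The formula itself does not survive in this text version of the page. As a hedged illustration only, one function with the two properties described (linear for small t, saturating at D for large t) is ln N(t) = D(1 - exp(-a t)), where N is the total deaths to date. This specific form is an assumption chosen to match the description, not a confirmed copy of the original, and the parameter values below are made up.

```python
import math


def log_total_deaths(t, D, a):
    """Assumed form (not confirmed by the page): ln N(t) = D * (1 - exp(-a * t))."""
    return D * (1.0 - math.exp(-a * t))


def total_deaths(t, D, a):
    """Total deaths to day t under the assumed form."""
    return math.exp(log_total_deaths(t, D, a))


D, a = 10.0, 0.05  # illustrative values only, not fitted to any country

# Small t: ln N is close to the straight line D * a * t (exponential growth in N)
print(log_total_deaths(0.1, D, a), D * a * 0.1)

# Large t: ln N approaches D, so the final total approaches exp(D)
print(total_deaths(1000, D, a), math.exp(D))
```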
A slight generalisation of this formula involves three unknown parameters instead of two:
where t0 allows an adjustment to the (uncertain) date of first death.
Having tried these two equations, I have found they do not fit the reported death tolls that closely once the numbers start to drop off, probably due to the belated effects of the lockdowns that have been in force in many countries for some time now. Hence I have tried a further elaboration of the above which adds a t-squared term as well:
I am now using whichever of the above formulae appears to best fit the reported total deaths for each country.
There is a further complication in that there are two different ways of fitting the above curves to the available data for a country. The first is to look at the differences between the modelled and reported total deaths to date, and then square and sum these differences. The Solver in Excel then allows the sum of the squares of the differences to be minimised by changing the unknown parameters D, a, t0 and b (if used) in the above formulae. Alternatively, the modelled and actual reported new deaths for each day can be compared, and again the sum of the squares of the differences found.
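The three- and four-parameter formulae are also missing from this text version. The sketch below shows one way the generalisations described could look, assuming a base form ln N(t) = D(1 - exp(-a t)) with t replaced by s = t - t0 and a b s² term added inside the exponent. Both the base form and the placement of the t-squared term are assumptions made for illustration.

```python
import math


def log_total_deaths(t, D, a, t0=0.0, b=0.0):
    """Assumed family: ln N(t) = D * (1 - exp(-(a*s + b*s**2))), with s = t - t0.

    b = 0 gives a three-parameter form (D, a, t0);
    b = 0 and t0 = 0 gives a two-parameter form (D, a).
    Times before t0 are treated as s = 0.
    """
    s = max(t - t0, 0.0)
    return D * (1.0 - math.exp(-(a * s + b * s * s)))


# Illustrative values only
two_param = log_total_deaths(40, D=10.0, a=0.05)
three_param = log_total_deaths(45, D=10.0, a=0.05, t0=5.0)   # same curve shifted by 5 days
four_param = log_total_deaths(40, D=10.0, a=0.05, b=0.001)   # saturates sooner when b > 0

print(two_param, three_param, four_param)
```

With b > 0 the curve approaches D more quickly, which is one way a model could follow a sharper drop-off in daily deaths.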
The weakness of this modelling approach is that these two methods of fitting the same equations to the data produce significantly different results. I think this is because the first method, which uses the logarithm of the total deaths, gives almost equal weight to the early small numbers of deaths and the later higher numbers. The second method, however, looks at the (linear) deaths per day, which gives much more weight to the larger values occurring later in the epidemic.
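The two fitting methods can be written as two different sum-of-squares objectives. The sketch below is a hedged illustration, not the original spreadsheet. It assumes the two-parameter form ln N(t) = D(1 - exp(-a t)), replaces the Excel Solver with a plain brute-force grid search, and uses synthetic noise-free data generated from the model itself.

```python
import math


def model_log_total(t, D, a):
    # Assumed two-parameter form (not taken from the page): ln N = D * (1 - exp(-a t))
    return D * (1.0 - math.exp(-a * t))


def sse_log_totals(D, a, days, totals):
    """Method 1: squared differences between modelled and reported log(total deaths)."""
    return sum((model_log_total(t, D, a) - math.log(n)) ** 2
               for t, n in zip(days, totals))


def sse_daily_deaths(D, a, days, totals):
    """Method 2: squared differences between modelled and reported new deaths per day."""
    modelled = [math.exp(model_log_total(t, D, a)) for t in days]
    modelled_new = [m1 - m0 for m0, m1 in zip(modelled, modelled[1:])]
    reported_new = [n1 - n0 for n0, n1 in zip(totals, totals[1:])]
    return sum((m - r) ** 2 for m, r in zip(modelled_new, reported_new))


def grid_fit(objective, days, totals, D_grid, a_grid):
    """Brute-force stand-in for the Solver: the (D, a) pair with the smallest objective."""
    return min(((D, a) for D in D_grid for a in a_grid),
               key=lambda p: objective(p[0], p[1], days, totals))


# Synthetic, noise-free "reported" totals generated from the assumed model
TRUE_D, TRUE_A = 9.0, 0.06
days = list(range(1, 61))
totals = [math.exp(model_log_total(t, TRUE_D, TRUE_A)) for t in days]

D_grid = [6.0 + 0.1 * i for i in range(61)]     # 6.0 to 12.0
a_grid = [0.01 + 0.002 * j for j in range(51)]  # 0.01 to 0.11

fit_log = grid_fit(sse_log_totals, days, totals, D_grid, a_grid)
fit_daily = grid_fit(sse_daily_deaths, days, totals, D_grid, a_grid)
print("fit on log totals  :", fit_log)
print("fit on daily deaths:", fit_daily)
```

On noise-free data both objectives recover the same parameters. With real, noisy data the two minima generally differ, because the first objective weights every day's log-total similarly while the second is dominated by the days with the most deaths.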
Also in the early stages of the epidemic the numbers of reported deaths tend to rise exponentially with time (which appears as a straight line on a logarithmic plot). In this case, it is impossible to derive any meaningful value for the "curvature" parameter, a. Without that, any projections of the numbers into the future are effectively meaningless. Even as the epidemic progresses, small changes in the value of the parameter a can have a huge effect on the modelled number for the total number of people that will die in the epidemic.
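The sensitivity to a can be illustrated with a small calculation. Assuming the form ln N(t) = D(1 - exp(-a t)) (an assumption, as the original formula is not reproduced here), a single observed total on a given day fixes D once a is chosen, and the implied final total is exp(D). The observation used below is hypothetical.

```python
import math


def implied_final_total(total_so_far, day, a):
    """Final total exp(D) implied by one observed total, under the assumed
    form ln N(t) = D * (1 - exp(-a t)), so D = ln N(day) / (1 - exp(-a * day))."""
    D = math.log(total_so_far) / (1.0 - math.exp(-a * day))
    return math.exp(D)


# Hypothetical observation: 1,000 total deaths on day 30
for a in (0.05, 0.04, 0.03):
    final = implied_final_total(1000, 30, a)
    print(f"a = {a:.2f} -> implied final total of about {final:,.0f}")
```

In this made-up case, reducing a from 0.05 to 0.03 raises the implied final total by more than a factor of ten, which is why an early, poorly constrained value of a makes the projection effectively meaningless.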
Because of all these complications, I have now decided to only show and report some of the modelling results. For the others, the values are very uncertain and it would be misleading to mention the total epidemic deaths coming out of the modelling here.
It is important to note that I am not claiming any accuracy for these predictions. This approach is just one way of estimating future trends based on available data.
However, in China it appears the present phase of the pandemic is over. At the bottom of this page there is a study which shows that this approach was giving estimates of total deaths within a factor of two of the final total, even at an early stage of the pandemic.
I first give here the results for the UK, followed by those for other selected countries. In all cases I have used the Worldometer data for total deaths to date, as a function of time. For the UK, these are the daily figures from the DHSC that appear widely in the media. In these figures, for a death to be attributed to Covid-19, there must have been a positive Covid-19 test. These figures hence do not include deaths from Covid-19 where a test had not been performed, which may have occurred most often in care homes and the community. Recently it has emerged that the positive test result could have been well before the death occurred, which raises the obvious possibility that some of the deaths may have been unconnected to the earlier infection.
The plot below shows the Worldometer data for the UK in terms of deaths to date (left logarithmic axis) and daily deaths (right linear axis). It also shows the results of fitting the second equation given above to the daily death values. As the daily figures show considerable fluctuations, with lower values at weekends, I am now following many others in showing values averaged over the previous week (i.e. a rolling average).
The UK daily deaths peaked around day 35 and have been declining ever since. The reported values suggest there have been 3 rather different phases to the UK epidemic so far, with the current one starting at about day 20, around the time the lockdown started. Alternatively this may be a further illustration of the vagaries and delays in reporting deaths.
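A rolling average over the previous week can be computed as below. This is a minimal sketch with made-up daily counts; how the first few days are handled (averaging over whatever days are available) is a choice made here for illustration.

```python
def trailing_average(values, window=7):
    """Average of each value together with up to window - 1 preceding values."""
    averaged = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


# Made-up daily counts with lower weekend values, repeated for two weeks
daily = [100, 110, 120, 115, 105, 60, 50] * 2
smoothed = trailing_average(daily)
print(smoothed[6], smoothed[13])
```

Because each full 7-day window contains exactly one of each weekday, the weekly cycle cancels out and the two printed values are identical.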
For the UK, the blue modelled curve now reaches a total of about 46,000 deaths, which is now slightly lower than the actual figure. This is because the actual deaths are declining more slowly than the modelled curve. This "fat" tail, as some others have referred to this phenomenon, is shown for some, but by no means all of the other countries given below.
The plot below also shows the separate weekly registered death figures reported by the ONS (England), added to the numbers from the corresponding separate bodies for Scotland and Northern Ireland. These are all deaths where Covid-19 is mentioned on the death certificate, and are counted by date of registration of the death. These numbers are higher than those reported by the DHSC, which only counts those with positive Covid-19 tests. The registered death figures are also accompanied by information on all deaths and how they compare with the long-term average. These show a significant number of unaccounted-for excess deaths where Covid-19 is not on the death certificate. These could be from other causes, e.g. heart attacks, cancers and strokes, which may have increased due to the reallocation of NHS resources to handling Covid-19, to the detriment of other forms of care. Alternatively, or additionally, there may well have been some deaths caused by Covid-19 but not mentioned on the death certificate. On 14 June, there had been about 65,000 excess deaths in the UK since the start of March 2020 - substantially higher than the 42,000 Covid-19 deaths reported by DHSC on that date. Since then the total number of deaths has been slightly lower than the 5-year average, so the cumulative excess deaths figure is declining slightly. It is unclear why this is happening.
This all goes to highlight, even today, the difficulty in being sure of how many people have died during a pandemic.
As with nearly every country shown here, lockdown measures are now being eased in England and more gradually in Scotland, Wales and Northern Ireland. The key question is whether or not this will produce an upturn in infection rates which then feeds into the death figures in due course. The 'fat' tail effect, mentioned above, is already in evidence and may become more pronounced if further local or even more widespread outbreaks occur following the loosening of lockdown restrictions.
Spain, Italy and France
I show below the Worldometer data for deaths to date for various different countries as a function of time. The curves show the fits to the available data using the principle of least squares, as for the approach used for the UK above. The plot below shows the numbers of new deaths per day.
The epidemics in Italy and Spain are now well advanced and the daily deaths are at low levels. Each of the dashed curves shows one of the equations described at the top of this page fitted to the daily death values. In Spain, daily deaths have been in single figures for some weeks now, and in Italy too daily deaths are now often in single figures. Both countries have been easing their lockdown restrictions for some time, without any marked increases in deaths. However, there has been a recent well-publicised sharp increase in cases in some parts of Spain, especially in the north east of the country. This increase in cases has yet to show up in the death figures.
Unlike the UK, for these two countries, the modelled curves fit the daily death figures quite closely, both before and after the peaks in daily deaths. There is currently no sign of 'fat' tails occurring in Italy and Spain.
For France, the numbers have been very erratic; on the plot below an average over the last 3 days is shown. Even with this averaging there are many fluctuations and the initial rise in deaths doesn't fit the modelled curve very well. There was a large increase caused by the inclusion of care home deaths on day 36, but a decline is now well underway, with the total being around the 30,000 mark, very similar to Italy and Spain. The daily figure went negative for some strange reason on day 82! The daily figures have now been below 20 for several days in a row, despite recent lockdown easing. As with Italy and Spain, the modelled curve fits the recent numbers of deaths quite closely, and there is no sign of the pronounced 'fat' tail shown in the UK figures.
The USA numbers do not fit well to a single curve. Excluding the values before 47 days from the first death, the modelled overall total is around 130,000. Although this is far higher than in the European countries shown above, the US population (330 million) is correspondingly much larger than those of the European countries shown here (for example, the UK population is 68 million). This is shown more clearly in the first graph on this page, which shows the deaths in each country as a percentage of the total population.
Having been in decline for some time, the daily figures are now definitely increasing again, no doubt following the well-publicised large rise in reported cases in many states. This is presumably connected to the premature easing of the lockdown in many states. It now looks like there are effectively two Covid-19 epidemics or waves in the USA, with the second one giving significantly rising values in the daily death rates.
Sweden and Germany
Unlike all the other countries featured on this page, Sweden has not had a major lockdown. Hence a comparison with other countries that have had much higher levels of restriction is interesting. Sweden has a significantly smaller population (about 10 million) and a lower population density than any of the other countries given here. The daily figures (3-day smoothing) show a strong weekly cycle. The fitted curve suggests that the peak in the daily figures was reached on about day 45 and that a slow decline is now underway. The modelled total for epidemic deaths is around 5,000. There is currently concern in Sweden over the relatively high death rate per head of population and the extended period of slow decline in daily numbers. As with the UK and the USA, the Swedish figures are showing the 'fat' tail effect.
In Germany, the approach has been completely different, with a huge amount of testing and follow-up of those infected, combined with a major lockdown. At present, the fitted curve has a final total death toll of about 9,000, in stark contrast to the UK, Italy, Spain and France. The daily figures (3-day smoothing) again show a pronounced weekly cycle. The fitted curve shows that the peak in the daily numbers was reached on about day 35 and that they are now tailing off well, despite some easing of lockdown rules. There is no sign of a 'fat' or extended tail developing in Germany.
With the reportedly dire Covid-19 situation in Brazil much in the news currently, I have just taken a look at their figures, which are shown in the plot below. Since around day 35, the numbers for total deaths to date appear to follow quite closely the curve fitted, and before that the differences were relatively small. Extrapolation of this curve currently gives an estimate of about 180,000 for the total number of pandemic deaths, which is more than the USA. The fitted curve also suggests the peak death rate has now been reached, so that the numbers dying each day are beginning to decline. However the reported numbers are showing no real sign of a decline, with even the hint of a small rise in daily deaths. It is clear the pandemic is nowhere near under control in Brazil, much as appears to be the case in the USA.
There must also be some caveats about the accuracy of the official numbers for this country, which may be significantly under-reported.
For China the present phase of the pandemic is apparently over. It can therefore be used as a useful test case for the modelling approach described above. The plot below shows the available data for China up to 50 days since the first death. Note immediately how closely the 2-parameter equation given above fits the measurements, over almost the entire course of the pandemic. In this country at least, the equation seems to be more than plausible: it provides a good fit to the entire pandemic.
I have then tested how accurately the modelling approach predicts the final death toll, based on different amounts of the available data. The results are shown below. This graph shows how the predicted total deaths vary depending on the date at which the modelling is performed. All predictions were within a factor of two of the final value. This is unlike my experience with any of the other countries modelled to date. In all these cases, the predictions for the final death toll have been varying by large amounts, indicating the approach is not providing meaningful values.
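The test described above amounts to refitting the model on progressively longer portions of the data and recording the predicted final total each time. The sketch below shows only the procedure, under stated assumptions: the two-parameter form ln N(t) = D(1 - exp(-a t)) is assumed, a brute-force grid search stands in for the Excel Solver, and the series is synthetic and noise-free rather than the China data.

```python
import math


def model_log_total(t, D, a):
    # Assumed two-parameter form (not taken from the page)
    return D * (1.0 - math.exp(-a * t))


def fit_two_parameters(days, totals, D_grid, a_grid):
    """Least-squares fit on log(total deaths) by brute-force grid search."""
    def sse(D, a):
        return sum((model_log_total(t, D, a) - math.log(n)) ** 2
                   for t, n in zip(days, totals))
    return min(((D, a) for D in D_grid for a in a_grid), key=lambda p: sse(*p))


def predicted_totals_by_cutoff(days, totals, cutoffs, D_grid, a_grid):
    """For each cutoff day, fit only the data up to that day and return exp(D)."""
    predictions = {}
    for cutoff in cutoffs:
        kept = [(t, n) for t, n in zip(days, totals) if t <= cutoff]
        D, _ = fit_two_parameters([t for t, _ in kept], [n for _, n in kept],
                                  D_grid, a_grid)
        predictions[cutoff] = math.exp(D)
    return predictions


# Synthetic, noise-free series starting at day 4 (illustrative values only)
TRUE_D, TRUE_A = 8.0, 0.1
days = list(range(4, 51))
totals = [math.exp(model_log_total(t, TRUE_D, TRUE_A)) for t in days]

D_grid = [5.0 + 0.1 * i for i in range(61)]     # 5.0 to 11.0
a_grid = [0.02 + 0.004 * j for j in range(41)]  # 0.02 to 0.18

predictions = predicted_totals_by_cutoff(days, totals, [10, 20, 30, 50], D_grid, a_grid)
final_total = math.exp(TRUE_D)
for cutoff, predicted in predictions.items():
    print(f"data to day {cutoff}: predicted total / final total = {predicted / final_total:.2f}")
```

Because this series is noise-free and generated from the model, every cutoff recovers the same total. With real reported figures the ratio would vary with the cutoff date, which is the variation the graph described above records.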
It is however surely a coincidence that the prediction, performed on only the first six days of data (I excluded the values below 4 days from first death), is almost spot on the final figure.
© All plots copyright Stephen Burch