COVID-19 Data Analysis Update - 1 April 2020 - Special report: Statistics of the death rate


Given the tragic recent increase in daily deaths reported worldwide due to COVID-19, in this special report we study the dynamics of the deaths and their relation to the number of infections. We evaluate the data published daily by WORLDOMETER on the numbers of persons infected with, and deceased from, COVID-19 in different countries. We exclude any liability with regard to the quality and accuracy of the data used, and also with regard to the correctness of the statistical analysis. The evaluations of the different growth phases represent our purely personal opinion.

The data are usually based on information provided by national governments or authorities. The number of reported cases may be significantly lower than the number of people actually infected, and the size of this gap may vary between countries. However, as long as the ratio of reported cases to actual cases remains reasonably constant, statistical trends for actual cases can be derived, with all due caution, from the evolution of reported case numbers. We must point out, however, that changes in testing practice may lead to fluctuations in reported cases that have no counterpart in actual case numbers. It is also possible that some governments manipulate their data for political reasons, so that the reported case numbers do not necessarily correspond to the cases that actually tested positive.

Certainly, at this stage all statistical predictions are subject to great uncertainty, because the general trends of the epidemic are not yet clear and we do not know to what extent developments in countries where the epidemic appears to be largely under control can be transferred to other countries. In any case, the statistical trends that we read from the data are only suitable for predictions if the measures taken by governments and authorities to contain the pandemic remain in force and are followed by the population. We must also point out that, even if the statistics indicate that the epidemic is under control, infection figures may rise again at any time until the disease is eradicated worldwide.

Statistical analysis

For each of the countries we survey, we distinguish different periods of pandemic development based on the respective growth rates of the number of recorded infections. We believe that some general statistical regularities can be established. At the beginning, the growth rate is typically extremely high % (dark red), but it then weakens. In the final saturation phase, the growth rate has become so low % (turquoise) that the development of the epidemic is essentially under control. The various countries are currently at different stages of development. In countries where the growth rate is still very high, as is currently the case in Germany, a saturation phase can only be expected after much higher case numbers.

The first graph shows the numbers of infected and deceased persons over time on a base-2 logarithmic scale. The next graph shows the number of new deaths per day.

Because the infection typically spreads rapidly, with a time lag of 6-7 days, the number of infections grows exponentially. The third graph therefore plots the logarithmic growth rate against the logarithm of the number of infections. It then becomes apparent that the usefulness of linear regression for estimating the expected total number of infections depends on the country. In China, South Korea, Iran, Italy, Poland and Portugal there is a clear sign that the logarithmic growth rate depends negatively and linearly on the logarithm of the number of infections.
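The regression in the third graph can be sketched as follows. This is a minimal illustration, not the authors' code, and it rests on assumptions: the "logarithmic growth rate" is taken to be the day-over-day difference \(\log_2(I_{t+1})-\log_2(I_t)\), and an ordinary least-squares line is fitted against \(\log_2(I_t)\). If the fitted slope is negative, the line reaches zero growth at \(\log_2(I^*)=-b_0/b_1\), which gives a rough estimate \(I^*\) of the total number of infections. The function names and the synthetic series are hypothetical.

```python
import math


def ols(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def estimate_final_infections(cumulative_cases):
    """Regress the daily log2 growth rate on log2(cases) and extrapolate to zero growth.

    Assumption: growth rate g_t = log2(I_{t+1}) - log2(I_t), regressed on log2(I_t).
    Returns (intercept, slope, estimated_final_cases); the estimate is None when the
    slope is not negative, i.e. when the fitted line never reaches zero growth.
    """
    logs = [math.log2(c) for c in cumulative_cases]
    xs = logs[:-1]
    growth = [b - a for a, b in zip(logs[:-1], logs[1:])]
    intercept, slope = ols(xs, growth)
    if slope >= 0:
        return intercept, slope, None
    return intercept, slope, 2 ** (-intercept / slope)


# Hypothetical synthetic series whose log2 growth rate falls linearly:
# log2(I_{t+1}) = log2(I_t) + 0.5 - 0.03 * log2(I_t)
x = math.log2(100)
series = []
for _ in range(30):
    series.append(2 ** x)
    x = x + 0.5 - 0.03 * x

b0, b1, final = estimate_final_infections(series)
print(f"slope={b1:.4f}, estimated final cases ~ {final:,.0f}")
```

Because the synthetic series is generated exactly from a linear growth-rate law, the fit recovers intercept 0.5 and slope -0.03; with real data the scatter around the line decides whether such an extrapolation is meaningful at all.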

For predicting the further development of the epidemic in each country, it seems important to determine when the growth rate falls below the yellow line. If the figures from China (although these data are probably systematically distorted) and South Korea can be transferred, the final number of cases will be about two and a half times the number at that point. Of course, this is only a very rough estimate with many uncertainties, not a reliable prognosis. Before this point, it is probably not possible at present to make any reasonably reliable forecast at all. Nor is it currently clear to what extent the findings from East Asian countries can be generalized to other countries. In particular, the actual development will also depend on the measures taken or still to be taken to contain the epidemic, and on their implementation and compliance by the population.
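The rule of thumb above can be written as a small helper. This is a sketch under several assumptions: the growth rate is taken as the daily percentage increase of cumulative cases, the yellow-line threshold is passed in as a parameter because its value is not stated here, and the factor of about 2.5 is the rough ratio read off the Chinese and South Korean data, not a fitted constant. The function names and the example numbers are hypothetical.

```python
def daily_growth_percent(cumulative_cases):
    """Daily percentage increase of cumulative cases: 100 * (I_t / I_{t-1} - 1)."""
    return [100.0 * (cur / prev - 1.0)
            for prev, cur in zip(cumulative_cases[:-1], cumulative_cases[1:])]


def rough_final_estimate(cumulative_cases, threshold_percent, factor=2.5):
    """Return (day_index, estimate) for the first day the growth rate drops below
    threshold_percent, with estimate = factor * cases on that day.

    Returns None if the threshold has not been crossed yet, since no reasonably
    reliable forecast is possible before that point.
    """
    rates = daily_growth_percent(cumulative_cases)
    # rates[j] describes the step from day j to day j + 1, so report day j + 1.
    for day, rate in enumerate(rates, start=1):
        if rate < threshold_percent:
            return day, factor * cumulative_cases[day]
    return None


# Hypothetical case numbers and threshold, for illustration only.
result = rough_final_estimate([100, 150, 200, 240, 260], threshold_percent=10.0)
print(result)  # (4, 650.0)
```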

So we do not simply extrapolate the current growth rates in order to predict, for example, how quickly the number of infections will double. If current growth rates were maintained, practically the entire population would be infected in most countries within a short time. Instead, we try to capture regularities in the change in the growth rate. In general, it seems to be the case that after a strong initial phase, the growth rate slows down and the epidemic finally passes into a saturation phase, where there are relatively few new infections. Our statistical goal is to estimate when this will happen and what the total number of infections will be by then.
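A short calculation illustrates why naive extrapolation is not useful. Assuming a constant daily growth rate \(r\) (as a fraction), cases double every \(\ln 2/\ln(1+r)\) days, and growing from \(I_0\) cases to a population of size \(N\) takes \(\ln(N/I_0)/\ln(1+r)\) days. The function names and the numbers in the example are hypothetical and chosen only for illustration.

```python
import math


def doubling_time_days(daily_growth):
    """Days needed to double at a constant daily growth rate (0.2 means +20 % per day)."""
    return math.log(2) / math.log(1 + daily_growth)


def days_to_reach(current_cases, target_cases, daily_growth):
    """Days until target_cases is reached if the daily growth rate stayed constant."""
    return math.log(target_cases / current_cases) / math.log(1 + daily_growth)


# Hypothetical example: 60,000 cases, 20 % daily growth, population of 80 million.
print(round(doubling_time_days(0.20), 1))                   # 3.8
print(round(days_to_reach(60_000, 80_000_000, 0.20), 1))    # 39.5
```

Under these assumed numbers the entire population would be infected in well under six weeks, which is why the analysis focuses on how the growth rate itself changes.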

We would like to point out some features of the data that have emerged from our analyses. At the beginning of the epidemic, strong fluctuations and deviations from the regression line can be seen in every country. This is simply due to the small case numbers. In the Chinese data there is a sudden sharp jump in the middle of the series. However, this does not seem to reflect a sudden surge in the actual number of cases, but rather a change in data collection.

Testing density and the classification of test results vary greatly from country to country, so the numbers of infected persons cannot easily be compared. Many infected persons are not recorded, and the proportion of unrecorded cases varies from country to country. It is also possible that in certain countries the official data are falsified by political manipulation. Even the reported death counts may not be comparable between countries, as the cause of death of patients with pre-existing conditions may be classified differently. Perhaps in some countries only those who have died in hospitals are recorded. In addition, when interpreting the statistical data, it should be borne in mind that there is typically a long period between a patient's infection and death. Current death figures therefore correlate more with past than with current infection figures. We also see sudden increases in death rates in some countries, perhaps because their medical systems become overstretched.
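Because deaths follow infections with a delay, one simple way to illustrate the point is to compare the naive ratio \(D_t/I_t\) with a lag-adjusted ratio \(D_t/I_{t-L}\). This is a hedged sketch: the lag \(L\) is a hypothetical parameter (no value is given in the text), the function names are illustrative, and the toy series is made up.

```python
def naive_death_rate(deaths, infections):
    """D_t / I_t for each day t (cumulative counts)."""
    return [d / i for d, i in zip(deaths, infections)]


def lagged_death_rate(deaths, infections, lag_days):
    """D_t / I_{t - lag_days}: deaths compared with infections lag_days earlier.

    The first lag_days entries are skipped because no earlier infection count exists.
    """
    return [deaths[t] / infections[t - lag_days]
            for t in range(lag_days, len(deaths))]


# Toy cumulative series (made up): infections and deaths both double daily.
infections = [100, 200, 400, 800]
deaths = [1, 2, 4, 8]
print(naive_death_rate(deaths, infections)[-1])      # 0.01
print(lagged_death_rate(deaths, infections, 2)[-1])  # 0.04
```

The faster infections grow, the larger the gap between the two ratios, which is why current death figures should be read against past infection figures.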

Death rate is increasing!

In Figure 4 we show how the mortality rate, defined as the number of deaths divided by the number of infections, evolves over time. The graph suggests that the mortality rate actually increases over time. Worse still, Figure 5 shows that the death rate generally also increases with the number of infections. This relationship seems to be linear once the number of infections has passed a threshold of roughly 1000 cases. In some countries the increase in the death rate starts at higher thresholds, for example at 2500 infections in Australia and Canada, at 8000 in Turkey and at 30000 in the USA. Two exceptions are Japan and the Philippines, where the mortality rate has been falling over the last two weeks.

One possible explanation for the rising mortality rate is that, because the coronavirus spreads very rapidly, once infections pass a certain threshold the number of serious cases exceeds the capacity of the public health system, resulting in more deaths.

For a quantitative analysis, Figure 6 plots the number of deaths against the number of infections, both on logarithmic scales. The linear regression line suggests that this relationship can be described by the formula
\[\log_2(D_t)=k\cdot\log_2(I_t)+\log_2(a)+\epsilon_t \iff D_t=2^{\epsilon_t}\cdot a\cdot I_t^{k}\]
with parameters \(a,k>0\), where \(D_t\) is the number of deaths, \(I_t\) the number of infections and \(\epsilon_t\) the linear regression error. Table 1 lists the coefficients \(a,k\) and the standard deviation \(c\) of the error \(\epsilon_t\), as estimated by linear regression. The death rate \(d_t:=\frac{D_t}{I_t}\) is then given by
\[d_t=2^{\epsilon_t}\cdot a\cdot I_t^{k-1}.\]
In most countries the coefficient \(k\) is greater than 1, so that \(I_t^{k-1}\) grows with \(I_t\); statistically, the death rate is then an increasing function of the number of infections. A tragic consequence is that we would see more and more deaths in the USA and Europe in the coming days.

Table 1: Fitted coefficients \(a\) and \(k\), standard deviation \(c\) of the regression error \(\epsilon_t\), p-values of the Shapiro-Wilk (Sh-W) and Kolmogorov-Smirnov (K-S) normality tests, and present death rate for each country.
Country                a          k          c          Sh-W p-value  K-S p-value  Present death rate
WORLD EXCLUDING CHINA  0.0016895  1.2593817  0.1898538  0.0646        0.4253       0.051
AUSTRALIA              0.0009063  1.1833778  0.0782029  0.7832        0.9715       0.005
AUSTRIA                0.0000110  1.7362178  0.3610579  0.6896        0.6107       0.014
BRAZIL                 0.0002692  1.5647822  0.0916846  0.9610        0.9321       0.035
BELGIUM                0.0000556  1.7295622  0.2224599  0.1796        0.7777       0.059
CANADA                 0.0053322  1.0835102  0.1117241  0.7016        0.9873       0.012
CHINA                  0.0067111  1.1431221  0.3133082  0.0001        0.0617       0.041
CZECH                  0.0000000  3.1490033  0.3920885  0.8826        0.7118       0.011
DENMARK                0.0000000  2.9387745  0.3920923  0.7512        0.8994       0.033
FRANCE                 0.0008230  1.4042877  0.1893205  0.0783        0.5622       0.071
GERMANY                0.0000403  1.4712627  0.3067872  0.6757        0.9342       0.012
GREECE                 0.0003992  1.6287770  0.0889657  0.4840        0.9299       0.036
INDIA                  0.0082561  1.1611972  0.0676775  0.5482        0.8966       0.029
INDONESIA              0.0247262  1.1766259  0.0338100  0.4785        0.8562       0.094
IRAN                   0.0020522  1.3399915  0.3086481  0.7055        0.9088       0.064
IRELAND                0.0000001  2.5210005  0.1796434  0.9591        0.9500       0.025
ISRAEL                 0.0000007  2.0181714  0.3749841  0.8264        0.8885       0.004
ITALY                  0.0022059  1.3435620  0.1062129  0.0254        0.2066       0.119
JAPAN                  1.1203226  0.5129163  0.0559246  0.0010        0.1217       0.024
MALAYSIA               0.0000010  2.2177395  0.2706971  0.0079        0.6388       0.015
NETHERLANDS            0.0002296  1.6214717  0.0939822  0.0065        0.6186       0.086
NORWAY                 0.0000076  1.8027383  0.2623545  0.1647        0.6770       0.009
PHILIPPINES            2.7108119  0.4569992  0.0487504  0.5661        0.9752       0.042
POLAND                 0.0020827  1.2489343  0.1768069  0.3209        0.7781       0.017
PORTUGAL               0.0001239  1.5884288  0.1481384  0.7977        0.8438       0.023
SOUTH KOREA            0.0003187  1.3845067  0.4175487  0.0156        0.6263       0.017
SPAIN                  0.0015730  1.3513214  0.1255115  0.0217        0.4734       0.090
SWEDEN                 0.0000001  2.5646812  0.2630703  0.2062        0.7839       0.048
SWITZERLAND            0.0004137  1.3968428  0.2813245  0.4549        0.9256       0.027
TURKEY                 0.1129977  0.7878164  0.1531105  0.8024        0.9805       0.018
UK                     0.0031889  1.3056729  0.1988885  0.0317        0.6148       0.080
USA                    0.0290680  0.9490968  0.3181390  0.6228        0.9937       0.024
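The regression behind Table 1 can be sketched as an ordinary least-squares fit of \(\log_2(D_t)\) on \(\log_2(I_t)\), with \(k\) as the slope and \(a\) as 2 raised to the intercept. This is not the authors' code: defining \(c\) as the sample standard deviation of the residuals is an assumption, the normality tests are omitted, and days with zero deaths are skipped because their logarithm is undefined. The last lines evaluate the fitted death-rate formula \(d=a\cdot I^{k-1}\) (setting \(\epsilon_t=0\)) with the ITALY coefficients from Table 1 at a hypothetical 100,000 infections.

```python
import math
import statistics


def fit_death_power_law(infections, deaths):
    """Fit log2(D) = k * log2(I) + log2(a) + eps by ordinary least squares.

    Returns (a, k, c), where c is the sample standard deviation of the residuals
    eps (an assumption about how c is defined). Days with zero deaths are dropped.
    """
    pairs = [(math.log2(i), math.log2(d)) for i, d in zip(infections, deaths) if d > 0]
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    k = (sum((x - mean_x) * (y - mean_y) for x, y in pairs)
         / sum((x - mean_x) ** 2 for x in xs))
    log2_a = mean_y - k * mean_x
    residuals = [y - (k * x + log2_a) for x, y in pairs]
    return 2 ** log2_a, k, statistics.stdev(residuals)


def death_rate(a, k, infections):
    """Fitted death rate d = a * I**(k - 1), ignoring the error term."""
    return a * infections ** (k - 1)


# ITALY coefficients from Table 1; 100,000 infections is a hypothetical input.
print(round(death_rate(0.0022059, 1.3435620, 100_000), 3))  # 0.115
```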

What remains to be explained are the differences in death rates between countries. In some countries, such as Germany or the Scandinavian countries, they are relatively low, while in Italy, Spain, France and the Netherlands they are particularly high. The reasons for these differences are probably many and varied. Firstly, infections may affect different population groups: possibly mainly elderly people in Italy, but returning skiers in central and northern Europe. Secondly, infection figures may be significantly underestimated in some countries. Thirdly, not all deaths may be recorded in all countries, for whatever reason.

Some more remarks

The numbers of recovered patients are also likely to be inaccurate, as hospitals often do not report discharges to the authorities, and people who have recovered at home will usually not report their recovery either. It is therefore possible that the epidemic is already under control before the official number of active cases reaches zero.

For these and other reasons, the figures officially reported from China, where the epidemic is supposedly under control, must be used with caution when forecasting developments in other countries.

The spread can also vary widely, as the social contact networks through which infections occur can be very heterogeneous. In South Korea, the virus appears to have spread mainly within a religious sect whose members were in very frequent contact with one another, so that it spread rapidly there, while contacts with the outside world were much rarer, so that the infection could essentially be confined to this group. In China, the epidemic was essentially limited to one province, Hubei, by strictly forbidding and preventing all contacts with the outside world. In the Scandinavian countries we see two peaks in the number of infections, which indicates two different waves of spread. Either the infection, as in South Korea, initially spread only within a certain group and only later reached other population groups, or there was a second wave of infection independent of the first. In other countries, festivals, sports competitions or other major events may also have caused a sudden worsening of the epidemic. Network propagation models must therefore pay particular attention to the heterogeneity of contact networks.

COVID-19 in countries