**Press contact:**

Jana Gregor

Press officer

**Address:**

Inselstraße 22

04103 Leipzig

Germany

**Phone:**

+49 (0) 341 - 9959 - 650

**Fax:**

+49 (0) 341 - 9959 - 658


## Episode 10 - Wolfgang Hackbusch and the world of hierarchical matrices (24.11.2021)

In numerous physical, chemical, biological and mathematical problems, equations arise that require a fast and exact solution. Examples from the natural sciences include heat transport, flow behavior of fluids or the motion of stars and galaxies due to gravity. Hierarchical matrices are an excellent mathematical method to solve large systems of equations with a vast number of unknowns quickly and efficiently.

From 1999 until his retirement in early 2014, the research group "Scientific Computing" headed by our director Prof. Dr. Dr. h. c. Wolfgang Hackbusch developed, among other things, hierarchical matrices and the approximation and arithmetic of tensors. The scientists consistently applied these techniques to differential and integral equations as well, making them more efficient to solve.

Wolfgang Hackbusch, born in 1948, studied mathematics and physics at the Universities of Marburg and Cologne. After completing his habilitation in Cologne with a thesis on multigrid methods, he took up a professorship at the Ruhr University in Bochum in 1980. In 1982, he transferred to the professorship of Practical Mathematics at the Department of Computer Science at the Christian-Albrechts-University of Kiel. From 1999 until his retirement, he was a Scientific Member and director at the Max Planck Institute for Mathematics in the Sciences in Leipzig and Honorary Professor of the University of Leipzig.

Wolfgang Hackbusch is a founding member of the Berlin-Brandenburg Academy of Sciences and Humanities and a member of the National Academy of Sciences Leopoldina. He has received numerous honors and awards for his outstanding scientific achievements, including the DFG's Gottfried Wilhelm Leibniz Prize (1994) and the Brouwer Medal of the Dutch Mathematical Society (1996). In 2000, he was awarded an honorary doctorate from the University of Bochum. For his pioneering contributions to numerical mathematics, the Carl Friedrich von Siemens Foundation honored him in 2020 with the Heinz Gumin Prize for Mathematics, the most highly endowed mathematics prize in Germany.

*Hierarchical matrices can be demonstrated using image compression. Try it out for yourself below!*

### Hierarchical matrices

#### Systems of equations and matrices

In numerous physical, chemical, biological, and mathematical problems, equations occur whose solution is sought. Examples from the natural sciences include heat transport, fluid flow, or the motion of stars and galaxies due to gravitation. The relations are typically formulated by one or more differential or integral equations, i.e., equations in which various derivatives or integrals of the sought and given quantities appear.

Unfortunately, such equations can be solved exactly only in exceptional cases, e.g., because the scientific processes refer to *all* points of space or time, of which there are infinitely many. This *continuous* initial problem is therefore *approximated* by a *discrete* problem with finitely many points of space or time; in a heat problem, for example, one does not compute the solution for all points of a hot plate, but only for a certain number of them. The errors due to this *discretization* are small, however, and can be further reduced with more points. Besides discretization errors, there are other sources of error, such as measurement inaccuracies.
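As an illustration of such a discretization, consider a hypothetical steady heat problem on a rod: instead of all (infinitely many) points, only \(n\) interior points are used, and finite differences turn the differential equation into a system of equations. This is a minimal sketch for illustration, not the procedure used in any particular application:

```python
import numpy as np

n = 5
h = 1.0 / (n + 1)                      # spacing of the chosen points on [0, 1]
# Finite differences for -u'' = f at the n interior points yield A x = b:
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
b = np.ones(n)                         # constant heat source f = 1
x = np.linalg.solve(A, b)              # approximate temperatures at the n points
# Using more points reduces the discretization error, at higher cost.
```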

The numerical procedures for discretization ultimately lead to a system of equations of the form

$$ \begin{align*} a_{11} \cdot x_1 + a_{12} \cdot x_2 + a_{13} \cdot x_3 + \cdots &= b_1 \\ a_{21} \cdot x_1 + a_{22} \cdot x_2 + a_{23} \cdot x_3 + \cdots &= b_2 \\ a_{31} \cdot x_1 + a_{32} \cdot x_2 + a_{33} \cdot x_3 + \cdots &= b_3 \\ & \;\;\vdots \end{align*} $$

Such *systems of equations* are already known in a simple form from school. Here \(x_1, x_2, x_3, \cdots\) are the *unknowns* that have to be determined, for example the heat distribution at the previously selected points of the hotplate. The values \(a_{ij}\) are *coefficients* that characterize the relationships between the unknowns, with a larger value usually representing a stronger dependence between the unknowns. For example, the heat at closely spaced points on the hot plate is more interdependent than at points farther apart. Finally, the numbers \(b_i\) are the given values (the right-hand sides) that appear in the original equations.

For numerical treatment, the coefficients \(a_{ij}\) are of primary interest. These are typically arranged in a tabular scheme, a *matrix*:

$$ \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots \\ a_{21} & a_{22} & a_{23} & \cdots \\ a_{31} & a_{32} & a_{33} & \cdots \\ \vdots & \vdots & \vdots & \ddots \\ \end{pmatrix} $$

The solution of the system of equations is obtained by transforming the coefficients of this matrix. In school mathematics, one learns to solve one of the equations for one unknown and to substitute the result into another equation; this *eliminates* one of the unknowns. Continuing this procedure yields at the end the solution value for the only remaining unknown. This value can then be substituted into the penultimate equation to determine the second-to-last unknown, and so on. This method is called *Gaussian elimination*, after Johann Carl Friedrich Gauss (1777 - 1855), although its origins can be found in Chinese mathematical texts from over 2000 years ago.
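The elimination procedure can be sketched in a few lines; this is a minimal illustration without pivoting (real solvers choose pivots for numerical stability):

```python
import numpy as np

def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with back substitution.

    Minimal sketch without pivoting, for illustration only.
    """
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    # Forward elimination: zero out the entries below the diagonal.
    for k in range(n - 1):
        for i in range(k + 1, n):
            factor = A[i, k] / A[k, k]
            A[i, k:] -= factor * A[k, k:]
            b[i] -= factor * b[k]
    # Back substitution: solve for the unknowns from the last row upward.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
print(gauss_solve(A, b))  # same result as np.linalg.solve(A, b)
```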

Those who think back to such school exercises may also remember that the calculation was very lengthy. And in the process, typically only three, four or five unknowns were sought. In the equations that have to be solved in the scientific and technical environment, it is not uncommon to find hundreds of millions of unknowns. For such orders of magnitude, Gaussian elimination is far too costly, even on the fastest mainframe computers.

However, Gauss had already developed a method that could be carried out "half asleep" with considerably less effort, as he remarked in a letter to a colleague. Here, approximations to the solution are calculated step by step, and the procedure stops as soon as the solution is sufficiently exact. Errors are allowed here, too. But if these errors of the solution procedure are of the same order of magnitude as the discretization error or the errors due to measurement inaccuracies, the quality of the calculated solution does not suffer. Conversely, one cannot increase the accuracy of the solution by making a smaller error in the solution procedure than in the discretization; one would only waste computing time.
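Such a step-by-step procedure (today known as the Gauss-Seidel method) can be sketched as follows; a minimal illustration that assumes, e.g., a diagonally dominant matrix so that the iteration converges:

```python
import numpy as np

def gauss_seidel(A, b, tol=1e-8, max_iter=1000):
    """Iteratively improve an approximate solution of A x = b.

    Sketch of the Gauss-Seidel method: each sweep updates every unknown
    in turn from the current values of the others; the iteration stops
    once the residual is below the tolerance `tol` (sufficiently exact).
    """
    n = len(b)
    x = np.zeros(n)
    for _ in range(max_iter):
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(A @ x - b) < tol:  # sufficiently exact: stop early
            break
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([6.0, 5.0])
x = gauss_seidel(A, b)
```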

A property of the matrices arising from many equations also helps to determine the solution much faster: most of the coefficients are zero. Such matrices are called *weakly populated* (or sparse). This allows a large part of the computational steps to be omitted. Combining this with other properties that arise from differential and integral equations, one can construct methods that arrive at a solution extremely efficiently even for very large systems of equations.
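How weak population can be exploited may be sketched as follows: only the nonzero coefficients are stored, and all steps involving zeros are skipped. The coordinate-dictionary storage shown here is purely illustrative; practical libraries use more compact sparse formats:

```python
import numpy as np

def to_sparse(A):
    """Store only the nonzero coefficients as (row, column) -> value."""
    return {(i, j): A[i, j]
            for i in range(A.shape[0]) for j in range(A.shape[1])
            if A[i, j] != 0}

def sparse_matvec(entries, x, n):
    """Multiply a sparse matrix by a vector, skipping all zero entries."""
    y = np.zeros(n)
    for (i, j), a in entries.items():
        y[i] += a * x[j]
    return y

# Tridiagonal example: only about 3n of the n^2 coefficients are nonzero.
n = 6
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
entries = to_sparse(A)
x = np.ones(n)
print(len(entries))  # 16 of the 36 coefficients are nonzero
```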

#### Fully populated matrices

Unfortunately, weak population does not apply to all relevant systems of equations. For such *fully populated* matrices, the problem already arises that the cost of storing the coefficients in a computer grows very quickly: for a system of equations with \(n\) unknowns, one needs \(n^2\) coefficients for the matrix. That is, doubling the number of unknowns requires four times the storage space. At the usual computer representation of eight bytes per number, this results in the following memory consumption for an increasing number of unknowns:

| Number of unknowns | Memory consumption |
|---|---|
| 10 | 800 bytes |
| 100 | 80 kByte |
| 1,000 | 8 MByte |
| 10,000 | 800 MByte |
| 100,000 | 80 GByte |
| 1,000,000 | 8 TByte |
| 10,000,000 | 800 TByte |

The largest computer system currently available to the Max Planck Society, containing 1784 individual computers, has a total storage capacity of 566 TBytes. A matrix of size \(10,000,000 \times 10,000,000\) could thus no longer be stored!
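The figures in the table follow directly from the rule of eight bytes per coefficient, i.e., \(8 n^2\) bytes for \(n\) unknowns:

```python
# Memory consumption of a fully populated n x n matrix at 8 bytes per coefficient.
def matrix_bytes(n):
    return 8 * n * n

for n in (10, 100, 1_000, 10_000, 100_000, 1_000_000, 10_000_000):
    print(f"{n:>10} unknowns: {matrix_bytes(n):>20} bytes")
```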

### Matrix compression

As described earlier, errors are inevitable when dealing with scientific or engineering equations, and thus any solution of the system of equations is also only an approximation to the *true* solution. As long as the calculation does not increase this error, approximations are also allowed in the individual calculation steps. The same is true for storing the matrices: we therefore look for procedures that produce an *approximation* of the matrix with a significantly smaller memory requirement.

To illustrate such a method, consider the following image:

The image consists of 8 rows and 8 columns and each pixel has a certain gray value. In computer images, natural numbers in the range of 0 to 255 are typically used for such gray values. Accordingly, we can also consider the image as a matrix:

Now, if we take the first column and the first row of this matrix as vectors, we can multiply them: each number of the column is multiplied by each number of the row, yielding a new matrix. The result is then:

We can thus reconstruct the original image using just one column and one row of the original matrix. At the same time, the memory requirement decreases from \(8 \times 8 = 64\) numbers to \(8 + 8 = 16\) numbers.
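This reconstruction can be verified directly; here we assume, as suggested by the figure, that the 8 × 8 image is the multiplication table with entries \(i \cdot j\):

```python
import numpy as np

# Assumed 8 x 8 example image: a multiplication table, entry (i, j) = i * j.
M = np.fromfunction(lambda i, j: (i + 1) * (j + 1), (8, 8))

col = M[:, 0]   # first column: 1, 2, ..., 8
row = M[0, :]   # first row:    1, 2, ..., 8

# Outer product: every number of the column times every number of the row.
approx = np.outer(col, row)

print(np.array_equal(approx, M))  # True: 16 stored numbers reproduce all 64
```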

In the following image, the number in the last position at the bottom right has been changed from 64 to 63. We still use the first column and the first row of the matrix, but this time their product does not reproduce the image exactly; it is only an approximation that differs in this one place. The difference is so small, however, that it is not noticeable when looking at the image:

*Original and approximation*

But the gain in memory requirements remains with the approximation!

This approximation method is called *cross approximation* because the selected column and row together generally form a cross, for instance when a middle column and a middle row are used. For more complex images or matrices, not just one pair of columns and rows is normally selected, but several, and their individual products are added together.
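The idea of adding several column-row crosses can be sketched as follows. This is a simplified illustration with full pivoting (each step picks the largest remaining entry as the cross's center); practical variants choose pivots more cheaply:

```python
import numpy as np

def cross_approximation(M, pairs):
    """Approximate M by a sum of `pairs` column-row products.

    Sketch of (adaptive) cross approximation: in each step the largest
    remaining entry is chosen as pivot, its column and row form the next
    cross, and this cross is subtracted from the remainder before the
    next pair is selected.
    """
    R = M.astype(float).copy()        # remainder still to be approximated
    approx = np.zeros_like(R)
    for _ in range(pairs):
        i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)
        if R[i, j] == 0:              # nothing left to approximate
            break
        cross = np.outer(R[:, j], R[i, :]) / R[i, j]
        approx += cross
        R -= cross
    return approx

# A matrix built from two crosses is reproduced exactly by two pairs.
rng = np.random.default_rng(0)
M = np.outer(rng.random(8), rng.random(8)) + np.outer(rng.random(8), rng.random(8))
print(np.allclose(cross_approximation(M, 2), M))  # True
```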

An example of this is the image of our institute:

It consists of 500 columns and 348 rows. In the following, the cross approximation was applied with a different number of column-row pairs to create an approximation of the image.

*Cross approximation with 200, 100, 50, and 10 column-row pairs*

In line with the number of column-row pairs, the memory requirement is reduced to 97% (200 pairs), 49% (100 pairs), 24% (50 pairs), and 5% (10 pairs). However, the approximation already shows significant compression artifacts at 200 pairs.

A much better method is the *singular value decomposition*. Here, column-row pairs are also used, but these do not come directly from the image (or matrix), but are combinations of several columns or rows of the image. Because of this, the quality of the approximation is much better than with the cross approximation. One can even prove that there is no better method which works exclusively with columns and rows.
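A minimal sketch of this compression, using NumPy's singular value decomposition and keeping only the largest singular values:

```python
import numpy as np

def svd_compress(M, pairs):
    """Best approximation of M by `pairs` column-row combinations.

    The columns of U and rows of Vt are combinations of several columns
    and rows of M; keeping only the `pairs` largest singular values gives
    the provably best approximation with that many pairs (Eckart-Young).
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :pairs] * s[:pairs]) @ Vt[:pairs, :]

# The approximation error shrinks as more pairs are kept.
rng = np.random.default_rng(1)
M = rng.random((32, 32))
for pairs in (4, 8, 16):
    err = np.linalg.norm(M - svd_compress(M, pairs)) / np.linalg.norm(M)
    print(pairs, round(err, 3))
```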

The results of compression using the singular value decomposition with the same numbers of column-row pairs are given below:

*Singular value decomposition with 200, 100, 50, and 10 column-row pairs*

Especially with the first two approximations, the difference from the original image is very small. And even with 10 column-row pairs, the institute building is still recognizable.

Unfortunately, the calculation of the singular value decomposition is very time consuming and thus only feasible for small matrices.

In the following, we return to the cross approximation, which is much faster to compute. However, the entire image is no longer approximated in one step; instead, it is divided into individual regions, and the approximation is performed separately for each region.

*Original and blockwise approximation*

The difference between the two images is hard to make out (look at the clouds). Yet at most five column-row pairs were used per image region!

However, an error bound was prescribed in the calculation of the approximation. If this bound is exceeded with five column-row pairs compared to the original image area, then the result of the cross approximation is *not* used; instead, the original image data is kept. In the following, the areas of the image in which the cross approximation was successful are outlined in green. Areas outlined in red, on the other hand, could not be approximated.

We can make two main observations here. First, the areas where the approximation was successful are characterized by the absence of sudden changes. In particular, the upper region of the image with the sky and clouds and the lower region with the road could be approximated well with a few column-row pairs. In contrast, the method fails in areas with many changes, such as the facade or the leaves in the upper right.

Furthermore, the areas in which an approximation was successful vary in size. This also depends on the image content. The clouds extend over a larger area than the street, for example.

However, due to the high percentage of areas where the approximation was not accurate enough, the gain in memory in this case is small and amounts to 19%, i.e., the approximated version still requires 81% of the original memory. If the approximation is applied to the same motif at higher image resolutions, however, the memory requirement decreases further. At a resolution of 1000 columns and 696 rows, the requirement drops to 71%, and at 2000 columns and 1392 rows to 51%. In the latter case, almost all areas of the image can be approximated by the cross approximation, as demonstrated in the figures below.

*Approximated regions at resolutions 1000 × 696 and 2000 × 1392*

That the effectiveness of the approximation increases with increasing matrix size is another typical property of the method.

#### Hierarchical matrices

After the excursion into image approximation, we now return to matrices as obtained by mathematical, scientific, or engineering problems.

The method of blockwise cross approximation can be applied analogously here, i.e., the matrix is decomposed into subregions in which the cross approximation determines an approximation. The number of column-row pairs is again limited by a maximum number, and an error bound for the approximation is prescribed.
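The blockwise procedure can be sketched as a recursion: approximate a block with a limited number of pairs, accept it if the error bound holds, and otherwise split the block into four sub-blocks (or store it in full once it is small). For simplicity, this sketch uses a truncated singular value decomposition per block instead of the cross approximation and a fixed quadtree splitting; both are simplifying assumptions for illustration:

```python
import numpy as np

def blockwise_approx(M, max_pairs=5, tol=1e-2, min_size=8):
    """Sketch of a blockwise low-rank approximation of a matrix.

    Each block is approximated by at most `max_pairs` column-row pairs
    (via truncated SVD here, for simplicity). If the relative error bound
    `tol` is exceeded, the block is split into four sub-blocks, or stored
    in full once it is smaller than `min_size`. Returns the reconstructed
    matrix and the number of stored values.
    """
    n, m = M.shape
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    k = min(max_pairs, len(s))
    approx = (U[:, :k] * s[:k]) @ Vt[:k, :]
    err = np.linalg.norm(M - approx) / (np.linalg.norm(M) or 1.0)
    if err <= tol:
        return approx, k * (n + m)          # low-rank block accepted
    if min(n, m) <= min_size:
        return M.copy(), n * m              # small block: keep original data
    h, w = n // 2, m // 2                   # otherwise split into 4 blocks
    out = np.empty_like(M, dtype=float)
    stored = 0
    for rows in (slice(0, h), slice(h, n)):
        for cols in (slice(0, w), slice(w, m)):
            out[rows, cols], c = blockwise_approx(M[rows, cols],
                                                  max_pairs, tol, min_size)
            stored += c
    return out, stored

# Hypothetical test matrix: smooth away from the diagonal, rough near it.
n = 64
i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
A = 1.0 / (1.0 + np.abs(i - j))
approx, stored = blockwise_approx(A)
err = np.linalg.norm(A - approx) / np.linalg.norm(A)
```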

The result of such an approximation for a matrix with 978 columns and rows is shown in the following figure:

The darker a block is, the fewer column-row pairs were needed. One can clearly see that the diagonal area, i.e., the blocks along the line from top left to bottom right, can be approximated poorly, if at all. In contrast, the further a block lies from the diagonal, the better the cross approximation works.

The memory requirement could be reduced to 39% for this matrix and is 3 MByte.

If we look at larger matrices for the same initial problem, the effectiveness of the approximation increases significantly. For 3906 columns and rows, the memory requirement dropped to 16% (19 MBytes), for 15618 columns and rows to 5% (101 MBytes), for 62466 columns and rows to 1.7% (521 MBytes), and finally for 249858 columns and rows to 0.5% (2450 MBytes).

In the last case, one would need about 500 GBytes to store the matrix with all coefficients without compression. This only succeeds on correspondingly large computing systems. In contrast, the approximation can easily be stored in RAM on a standard PC.

*Block structures for the matrix sizes 3906 × 3906, 15618 × 15618, and 62466 × 62466*

If you look at the matrix sizes, you will notice that the number of columns or rows increases by a factor of about 4 in each case. The memory requirement, on the other hand, grows by an ever decreasing factor: from 978 to 3906 by a factor of 6.3, from 3906 to 15618 by a factor of 5.3, from 15618 to 62466 by a factor of 5.2, and from 62466 to 249858 by a factor of 4.7. The growth of the memory requirement thus increasingly corresponds to the growth of the matrix size.
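The growth factors quoted above can be checked directly from the memory figures:

```python
# Memory (in MByte) of the hierarchical approximations for growing matrices:
memory = {978: 3, 3906: 19, 15618: 101, 62466: 521, 249858: 2450}
sizes = list(memory)
for a, b in zip(sizes, sizes[1:]):
    print(f"{a} -> {b}: size x {b / a:.1f}, memory x {memory[b] / memory[a]:.1f}")
```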

This *linear* dependence of the memory requirement on the size of the matrix, i.e., when the columns or rows double, the memory consumption also doubles, is usually the goal in numerical methods. In contrast, as described earlier, when all matrix coefficients are fully stored, the dependence is *quadratic*.

#### Calculating with Hierarchical Matrices

Matrices can also be used to perform normal arithmetic operations, such as addition or multiplication; the calculation rules are similar to those for real numbers. Such arithmetic operations can be used to develop even more efficient solution methods for systems of equations. If, for example, many different right-hand sides are given for the original equations, i.e., many systems have to be solved, the matrix can be transformed once with these operations and the result applied to each right-hand side separately.

The efficiency advantage of hierarchical matrices compared to normal full matrices is even greater for these arithmetic operations than for storage. For example, the time to compute the product of two fully populated matrices grows with the third power of the number of unknowns; thus, if these are doubled, one must compute eight times as long. For hierarchical matrices, on the other hand, this growth continues to be approximately linear, so the time required is only slightly more than twice as great.

In this case, too, approximation methods are used throughout to reduce the effort. The individual calculation steps are thus not performed *exactly*; instead, an error is allowed which, however, is of the order of magnitude of the original error of the equation and thus does not further degrade the calculated solution compared to the exact solution.

### Conclusion

Ten years ago, it was still common in many areas of science and engineering to store fully populated matrices and to compute with them. However, as we have seen, these methods face tight limits in terms of memory space, and these limits are not easily overcome, especially since the growth of memory in computers is itself limited. The limitations due to computation time for fully populated matrices are even greater.

In contrast, by representing the fully populated systems by hierarchical matrices, equations can be handled which contain many orders of magnitude more unknowns and in a fraction of the time. Systems of equations that used to require a mainframe computer can thus be solved on ordinary PCs.

In recent years, this advantage has led to a rapid spread of hierarchical matrices and the associated methods in many areas of science as well as in industry.

### Interactive cross approximation

In the following, the cross approximation can be applied to arbitrary images. For simplicity, however, only grayscale images are generated, and the resolution of the images is limited, since otherwise the calculation becomes too tedious. With large images, the calculation can nevertheless take a few seconds.

In addition to the image, the maximum number of column-row pairs, the error bound, and the smallest block size can be set. The output is then displayed either with or without an indication of the approximated regions.



## Contact

**Jana Gregor**

**Paul Heine**

**Jörg Lehnert**