The purpose of this post is to explain the details behind the implementation of the ilu function, my work during this first period of GSOC program. The files involved are:
- src/ilu0.cc
- src/iluc.cc
- src/ilutp.cc
- ilu.m
- https://edu159@bitbucket.org/edu159/gsoc2014-edu159
--> src/ilu0.cc
This file contains the implementation of ILU0 algorithm, the easiest one. In this version the zero-pattern of the input matrix is not modified so it is known the final structure of the output matrix. That simplifies things. For the milu=['col'|'row'] option, it is needed to implement both the IKJ and JKI versions of the algorithm to efficiently compute the compensation of the diagonal element with dropped elements. I managed to do both in the same function, just changing a few lines of code. Lets use Matlab's documentation example:
Example 1:
A = gallery('neumann', 1600) + speye(1600);
setup.type = 'nofill';
setup.milu = 'row';
[L,U] = ilu(A,setup);
e = ones(size(A,2),1);
norm(A*e-L*U*e)
ans = 1.4660e-14 (Very low, good)
The following little function can be used, when milu = ['row'|'col'] to check that all the columns/rows preserve its sumation (not only with ilu0 but with iluc and ilutp). Just run it after calling ilu in any form.
benchmark/check_sums.m (It can be found here in the repo)
function check_sums (A, L, U, milu)
b = L * U;
dim = 1;
if (milu == 'row')
dim = 2;
endif
c = sum (b, dim);
d = sum (A, dim);
v = abs (c - d);
num_zeros = length (find (v > sqrt (eps)));
printf('Number of rows-columns not well compensated: %d\n', num_zeros);
if (num_zeros > 0)
v (find (v > sqrt (eps)))
endif
endfunction
NOTE: I have found in Matlab 2013a that the row and col sumation does not work well always, and the row and column compensation fails for ilutp and iluc. I will show an example later.
--> src/ilutp.cc
This algorithm is the trickiest one due to pivoting, and has caused me more than one headache during its coding because it is not well described in Saad's book, just a few indications. I have found here several bugs in Matlab's 2013a implementation that make me a bit reticent about trusting results correctness.
Error 1
A = sparse ([3 4 4 3 -1.1; 2 0 8 9 2.2; 2 1 9 9 1; 3.2 10 2.3 2 4.5; 9 2 6 2 1]);
setup =
type: 'ilutp'
milu: 'col'
droptol: 0.2000
thresh: 0
udiag: 0
>> [L, U, P] = ilu(a,setup);
sum(A(:, 2)) => 17
sum(L*U(:, 2) => 14.4857
Clearly the sum of the second column is not preserved :/.
Error 2
A = sparse([3 1.5 0 0 1.44; 0 1 0 0 -2;0 0 8 0 0; 0 2 0 2 -4.5; 0 -1 0 0 1]);
setup =
type: 'ilutp'
milu: 'col'
droptol: 0.5000
thresh: 0.2000
udiag: 0
>> [L, U, P] = ilu(a,setup);
The output is:
U =
3.0000 1.5000 0 0 0
0 0 0 0 0
0 0 8.0000 0 0
0 0 0 2.0000 Inf
0 0 0 0 -Inf
L =
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 Inf 0 1 0
0 0 0 0 1
What are those Inf doing there? Clearly the are not detecting correctly 0 pivots.
Error 3
A= sparse([3 1 0 0 4; 3 1 0 0 -2;0 0 8 0 0; 0 4 0 4 -4.5; 0 -1 0 0 1]);
setup =
type: 'ilutp'
milu: 'row'
droptol: 0
thresh: 0
udiag: 0
>> [L, U, P] = ilu(a,setup);
Output:
L =
1.0000 0 0 0 0
1.0000 0 0 0 0
0 0 1.0000 0 0
0 1.0000 0 1.0000 0
0 -0.2500 0 0 1.0000
That 0 cannot be there. By construction L has to be a lower unit triangular matrix and that zero element spoils the L*U product. Again WRONG.
I have encountered more issues when testing Matlab using some testing matrices with 2000x2000 and 5000x5000 dimensions. With them my output is not the same as Matlab's (nnz of L and U are different from Matlab's), but taking into account the errors I found, I trust the most my version and not theirs. BTW in my case the rows and columns sums were preserved, theirs not. Obviously I have checked that those examples behave correctly in my code detecting 0 pivots
A similar example can be run as with ilu0:
Example 2:
A = gallery('neumann', 1600) + speye(1600);
setup.droptol = 1e-2;
setup.type = 'ilutp';
setup.thresh = 0.5;
setup.milu = 'row';
[L,U] = ilu(A,setup);
e = ones(size(A,2),1);
norm(A*e-L*U*e)
ans = 2.5170e-14 (Nice)
Pivoting: It worths to mention how pivoting is performed in that algorithm. When milu = 'row' the U matrix is column permuted (IKJ version used) but when milu=['off',|'col'] L is the permuted one and it is row permuted (JKI version used). Both algorithms share a lot of similarities and the code is designed to work in one version or another depending on milu option. That way code duplication is avoided. That was one of my primary fears when I realized that both versions were needed to attain Matlab compatibility.
--> src/iluc.cc
This is the file containing the crout version of ILU. This version is an enhancement of pure IKJ and JKI variants of gaussian eliminations. At iteration k the k:n section of k column and k row is computed. The enhancement is noticed in the execution time for the same input matrix. The following example is a comparison between my versions of ilutp and iluc:
For a 2000x2000 matrix ( I have not included this matrix in the repository due to it size):
With setup.droptol = 0.01 and setup.milu = 'off'.
Octave:
ilutp --> 12.3458 seconds
iluc --> 6.31089 seconds
Matlab:
ilutp --> 12.868686 seconds
iluc --> 7.498106 seconds
That is just to illustrate the performance of different versions.
NOTE: In iluc the dropping strategy for elements in U (stored as CRS) is to drop the element aij if (abs(aij) < droptol * norm(A(i, :))). For the L part (stored as CCS) aij is dropped if (abs(aij) < droptol * norm(A(:, j))).
Finally the numeric example:
Example 3:
A = gallery('neumann', 1600) + speye(1600);
setup.droptol = 1e-2;
setup.type = 'crout';
setup.milu = 'row';
[L,U] = ilu(A,setup);
e = ones(size(A,2),1);
norm(A*e-L*U*e)
ans = 2.5212e-14 (Nice)
That is all I wanted to show till now. I have written tests for the functions and adapted several ones from Kai last year work. However I want to add some more function-specific ones for validating results. The last thing pending is to place the source files inside the Octave source tree. I am not totally sure where they should go. On the other hand I have already started to work on ichol function and next week I'll report about my progress.
I know the post is a bit long but I think it is needed due to the poor verbosity I had through the blog during this period. I am aware of that (Jordi pointed me out a few days ago) and I will take into account for the following weeks.
Regards,
Eduardo