Monday 11 August 2014

Soft pencils down.

It has been quite a while since I posted here, due to some personal circumstances. Anyway, to sum up: I have finished the ilu and ichol functions as I planned at the beginning, with great results.

Things done after mid-term evaluation:
  • Implemented the ICHOLT and ICHOL0 algorithms.
  • Fixed several bugs in the ILU algorithms and introduced some enhancements for big sparse matrices with very low densities.
The files involved in ichol, within the repository, are:
  • src/icholt.cc
  • src/ichol0.cc
  • ichol.m
You can clone the code from the repo:
  • https://edu159@bitbucket.org/edu159/gsoc2014-edu15

Before going into the details of the implementation, I want to point out some aspects of how ichol behaves in MATLAB.


  1. In the real case the input matrix must be symmetric positive definite. In the complex case it must be Hermitian. That means the diagonal elements of the input and output matrices have to be non-zero, positive, real values, and those conditions have to be fulfilled at each iteration of the algorithm (a small check illustrating them is sketched just after this list).
  2. If ichol is called simply as L = ichol (A), MATLAB ignores the imaginary parts of complex numbers and works only with their real parts. With the L = ichol (A, setup) call, complex numbers are taken into account. Honestly, I do not understand why they do that, and I have not followed that behaviour. Anyway, if I have to change that to be 100% compatible, it is only one extra line of code.
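A minimal sketch of the conditions from point 1, using a small complex Hermitian matrix (the values are purely illustrative and not taken from any of the implementations):

  A = sparse ([4, 1-2i; 1+2i, 5]);
  isequal (A, A')              % the input must be Hermitian
  all (real (diag (A)) > 0)    % diagonal entries must be positive ...
  all (imag (diag (A)) == 0)   % ... and real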

Details of implementation


 -->src/ichol0.cc

This file contains the implementation of the ICHOL(0) algorithm. The zero-pattern of the output matrix is the same as that of the input matrix, so the amount of memory to allocate is known from the beginning. The michol = ['on'|'off'] parameter indicates whether the dropped elements are added to the pivot or not (which preserves the column sums).

I will show two examples: one corresponding to a big matrix with very low density, and the one Kai used last year in his blog.

Example 1:

  A = gallery ('poisson', 500);
  size (A)
  ans =

          250000   250000
  tic; L = ichol (A); toc;

  Elapsed time is 0.031718 seconds.
  density = nnz (A) / (size (A)(1))^2
  density =    1.9968e-05

  norm (A - L*L', 'fro') / norm (A, 'fro')
  ans =  0.0924207846384523

  norm (A - (L*L') .* spones (A), 'fro') / norm (A, 'fro')
  ans =    2.28617974245061e-17


It can be seen that the product L*L' is quite different from A, but it does match A on A's sparsity pattern (as expected for the ICHOL(0) algorithm). The execution time is given just to provide an idea of how fast the code is; it was run on an i7 at 2.4 GHz.


Example 2:

This example is taken from that post, written by Kai last year. He faced problems with the michol option, obtaining results different from MATLAB's.

input:
A = [ 0.37, -0.05,  -0.05,  -0.07;
     -0.05,  0.116,  0.0,   -0.05;
     -0.05,  0.0,    0.116, -0.05;
     -0.07, -0.05,  -0.05,   0.202];

A = sparse (A);
opts.michol = 'on';
L = ichol (A, opts)

Octave: 
L =

   0.60828   0.00000   0.00000   0.00000
  -0.08220   0.32014   0.00000   0.00000
  -0.08220   0.00000   0.32014   0.00000
  -0.11508  -0.18573  -0.18573   0.34607


Matlab:
L =

    0.6083         0         0         0
   -0.0822    0.3201         0         0
   -0.0822         0    0.3201         0
   -0.1151   -0.1857   -0.1857    0.3461



Works fine.
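As an additional check that is not part of Kai's example: with michol = 'on' the factorization is supposed to preserve the column sums of A, so the following quantity should be close to zero for the L computed above:

  e = ones (columns (A), 1);
  norm ((A - L*L') * e)    % column sums are preserved when michol = 'on'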
  
 -->src/icholt.cc 

This file contains the implementation of the ICHOLT algorithm. In this case the final structure of the output matrix is unknown beforehand, so a policy has to be adopted for allocating memory. After trying different approaches, I ended up using this one:

  // max_len is the maximum length of the ridx and data arrays for the output sparse matrix.
  max_len = sm.nnz ();
  max_len += (0.1 * max_len) > n ? 0.1 * max_len : n;

What is done here is to grow the ridx and data internal arrays of the output sparse matrix by 10% of their current size, but only if that amount is larger than the dimension of the input matrix (n); otherwise the increment is just n. That policy seems to work very well in every case I have tested and does not slow down the process at all due to reallocations.
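To make the growth rule more explicit, this is roughly the same policy written as Octave pseudocode (sm is the input sparse matrix and n its dimension, as in the C++ snippet above):

  max_len = nnz (sm);                 % start from the nnz of the input matrix
  max_len += max (0.1 * max_len, n);  % grow by max (10% of capacity, n)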

Example 3:

icholt accepts a parameter called droptol for controlling the sparsity of the output matrix. If droptol = 0, the complete factorization takes place. As that value increases, the output matrix becomes sparser because more elements are dropped. Taking the same matrix as in example 1:

A = gallery ('poisson', 500);
opts.type = 'ict';

% Complete factorization
opts.droptol = 0;
tic; L = ichol (A, opts); toc;
Elapsed time is 46.0734 seconds.
norm (A - L*L', 'fro') / norm (A, 'fro')
ans =    7.8595e-16

% droptol = 1e-2
opts.droptol = 1e-2;
tic; L = ichol (A, opts); toc;
Elapsed time is 0.0650802 seconds.
norm (A - L*L', 'fro') / norm (A, 'fro')
ans =  0.016734

% droptol = 1e-3
opts.droptol = 1e-3;
tic; L = ichol (A, opts); toc;
Elapsed time is 0.183416 seconds.
norm (A - L*L', 'fro') / norm (A, 'fro')
ans =  0.0021773

% droptol = 1e-4
opts.droptol = 1e-4;
tic; L = ichol (A, opts); toc;
Elapsed time is 0.589693 seconds.
norm (A - L*L', 'fro') / norm (A, 'fro')
ans =    2.4820e-04






As can be seen, the higher the droptol parameter, the sparser the output matrix becomes. That leads to shorter execution times, but on the other hand a higher error in the factorization. The complete factorization obviously has practically no error. Cool.
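Another way of looking at the same trade-off, complementary to the error norms above, is to compare the fill-in of L for two different drop tolerances (the exact nonzero counts depend on the matrix, so none are claimed here):

  opts.type = 'ict';
  opts.droptol = 1e-2;
  L1 = ichol (A, opts);
  opts.droptol = 1e-4;
  L2 = ichol (A, opts);
  [nnz(L1), nnz(L2)]   % the larger droptol drops more entries, so L1 has fewer nonzeros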


Location of source files inside Octave core


Now that I have finished the development of the algorithms, the final step is to integrate them into Octave core. To do so, I will create a subrepo of the default Octave repository and add the files there. I have chosen the locations for the functions by looking at the repository Kai set up last year.

Location:
libinterp/dldfcn: ilutp.cc ilu0.cc iluc.cc ichol0.cc icholt.cc
scripts/sparse: ilu.m ichol.m

That is just a suggestion and should be revised and accepted by the maintainers.

Future contributions


There is a week left, which I want to use to start (and hopefully finish) the development of the extra sprandsym parameters that MATLAB has but Octave does not. As I submitted a changeset for similar functionality in sprand and sprandn in the past, it will be much easier for me to implement.

I am also interested in developing some sparse iterative solvers that Octave lacks, such as minres and lsqr. They are closely related to the preconditioners I have been working on, and it would be nice if they could be assigned to me for development.


Regards,

Eduardo
