On Tue, Oct 23, 2018 at 9:43 AM August Edwards Guldberg Mikkelsen
wrote:
> Dear AMP-developers,
> I have recently started using the kernel ridge regression module in AMP, which in my case appears to perform better and faster than the neural network module. However, I am currently stuck with some errors that I am hoping you can help me understand.
Just to add a comment about this: training kernel ridge regression
with a Cholesky factorization costs about n^3/3 floating-point
operations, i.e. O(n^3), where n is the number of data points. Building
the kernel matrix is O(n^2) (if I recall correctly). For an ANN,
according to , forward propagation is O(n^4) and backward propagation
is O(n^5). Note that the topology of the ANN, meaning the number of
hidden layers and nodes, also changes that complexity. It would be nice
if someone else could confirm the computational complexity of ANNs.
It is known that KRR is computationally intensive when working with
large datasets. I haven't tested how large a dataset it can handle.
This might be the best opportunity to do so.
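To make the cost argument above concrete, here is a minimal sketch of KRR training via a Cholesky factorization, written with NumPy and SciPy. It is not Amp's implementation: the Gaussian kernel, the function names, and the parameters `sigma` and `lamda` are assumptions chosen for illustration.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve


def rbf_kernel(a, b, sigma=1.0):
    """Gaussian kernel between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def krr_fit(x, y, lamda=1e-6, sigma=1.0):
    """Solve (K + lamda * I) alpha = y with a Cholesky factorization."""
    k = rbf_kernel(x, x, sigma)            # n^2 kernel evaluations
    k[np.diag_indices_from(k)] += lamda    # regularization on the diagonal
    factor = cho_factor(k, lower=True)     # about n^3 / 3 flops
    return cho_solve(factor, y)            # two triangular solves, O(n^2)


def krr_predict(x_train, alpha, x_new, sigma=1.0):
    """Predict targets for x_new from the fitted coefficients."""
    return rbf_kernel(x_new, x_train, sigma) @ alpha
```

For n data points the factorization dominates: doubling n multiplies the kernel-building work by about 4 and the factorization work by about 8.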
> The tests I have done so far have involved fitting forces and energies to data sets of 10, 100 and 1000 images (each image contains ~150 atoms) using the default settings of the KRR module. Using 8 cores on our cluster, I found that the data sets of 10 and 100 images trained reasonably fast, while the training of the 1000 images did not converge within my 24-hour wall-time.
I have mainly been working with smaller systems, but after this email
I started training a KRR model on a dataset of 1500 images of bulk Cu
(108 atoms each) to test this out (162K fingerprints). I am also using
8 cores and 64 GB of RAM. I'll write
back when the fingerprinting for forces is done and the factorization
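As a rough back-of-the-envelope check on the size of this test, the sketch below estimates the memory a dense kernel matrix would need. It assumes one row and column per atomic fingerprint and 8-byte floats; how Amp actually stores the kernel may differ, and the function name is hypothetical.

```python
def dense_kernel_memory_gib(n_images, atoms_per_image, bytes_per_entry=8):
    """Return (n, GiB) for a dense n x n matrix over all atomic fingerprints.

    Assumption: the kernel is stored as one dense float64 matrix with one
    row/column per fingerprint. This is an illustration, not Amp's layout.
    """
    n = n_images * atoms_per_image
    return n, n * n * bytes_per_entry / 2**30


n, gib = dense_kernel_memory_gib(1500, 108)
print(f"{n} fingerprints, about {gib:.0f} GiB for a dense kernel matrix")
```

Under those assumptions, 1500 images of 108 atoms give 162,000 fingerprints and a matrix of roughly 195 GiB, which is more than the 64 GB mentioned above.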
> An obvious solution to the wall-time issue is of course to use more cores for the calculation. However, when doing so, I find that my script runs for some hours before getting deleted by the superuser due to inefficient CPU usage. From what I can see in the log file, the script crashes when the fingerprinting is finished and the Cholesky decomposition starts, so I guess this step parallelizes badly (if at all). Can anything be done to fix this?
The Cholesky factorization is done using SciPy, and that means it is
a single-core operation :(. This is a part of the KRR implementation
that has to be improved. I will try the tf.linalg module from
TensorFlow and see if the efficiency improves. Another option would be
to use LAPACK or another library.
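One way to see how the factorization step behaves on a given machine is to time it in isolation. The sketch below does that with `scipy.linalg.cho_factor` and `cho_solve` on a random symmetric positive-definite matrix; the function name and the test matrix are assumptions for illustration. SciPy's routines call into whatever BLAS/LAPACK build it is linked against, so the number of cores actually used can depend on that build and on environment variables such as `OMP_NUM_THREADS`.

```python
import time

import numpy as np
from scipy.linalg import cho_factor, cho_solve


def time_cholesky_solve(n, seed=0):
    """Time a Cholesky factorization and solve on a random SPD matrix.

    Returns (elapsed_seconds, relative_residual).
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    k = a @ a.T + n * np.eye(n)          # symmetric positive definite
    y = rng.standard_normal(n)

    start = time.perf_counter()
    factor = cho_factor(k)               # the O(n^3) step
    alpha = cho_solve(factor, y)         # two triangular solves
    elapsed = time.perf_counter() - start

    residual = np.linalg.norm(k @ alpha - y) / np.linalg.norm(y)
    return elapsed, residual
```

Running this for a few values of `n` while watching CPU usage (for example with `top`) would show whether more than one core is busy during the factorization.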
> In any case, I figured that maybe my system is too large for the Cholesky approach to be efficient, so what I then tried was to set cholesky=False (see attached cholesky_false.py) so that instead the loss function is minimized. That however returned an error as you can see in the attached cholesky_false.out. Is this a known bug? The error seems to have something to do with the loss function derivative.
The error, in this case, is due to the LossFunction class. The KRR
module has its own loss function. That should be fixed by importing
the right class:
from amp.model.kernelridge import KernelRidge, LossFunction
This is yet to be parallelized though... I will work on that.
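For readers unfamiliar with the non-Cholesky route, the sketch below shows the general idea of fitting KRR coefficients by minimizing a regularized L2 loss with an analytic gradient. It is a generic illustration, not Amp's LossFunction class: the loss form, the function names, and the use of `scipy.optimize.minimize` with L-BFGS-B are assumptions.

```python
import numpy as np
from scipy.optimize import minimize


def krr_loss(alpha, k, y, lamda):
    """0.5 * ||K alpha - y||^2 + 0.5 * lamda * alpha^T K alpha."""
    residual = k @ alpha - y
    return 0.5 * residual @ residual + 0.5 * lamda * alpha @ (k @ alpha)


def krr_loss_grad(alpha, k, y, lamda):
    """Gradient of krr_loss with respect to alpha (K is symmetric)."""
    return k @ (k @ alpha - y) + lamda * (k @ alpha)


def fit_by_minimization(k, y, lamda=1e-3):
    """Minimize the loss iteratively instead of factorizing K."""
    alpha0 = np.zeros(len(y))
    result = minimize(
        krr_loss,
        alpha0,
        args=(k, y, lamda),
        jac=krr_loss_grad,
        method="L-BFGS-B",
    )
    return result.x, result.fun
```

Each iteration only needs matrix-vector products with K, which cost O(n^2), but the number of iterations required depends on how well conditioned K is.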
> Finally, I want to ask the following: Is there a simple way to obtain the training error when you have trained a calculator object with KRR in Amp? Currently what I do is to loop over my list of training images, attaching a calculator to each image and evaluating the error for each image. This is tedious and inefficient, but I could not find any built-in method to get it.
When using the L2 loss function, the error is printed in the log file.
For the case cholesky=True, I can add a new method in which you can
request the training error to be computed and printed after training
in the log file. Would that work?
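Until such a method exists, the manual loop described in the question boils down to something like the sketch below. It is a plain NumPy helper, not part of Amp; the function name and the choice of a per-atom energy RMSE are assumptions, and the reference and predicted energies are expected to have been collected beforehand (for example with ASE's `atoms.get_potential_energy()` before and after attaching the trained calculator).

```python
import numpy as np


def energy_rmse_per_atom(reference, predicted, n_atoms):
    """Root-mean-square energy error per atom over a set of images.

    reference, predicted: one total energy per image.
    n_atoms: number of atoms in each image.
    """
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    n_atoms = np.asarray(n_atoms, dtype=float)

    per_atom_error = (predicted - reference) / n_atoms
    return float(np.sqrt(np.mean(per_atom_error**2)))
```

For example, `energy_rmse_per_atom([0.0, 0.0], [2.0, 4.0], [2, 2])` averages the squared per-atom errors 1.0 and 2.0 and returns the square root of 2.5.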
Muammar W El Khatib Rodriguez
Postdoctoral Research Associate
Brown University School of Engineering
184 Hope Street
Providence, RI, 02912, USA