AMP-USERS Archives
AMP-USERS@LISTSERV.BROWN.EDU
AMP-USERS, October 2018
Subject: Re: Kernel Ridge Regression in AMP
From: "El Khatib Rodriguez, Muammar" <[log in to unmask]>
Reply-To: Amp Users List <[log in to unmask]>
Date: Tue, 23 Oct 2018 22:25:19 -0400
Content-Type: text/plain (79 lines)

Hi August,


On Tue, Oct 23, 2018 at 9:43 AM August Edwards Guldberg Mikkelsen
<[log in to unmask]> wrote:
>
> Dear AMP-developers,
>
>
> I have recently started using the kernel ridge regression module in AMP, which in my case appears to perform better and faster than the neural network module. However, I am currently stuck with some errors that I am hoping you can help me understand.
>

Just to add a comment about this. The computational complexity of
kernel ridge regression training with Cholesky factorization is about
n^3/3 operations, i.e. O(n^3), where n is the number of data points.
Building the kernel matrix is O(n^2) (if I recall correctly). For an
ANN, according to [0], forward propagation is O(n^4) and backward
propagation is O(n^5). Note that the topology of the ANN, meaning the
number of hidden layers and nodes, will change that complexity, too.
It would be nice if someone else could confirm the computational
complexity of ANNs.

It is known that KRR is computationally intensive when working with
large datasets. I haven't yet tested how large a dataset is feasible;
this might be the best opportunity to do so.
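
For concreteness, here is a minimal sketch of the two costs mentioned
above: building the n x n kernel matrix and then factorizing it. This
is not AMP's implementation; the RBF kernel and the names sigma and
lam are illustrative only:

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def train_krr(X, y, sigma=1.0, lam=1e-8):
    """Solve (K + lam*I) alpha = y for the regression weights alpha."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # pairwise squared distances
    K = np.exp(-d2 / (2.0 * sigma ** 2))            # O(n^2) kernel build
    c, low = cho_factor(K + lam * np.eye(len(y)))   # ~n^3/3 flops
    return cho_solve((c, low), y)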

>
> The tests I have done so far have involved fitting forces and energies to data sets of 10, 100 and 1000 images (each image contains ~150 atoms) using the default settings of the KRR module. Using 8 cores on our cluster, I found that the data sets of 10 and 100 images trained reasonably fast, while the training on the 1000 images did not converge within my 24-hour wall-time.
>

I have mainly been working with smaller systems, but after this email
I am training a KRR model on a dataset of 1500 images of bulk Cu (108
atoms each) to test this out (162K fingerprints). I am also using 8
cores and 64 GB of RAM. I'll write back when the fingerprinting for
forces is done and the factorization starts/finishes/fails.

>
> An obvious solution to the wall-time issue is of course to use more cores for the calculation. However, when doing so, I find that my script runs for some hours before getting deleted by the superuser due to inefficient CPU usage. From what I can see in the log file, the script crashes when the fingerprinting is finished and the Cholesky decomposition starts, so I guess this step parallelizes badly (if at all). Can anything be done to fix this?
>

The Cholesky factorization is done using scipy, which means it runs
on a single core :(. This is a part of the KRR implementation that
has to be improved. I will try the tf.linalg module from TensorFlow
and see if the efficiency improves; another option would be to call
LAPACK or another optimized library directly.
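
As a rough illustration of that TensorFlow route (a sketch only,
assuming TensorFlow 2-style eager execution; this is not AMP code),
the same linear system could be solved with tf.linalg, which can use
multiple threads or a GPU:

import numpy as np
import tensorflow as tf

def solve_krr_tf(K, y, lam=1e-8):
    """Solve (K + lam*I) alpha = y with TensorFlow's Cholesky ops."""
    n = K.shape[0]
    A = tf.constant(K + lam * np.eye(n), dtype=tf.float64)
    rhs = tf.constant(y.reshape(n, 1), dtype=tf.float64)
    chol = tf.linalg.cholesky(A)                 # lower-triangular factor
    alpha = tf.linalg.cholesky_solve(chol, rhs)  # two triangular solves
    return alpha.numpy().ravel()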

>
> In any case, I figured that maybe my system is too large for the Cholesky approach to be efficient, so I then tried setting cholesky=False (see attached cholesky_false.py) so that the loss function is minimized instead. That, however, returned an error, as you can see in the attached cholesky_false.out. Is this a known bug? The error seems to be related to the loss function derivative.
>

The error, in this case, is due to the LossFunction class: the KRR
module has its own loss function. It should be fixed by importing the
right class:

from amp.model.kernelridge import KernelRidge, LossFunction

This is yet to be parallelized, though; I will start working on that.
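
For completeness, here is a minimal sketch of how the pieces would
fit together; the Gaussian descriptor, the convergence keyword, and
the file name are assumptions for illustration, not verified settings:

from amp import Amp
from amp.descriptor.gaussian import Gaussian
from amp.model.kernelridge import KernelRidge, LossFunction

# cholesky=False switches to loss-function minimization, as in your script.
calc = Amp(descriptor=Gaussian(), model=KernelRidge(cholesky=False))
calc.model.lossfunction = LossFunction(
    convergence={'energy_rmse': 0.001})  # illustrative tolerance
calc.train(images='training.traj')       # path is illustrative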

>
> Finally, I want to ask the following: Is there a simple way to obtain the training error once you have trained a calculator object with KRR in Amp? Currently, what I do is loop over my list of training images, attaching a calculator to each image and evaluating the error for each image. This is tedious and inefficient, but I could not find any built-in method to get it.
>

When using the L2 loss function, the error is printed in the log
file. For the case cholesky=True, I can add a new method that
computes the training error and prints it in the log file after
training. Would that work?
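
In the meantime, the loop you describe would look roughly like this
(a sketch only; it assumes the training images carry reference
energies and that the trained calculator was saved as 'amp.amp', with
both file names being illustrative):

import ase.io
from amp import Amp

calc = Amp.load('amp.amp')
images = ase.io.read('training.traj', index=':')

sq_errors = []
for atoms in images:
    ref = atoms.get_potential_energy()   # reference (e.g. DFT) energy
    atoms = atoms.copy()
    atoms.set_calculator(calc)
    pred = atoms.get_potential_energy()  # Amp/KRR prediction
    sq_errors.append(((pred - ref) / len(atoms)) ** 2)

rmse = (sum(sq_errors) / len(sq_errors)) ** 0.5
print('Energy RMSE per atom (eV):', rmse)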

Best,

[0] https://kasperfred.com/posts/computational-complexity-of-neural-networks

--
Muammar W El Khatib Rodriguez
Postdoctoral Research Associate
Brown University School of Engineering
184 Hope Street
Providence, RI, 02912, USA
http://brown.edu/go/catalyst
