Waiting for PBC


Recently, we published a paper called “Gaussian-based coupled-cluster theory for the ground state and band structure of solids” (J. Chem. Theory Comput., 13, 1209, (2017)). There we describe how one can implement and apply the same systematically improvable framework of electron correlation for materials spectra, that we have long enjoyed (and taken for granted) in molecular problems. However, what might not be apparent from the publication is that it represents the culmination of a long-standing multi-year dream to create our own quantum chemistry software that works with periodic boundary conditions (PBC). In this blog post, I’ll write a little bit about the background behind this work and the many year effort that was required to get here.

The first time I felt that we needed PBC quantum chemistry software was when I was working together with Dominika Zgid on the implementation of dynamical mean-field theory for the ab initio treatment of solids, sometime around 2010. Back then, we had a copy of the CRYSTAL program that implements a Gaussian-based treatment of periodic systems. However, as the program was only provided as a binary, the only way to extract intermediate quantities was either through the output file, or through an additional cryptic text file that contained some information on quantities such as the orbitals. This was not really enough for us to build on, and in fact it is one of the reasons why we do not report DMFT energies in the paper (J. Chem. Phys. 134, 094115 (2011) as we had no access to the periodic integrals!

For those of us who develop molecular quantum chemistry methods, we take for granted that we will have easy access to the quantities we need to model electron correlation. For example, I have always assumed that we can extract atomic and molecular integrals and molecular orbitals either from some open-source library or a quantum chemistry program of our choice. However, this not something to take for granted in the PBC setting, and it’s not only a problem of software and source code. In fact, there are many open-source PBC programs (and closed-source PBC programs provided with source code) but they usually do not provide the quantities one needs for electron correlation methods. For example, most plane-wave codes do not have the capability to compute the molecular orbital integrals, as they are not needed at the DFT level.

Based on our initial experience, I decided that the group should try its hand at a PBC quantum chemistry implementation, and one that, to boot, would support the basis functions that we know and love from molecular quantum chemistry, namely Gaussians. When I moved to Princeton in 2012, one of the first events I held at the new house (still completely devoid of furniture) was a hackathon involving the whole group, to try build a periodic quantum chemistry code in two days. Trying to write a single new code, starting with a group of 12 people who know nothing about what they should be implementing, is not actually a good idea if you expect a real product! But in hindsight, it was a useful exercise. The great thing about an event like a hackathon, is that it forces you to overcome your fears about learning and trying something new, simply because of the peer pressure. If you come from a molecular theory setting, there are plenty of things to learn in order to understand periodic implementations, for example, how to deal with divergent Coulomb contributions, as well as reciprocal space and Brillouin zone sampling. For me personally, while I certainly didn’t learn and understand everything I needed from that single event, at least I learnt what it was I had to learn, and that in itself was productive.

After 2012, our dream of a periodic code languished for a while. This was in part because we were very busy with other projects, but also because the idea, as so often is the case, needed some time to mature. This did not mean, however, we had given up. In 2014 I had the chance to go to a meeting in Vienna (German theoretical chemistry conference). The best thing about going to science meetings is that you (hopefully) get to see all your friends. In this case, I had the chance to meet again with George Booth, who, when a postdoc with me, had organized the whole PBC hackathon. We had been corresponding for a few months about the possibility of reviving the PBC project, and the Vienna meeting was a great place to start, because we could also meet with Andreas Grueneis, George’s longstanding collaborator, and the author of the pioneering correlated quantum chemistry capabilities in VASP. I had been pushing the idea that we should implement a Gaussian based periodic code *inside* VASP, where the Gaussians could be internally represented as projector augmented plane waves (PAW). This might seem like just adding a layer of complication, but the reason for supporting a Gaussian representation is that Gaussians are remarkably compact, if all one wants is modest accuracy. Thus one could expect correlated calculations (at modest accuracy) with Gaussian basis sets to be much more affordable than ones working in the PAW basis directly. The presence of the “augmented” part of the PAW basis made the implementation in VASP a little complicated, but we managed to resolve these at the Vienna meeting with some helpful discussions with Georg Kresse (the original author of VASP). After this point, I returned to Princeton, and everything further was done by George and Andreas and his student Theodoros, and resulted in the nice comparison between the power of Gaussian bases and PAW representations in correlated PBC MP2 calculations, described in J. Chem. Phys. 145, 084111 (2016).

Nonetheless, the complexities working with the VASP codebase was for my group – non-expert in the subtleties of VASP – still unsatisfactory. As the PySCF project, started by Qiming Sun, began to mature, and the ease of programming with it became apparent, it became clear that we should really just implement periodic code functionality within PySCF itself. And so it was that one weekend in the late summer of 2015, Tim Berkelbach and I sat down to program a Gaussian based periodic code in PySCF – as is apparent from the above, really the third attempt to do so in our group! I sat down with the book by Dominik Marx and Juerg Hutter (which Tim had a copy of) which has a nice description of the organization of a periodic code. I started programming based off Chapter 3 (some of the variable names in the PySCF PBC implementation such as “ngs” come from this book), while Tim sat to work out all the pesky details of implementing pseudopotentials along with working out some of the finer technicalities with James. After 4 weekends, much of which was spent with Tim valiantly correcting errors in the literature in the reported pseudopotentials, we had a working prototype. It could compute, very slowly, the DFT energy of the helium solid. That’s not much, but that’s how things start!

We were then joined in our efforts by James McClain, and of course Qiming, to whom you can give any function in a program and have it transformed into something that runs 10 times as fast. James, Qiming, and Tim worked throughout the following year to help lay down more of the theoretical groundwork needed for non-trivial systems and to further develop a production level code with correlated quantum chemistry functionality – all the way from efficient integrals and Hartree-Fock to deriving and implementing parallelized k-point sampled equation of motion coupled cluster! Once that framework was laid down, Qiming was then able to go in and completely revamp the code: removing many of the bottlenecks through more efficient coding and development of novel algorithms. Arguably everything was almost completed close to 9 months after the initial prototype, but as is always the case, crossing the t’s and dotting the i’s in the last stretch takes a very long time. I personally think it’s important, when developing methods, not to just carry out a toy calculation, but to show that you can carry out calculations of meaningful quality – otherwise no one will appreciate the method. James, without complaining, and with great efficiency, parallelized and optimized the EOM-CCSD code and patiently carried out the calculations on our modest computational resources. (We only had 3 Tb of diskspace which he had to carefully ration!) This represented the first time EOM-CCSD was implemented in three-dimensional systems. Of course, in the summer, we also had to say goodbye to Tim, and moving to Caltech further delayed the work. But in the end, I think we managed to carry out a calculation we could all be happy with – an EOM-CCSD calculation for silicon bandstructure, with a modest, but not laughable basis (TZVP) and sufficient k-point sampling (4x4x4) — all in all, more than 2000 correlated orbitals — from which we could roughly estimate the remaining basis set error, and extrapolate to the thermodynamic limit of the bandgap.

And so, that is the multi-year story behind this single paper in JCTC! Of course, this represents not the end of the story – it is just a chapter in a much longer effort to transform materials simulations through high accuracy quantum chemistry. As you may have gathered, in our group, we take a long term perspective on things, and probably a similar kind of story could be told for most of the papers that we write. Those stories will have to wait for another post. But in the meantime, all our code is available through our PySCF repository, so check it out yourself, and have fun doing periodic quantum chemistry!