Lysozyme tutorial, constant pH-enabled

This tutorial adapts the widely-recognized Lysozyme in Water GROMACS tutorial by J. Lemkul, extending it to incorporate constant pH simulations.

Conventions #

In the following section, commands to be entered into your shell are identified using this color scheme:

ls -al

The output for such commands is shown like this:

readme.txt

Files you are editing are presented in the following format, with highlighted sections indicating important parameters:

; Enable Lambda Dynamics for constant pH
lambda-updater-enable = yes
constantph-enable         = yes

Prerequisite #

We assume you are familiar with using GROMACS for non-constant pH applications; proficiency with commands like grompp, genion, and mdrun, as well as understanding MDP files, is sufficient. If you need a refresher, consult the Lemkul Lab tutorial site or the GROMACS tutorial page. For questions about (non-FMM, non-constant pH) standard GROMACS settings, refer to the official GROMACS documentation.

We also assume you have successfully installed our implementation of constant pH GROMACS. If not, see the installation guide.

Verify that you have the correct gmx binary in your path by running this command:

gmx pdb2gmx -h | grep ldhis

which should return something like this:

       :-) GROMACS - gmx pdb2gmx, 2024-dev-20240108-249226c0e2-local (-:

Executable:   /home/ebriand/Projects/gromacs_juelich/cmake-build-relwithdebinfo_localsettings/bin/gmx
Data prefix:  /home/ebriand/Projects/gromacs_juelich (source tree)
Working dir:  /home/ebriand/Projects/__CPH_Runs/SimpleLambdadyn/structure_qihi/single_glu_qi
Command line:
  gmx pdb2gmx -h

            [-[no]ldglu] [-[no]ldhis] [-[no]ldasp] [-[no]ldlys] [-[no]ldtyr]

GROMACS reminds you: "Life in the streets is not easy" (Marky Mark)

 -[no]ldhis                 (no)

The -ldhis option being specific to the constant pH build, its presence attest the correct gmx binary is executed.

Protein structure #

We will use the following lysozyme PDB file as input: 4lzt_clean.pdb (PDB: 4LZT).

Although you are free to use any protein PDB, we recommend a simple, globular protein cleaned-up of ions or prosthetic groups. Generally, any protein that works with pdb2gmx for a non-constant pH run will also work for this tutorial.

Lysozyme protein in the cartoon style, with the protonatable residue in atom-shown style

Force Field Selection #

Download a force field with constant pH-specific modifications from our force fields page. We recommend the CHARMM36m-based constant pH force field as an good general-purpose option for protein simulations.

Ensure the force field directory (ending with .ff, e.g., charmm36m-lambdadyn.ff) is present in your working directory for this tutorial.

MDP files #

We provide the following mdp Files for this tutorial. Download them into the directory containing your PDB structure.

Single file archive (TGZ)
or individual files:

Errors, Mistakes and Undoing Changes #

Our constant pH simulation employs the typical GROMACS workflow (pdb2gmx, solvate, genion, etc.). Unfortunately, these tools often modify the topology file topol.top non-idempotently, meaning that running the same command multiple times may yield different, potentially incorrect states. This can occur even when errors arise through no fault of the user.

Consequently, mistakes or errors may necessitate restarting the tutorial from the beginning. A preferable alternative is to execute each step in a separate directory, copying it for each subsequent step, as follows:

mkdir topology_creation; cd topology_creation
gmx pdb2gmx ...
cd ..
cp -r topology_creation solvation
cd solvation
gmx solvate ...
...

In that case, undoing the current step can be as simple as deleting the directory, and copying it back:

# Oops, incorrect solvation command!!!
cd ..
rm -rf solvation
cp -r topology_creation solvation
cd solvation
gmx solvate ...  # Now thats fixed!

pdb2gmx #

Assuming the starting structure is 4lzt_clean.pdb:

gmx pdb2gmx -o prot.gro -ignh  -ff charmm36m-lambdadyn \
    -water tip3p  -ldglu -ldhis -ldasp  -f  4lzt_clean.pdb

This creates the topology. The arguments -ldglu -ldhis -ldasp enables dynamic protonation for Glu, Asp and His. To select a subset of these residues, add -ldinteractive and answer the successive prompts. Or remove one of the three switches to disable this residue for protonation.

After running the command, you’ll see the following output message. Pay attention to the highlighted line:

               --------- PLEASE NOTE ------------  
  
You have successfully generated a topology from: prot.pdb.  
  
The Charmm36m-lambdadyn force field and the tip3p water model are used.  
  
Lambda dynamics for constant pH was selected. (10 sites)  
  
               --------- ETON ESAELP ------------

This output confirms the generation of a constant pH topology and indicates the number of protonable sites (10 in this case).

For those interested in the technical details, you can examine the topol.top file to see the modifications made to the topology format. For a comprehensive explanation of the constant pH topology format for sites and forms, refer to the CPH topology page.

Setting the Simulation Box Size #

Use the editconf command to define the simulation box. Make the box cubic (this is mandatory for the current implementation of FMM). The box should be slightly larger than usual, to accomodate the buffer sites (see below), which need to be at a distance of 3 nm from their titratable residues as well as separated from other buffer sites, such that small box sizes might not have space enough to accomodate all of them.

For example, if you typically use -d 1.0, set it to -d 1.5, or specify an absolute size, such as the following which is confirmed to work for for lysozyme:

gmx editconf -f prot.gro -o box.gro -box 8 8 8 -c -bt cubic

Additionally, note that the force field files downloaded from our force fields page have a minimum box size specified (typically, but not always, 6x6x6 nm). Such specifications are necessary due to finite box size effects, which we discuss in more details here. Violating them can lead to marked $p K_{a}$ shifts.

Solvation #

Use solvate as usual. There is nothing CPH-related here.

gmx solvate -cs spc216.gro -cp box.gro -p topol.top -o solv.gro

Ion generation #

This steps places the normal Na Cl ions as usual, but also places the buffer sites (variable counterion) for constant pH. Those are water molecules that will take the opposite charge of the protonable sites, so the overall box stays neutral.

The main challenge is that those buffer sites needs to be both:

at a given distance to their associated protonated residue (3 nm typically) and
at a minimal distance from the protein generally, and from the other buffer sites (2nm typically).

Illustration of the buffer site and required distance between them and the titratable residue

The genion tool has been modified to make this assignement while respecting all distance constraints, or error out if they cannot be satisfied together.

First generate a tpr file for genion, based on the genion.mdp setting file:

gmx grompp -f genion.mdp -c solv.gro  -o genion.tpr -p topol.top -maxwarn 1

The genion.mdp files contains the following constant pH specific lines, to enable genion to know about which residue are protonatable and correctly place the buffer sites:

constantph-enable         = yes
constantph-mode           = genion

We then generate the ions with genion, with most the settings typical, except for the last line:

gmx genion -s genion.tpr -o solvions.gro \
           -p topol.top -pname SOD -nname CLA \
           -conc 0.150 -neutral \
           -cphcoupled -rminprotld 2.0 -rminld 3.0 -rtolld 0.3

Select the solvent group to proceed (tips: echo SOL | gmx genion ... for non-interactive runs).

Let’s go through the constant pH specific options:

-cphcoupled : Add one buffer for each residue (mandatory)
-rminprotld 2.0 : place the buffer at a minimum of 2 nm from the protein (and other buffers, and everything non-solvent and non-ion in general). A distance of at least 2.0 is recommended. Longer distance is acceptable, but necessitates a large box for no clear benefit.
-rminld 3.0: maintain a distance of 3 nm between the buffer and its residue. As the calibration shipped with the forcefield relies on this distance, it cannot be changed - higher is not better.
-rtolld 0.3 : accept a 0.3 nm tolerance for all of the previously-specified distance constraint. This is a typical value.

For the lysozyme system, the previous command will work without problem. However, for other proteins, you might find that the box size is too low to allow for the satisfaction of all distance constraints. In that case, restart from pdb2gmx, and define a larger cubic box - possibly iteratively. Be mindful of impossible-to-fullfill constraints: for instance, -rminprotld 4.0 with -rminld 3.0 cannot be fullfilled as water molecules at a distance of 3 nm exactly have been eliminated by the first argument as too near.

Another case is titratable residues deep into the protein, such that there are no water molecules close enough (with or without taking into account -rminprotld). In that situation, you will need a force field with calibration data for a large rminld distance. See our forcefields page for calibrations files, or make your longer distance calibration yourself (see the CPH calibration documentation). See also our discussion of the effect of buffer distance.

The buffer molecule are water molecule that are changed into CION molecule (see end of topology), which are just water with an optional +1 charge on the oxygen.

Energy minimization #

There is nothing CPH specific for these steps

gmx grompp -f em.mdp -c solvions.gro  -o em.tpr -p topol.top

gmx mdrun -deffnm em -v

Lambda equilibration #

Each protonable site is created with a default protonation state, without regard to its environement, when the topology is generated. This is problematic, since it could be discordant with the local charge environement, and lead to simulation instability (e.g. charged residue in the protein core). To solve this, we run a lambda equilibration step, in which all non-solvent non-ions atoms are restrained, but the protonation state is free to change. After a few picoseconds, the protonation state will adopt a sensible configuration, from which we will start actual simulation. This is also the phase where atom velocities are generated

gmx grompp -f equil_lambda.mdp -c em.gro -r em.gro \
            -o equil_lambda.tpr -p topol.top

Let’s look at the constant pH specific part of the MDP file before starting the simulation:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26


;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; FMM Electrostatics
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

coulombtype                   = FMM       
fmm-override-multipole-order  = 8
fmm-override-tree-depth       = 2    ; Works for 8x8x8 nm box

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Lambda dynamics / constant pH
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

; Enable the constant pH code
lambda-updater-enable     = yes
constantph-enable         = yes
constantph-mode           = dynamics

; Set the desired pH
constantph-ph-mode        = fixed
constantph-fixed-ph       = 7.4

; Output frequency
constantph-nstout         = 1000

; Low transition barrier when equilibrating lambda
constantph-bias-initial-barrier = 2.0

We first enable the FMM electrostatic solver: this is mandatory, as our constant pH code uses FMM-specific feature to be able to run at a reasonable speed. We attract your attention to the fmm-override-tree-depth parameter, which must be set to an integer, and whose optimal setting for performance depends on the size of the system (as well as your hardware, to some extend). We refer you to the FMM MDP settings page for more details.

We then turn on constant pH, set the pH (here, 7.4 for physiological conditions), as well as the output frequency for the protonation state. Finally, the highlighted line 26 is an important setting for equilibration: it sets the height of the barrier between the protonated and deprotonated state in our two-well potential to a very low value of 2 kJ/mol. This will allow free transition, to permit the system to quickly settle on an good starting protonation state. Visually, the barrier height is the red line in the following representation of the double well potential (here, with a rather high barrier):

Plot of the double well potential with the barrier height highlighted

We can then move to the simulation itself:

gmx mdrun -deffnm equil_lambda -lambdaout equil_lambda.dat -v

The output file equil_lambda.dat contains the protonation trajectories. It is a text file with comments (starting with ;) and data lines which contain the lambda coordinate value for each protonable sites (corresponding to protonation state), along side other quantities concerning the proton dynamics.

Here are the first few lines of this file, filled with comments recapitulating the settings used as well as internal diagnostics informations:

; Propagator: ProtTautPropagator
; num residue: 10 | num buffer 10
; dt_ 0.00200   | mass 60.00 
; target temp 300.0
; Using unified output lambda as protonation state: 0
; Charge constraint: SHAKE
; Interpolation Force Manager: ProtTaut Dynamics interpolation manager 
; Interpolation Force Manager: Coupled residue-buffer mode = 1 
; Buffer Manager:  10 buffers total,  10 enabled,   0 directly controlled, 0 disabled
; Collective lambda mass   10.000 max charge    13.0000 min charge    -3.0000
; Coupled buffer formulation: 1
; Collective buffer manager: ProtTaut buffer collective buffer manager 
; Bias: prot-taut double well - PFC? 1
; fixed pH:         7.40

The last lines of this file are recapitalating the last state of each of the 10 titratable residue of the protein:

; Writing state to checkpoint
; Final internal state is: 
;     residue [   0]  lambda_p =         0.0066841524   velocity =         -0.2951664031
;                     lambda_t =         0.4506141543   velocity =          0.2465089858
;     residue [   1]  lambda_p =         1.0142482519   velocity =          0.2079668343
;                     lambda_t =         0.4213783443   velocity =         -0.0528244600
;     residue [   2]  lambda_p =         0.0011970188   velocity =          0.0230339095
;                     lambda_t =         0.2282474488   velocity =          0.2272512019
;     residue [   3]  lambda_p =         1.0119093657   velocity =         -0.1087641343
;                     lambda_t =        -0.0221623722   velocity =          0.2093082517
;     residue [   4]  lambda_p =        -0.0319707729   velocity =         -0.1973873526
;                     lambda_t =        -0.0541575402   velocity =          0.0518201627
;     residue [   5]  lambda_p =         0.0274856947   velocity =          0.1474107057
;                     lambda_t =         0.6484327912   velocity =          0.0308893304
;     residue [   6]  lambda_p =        -0.0059586708   velocity =         -0.0520664826
;                     lambda_t =         0.3720209897   velocity =         -0.1076164022
;     residue [   7]  lambda_p =        -0.0092997942   velocity =         -0.0363448523
;                     lambda_t =         0.0587233007   velocity =         -0.1340652406
;     residue [   8]  lambda_p =         1.1058342457   velocity =         -0.1004268229
;                     lambda_t =         0.2853246629   velocity =         -0.1654260606
;     residue [   9]  lambda_p =        -0.0331853665   velocity =          0.2384097874
;                     lambda_t =         0.1017527282   velocity =         -0.0818591341
; Writing buffer checkpoint:
;      collective buffer lambda         0.0000000000 (v:         0.0000000000  F_(t-1)         0.0000000000)
; closing output file...

More documentation on the output file format can be found here.

Troubleshooting lambda equilibration #

In certain rare situations, where the solvent orientation is particularly bad, the solvent should also be restrained: first do a lambda equilibration with all heavy atoms restrained (including the solvent), then do one with the solvent free. It might also be helpful to modify the initial state of the residues (see init_lambda_tl in the topology) in pathological cases.
The restraint strength should be sufficient to keep the system in a sensible configuration. If the simulation crashes rapidly in this equilibration run, try with stronger restraint and/or minimize more thouroughly.

If no instability manifest itself, either through crash or LINCS error, continue.

Normal equilibration #

Once a sensible protonation state is present, you can equilibrate your system in the way you usually do, except that CPH settings will be present in the MDP file, and that this run is, in fact, using constant pH. There is nothing constant pH-specific in this step.

For the lysozyme system, we suggest using equil_structure.mdp:

gmx grompp -f equil_structure.mdp -c equil_lambda.gro -r equil_lambda.gro \
            -o equil_structure.tpr -p topol.top -t equil_lambda.cpt

Note how we pass the lambda_equil.cpt checkpoint file: this is necessary not only for a continuation of the main MD dynamics, but also for the constant pH state to pass through to the next simulation.

The main difference from the lambda equilibration files are as follow:

nsteps                  = 250000    ; 2 * 250000 = 500 ps (0.5ns)
continuation            = yes       
gen_vel                 = no      
define                  =    ; No restraints

; Non-trivial transition barrier past the lambda equilibration
constantph-bias-initial-barrier = 6.0

Namely, a longer run, without generating velocities nor using restraints, and with a higher barrier between protonated and deprotonated state, to reduce the time spent in the intermediate unphysical states.

This is executed as usual:

gmx mdrun -deffnm equil_structure -lambdaout equil_structure.dat -v

You might want to do fmm-override-tree-depth benchmarking at this state: trying out several values to see which results in the best performance for your hardware. We refer you, once again, to the FMM MDP settings page for more details.

Production run #

The production run also proceeds as in normal MD production runs. The main CPH-related concerns are:

Saving too few frames for the structure trajectories (.xtc).
The production of protonation trajectory which are excessively large (practical problem of disk space and analysis runtime); or that are too sparse and unexploitable (problematic for scientific conclusion)
The settings of the dynamic barrier and well adjustment for optimal protonation space sampling

We suggest the production_lambda.mdp files, with the following main differences with the previous equil_structure.mdp equilibration step, adressing the above-listed concerns:

nsteps                  = 2500000    ; 2 * 2500000 = 5000 ps (5ns)
nstxout-compressed      = 5000   ; save coordinates every 10.0 ps

; Output frequency
constantph-nstout         = 250

; Well position adjustement
constantph-online-well-position-adjustment = yes

; Barrier height adjust
constantph-online-barrier-height-adjustment = yes
constantph-target-protonation-in-transition-fraction = 0.15
constantph-target-tautomer-in-transition-fraction = 0.15
constantph-in-transition-fraction-tolerance = 0.05

We adress the first concern with a dense structure trajectory, saving one frame every 10 ps (nstxout-compressed). This is not a problem since we will run this production simulation for only 5 ns. This concerns might seem unrelated to residue titration (as the xtc file contain only the protein conformation, not its protonation state): however, the conformation of the protein is necessary for some analyses, which relate conformation and protonation state. Therefore, it is necessary to keep a decent amount of snapshots to leave open all avenue of analysis. If performing longer simulation, this should be balanced with practical concerns such as disk space and post-hoc analysis time. For the 75 ns replica used in the publication, nstxout-compressed = 50000 (every 100 ps) was found effective.

We adress the second concern with constantph-nstout. The value 250 will lead to a high resolution protonation trajectory. The value of this setting will vary depending on what you intend the simulation for, and of course your hardware disk space.

Here are some rough guidelines:

A single step of protonation trajectory is smaller than the equivalent full system frame, therefore it can happen significantly more often for the same disk space expense.
You should have nstout set at an integer divider of the frequency of coordinate output (nstxout-compressed). This is helpful for analysis, as for each structure frame, there will be a corresponding record of the exact protonation (output not in sync will cause unecessary interpolation to be used). This can also be used to easily display the protonation state in visualization software.
If exclusively performing titration to know the pKa, the coordinates are less important. But foregoing coordinate output can lead to regret later, if a complex titration is found that a more sophisticated analysis might have elucidated.
Rule of thumb: constantph-nstout = 250 for extreme details. Anything lower is generally just a waste of disk space, except for very short simulations. 500-1000 is sufficient to perform all kind of analysis. If the interval between each snapshot protonation state is greater than 10-20 ps (nstout = 5000 to 10000, assuming dt of 2 fs), some analyses on the structure-protonation coupling, and coupling between protonation become difficult to perform accurately (because we are approaching the dwell time in an individual protonation state, for quickly switching residues - therefore it becomes possible to miss states). For long simulation where space is at a premium, try 25-50 protonation output for every structure frame.

We enable dynamic barrier and well adjustment, a sampling-enhancing feature of our implementation, by using the recommended settings. We direct you to our Dynamic Barrier And Well Adjustment page for details on this feature, and the effects of its various parameters.

We preprocess:

gmx grompp -f production_lambda.mdp -c equil_structure.gro -r equil_structure.gro \
            -o production.tpr -p topol.top -t equil_structure.cpt

And finally run the production simulation:

gmx mdrun -deffnm production -lambdaout production.dat -v

Voilà! You are done with your first constant pH MD!

Hopefully this simulation was more accurate due to its treatment of changing protonation state. What you might do with the resulting protein trajectory depends on the biological question you want to adress, and all the typical MD post-hoc analysis methods are suitable to use on the .xtc trajectory you obtained.

However, you will likely want to do post-processing using the protonation state - after all this is one of the main selling point of constant pH MD. We refer you to our our post-processing documentation for ideas on what to do with your production.dat protonation trajectory.

Feedback #

Your feedback on this tutorial is highly welcomed! This includes suggestions for improvement of this tutorial, as well as bug report. See the Contact/Feedback page for ways to contact us.