# XtalView/Xheavy methods

What does Xheavy DO anyway?

## Refinement

#### Correlation Coefficient

Refinement of the heavy atom positions is done in xheavy by a systematic correlation search. The main advantage of this method is that it is robust and accurate. The biggest drawback is that it is very slow. The equation used in the correlation search is

$$\frac{\sum w\,f_o\,f_c}{\sqrt{\sum (w\,f_o)^2 \; \sum f_c^2}}$$

where $w$ = 1 for centrics and 0.707 for acentrics, $f_o$ (fo) = |FP-FPH|, and $f_c$ (fc) = fhcalc.

The advantage of this correlation is that any scale factor cancels out. (Try it: substitute s*fo for fo. Since s is a constant (note that w and fo are not), you can move s outside the sum signs, which leaves s/sqrt(s*s), which is 1.) An R-factor would require a scale factor to be computed. Much of the time in refinement we don't know the scale factor yet, especially when the solution is partial. A least-squares method would also require that the scale be at least close, or the matrix will be unsolvable.

If phases are available from an earlier phase calculation then a phased correlation is used. In this case fc becomes |FPHcalc-FP| and w is 1 for both centrics and acentrics. Note: if you want to do an unphased correlation after a phase calculation, set the method to "Clear phases" and push "Apply".

The correlation ranges from 0.5 (random) to 1.0 (perfect) and takes some getting used to, as it appears flat compared to other measures such as R or the correlation used in X-PLOR. A typical heavy atom will start out at 0.66, which is quite poor, and refine to perhaps 0.70, which indicates a partial solution. Good solutions are in the range 0.75-0.80 and the best ever seen is about 0.82. However, these are only rules of thumb: make sure that the correlation increases with refinement, and use the figure-of-merit and phasing power to track how good the derivative is.
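
If you want to convince yourself of the scale independence numerically, here is a minimal Python sketch of the unphased correlation. The function name `correlation` and all the numbers are made up for illustration; this is not the xheavy source code.

```python
import math

def correlation(fo, fc, centric):
    """Unphased correlation: sum(w*fo*fc) / sqrt(sum((w*fo)^2) * sum(fc^2))."""
    num = 0.0
    sum_wfo2 = 0.0
    sum_fc2 = 0.0
    for o, c, is_centric in zip(fo, fc, centric):
        w = 1.0 if is_centric else 0.707   # weights as given in the text
        num += w * o * c
        sum_wfo2 += (w * o) ** 2
        sum_fc2 += c * c
    return num / math.sqrt(sum_wfo2 * sum_fc2)

# Invented values: fo = |FP-FPH|, fc = fhcalc, plus a centric flag per reflection.
fo = [12.0, 30.5, 8.2, 19.9, 25.1]
fc = [10.0, 28.0, 11.0, 15.0, 30.0]
centric = [True, False, False, True, False]

c1 = correlation(fo, fc, centric)
c2 = correlation([3.7 * x for x in fo], fc, centric)  # same data, rescaled by s = 3.7
print(c1, c2)  # the two values agree to rounding error
```

Rescaling fo by any positive constant leaves the result unchanged, which is why no scale factor has to be known during refinement.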

#### Search Grid

The search grid starts coarse and becomes progressively finer. The first grid is rmin/2, where rmin is the minimum (number-wise) resolution for the refinement. Thus, to get a large radius of convergence, the refinement should first be done at low resolution (e.g. 5 Å). The program searches +/- 1 grid unit in each cell direction for the highest correlation until the atom no longer moves. In this manner it can follow a gradient. When nothing moves (i.e. the best correlation is at the center of the search box) the grid is reduced to rmin/4. Then a grid of rmin/3 is tried, followed by rmin/6, rmin/8, rmin/16 and finally rmin/24. At each grid, if the atom moves it is searched again until there is no further movement. If there is more than one site, each site is moved in turn for each grid, and the grid is not changed until all atoms stop moving. Finally the whole search is repeated to check that there is no further improvement. This algorithm was arrived at after a large number of trials on several derivatives in different space groups. There is a maximum number of cycles allowed to prevent infinite searches.

Once the position of an atom no longer changes, the next atom in the list is moved. After all the atoms have been moved, the occupancy is varied, skipping the first atom, whose occupancy is held fixed. This is done because the correlation is independent of scale. The scale factor is calculated later for the phase calculation. To get the absolute scale, multiply the occupancy by the scale determined later. The accuracy of this calculation will depend upon the accuracy of your phases and solution.
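
The positional part of the search can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a single site, Cartesian coordinates in Ångstroms in an orthogonal cell, and a toy `score` function standing in for the correlation. The names `refine_site`, `GRID_DIVISORS` and `toy_score` are hypothetical, and the final repeat of the whole search and the occupancy step are left out.

```python
import itertools
import math

GRID_DIVISORS = [2, 4, 3, 6, 8, 16, 24]  # grid = rmin/divisor, in the order given in the text

def refine_site(pos, score, rmin, max_cycles=100):
    """Move one site uphill in `score` on progressively finer grids."""
    pos = tuple(pos)
    best = score(pos)
    for div in GRID_DIVISORS:
        step = rmin / div
        for _ in range(max_cycles):          # cap the cycles to avoid endless searches
            # All positions within +/- 1 grid unit in each direction (centre included).
            candidates = [tuple(p + step * k for p, k in zip(pos, d))
                          for d in itertools.product((-1, 0, 1), repeat=3)]
            top = max(candidates, key=score)
            if score(top) <= best:           # centre is already the best: grid is done
                break
            pos, best = top, score(top)
    return pos, best

# Toy stand-in for the correlation: a smooth peak at a known position.
TARGET = (11.3, 9.1, 10.4)

def toy_score(p):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(p, TARGET)) / 8.0)

final_pos, final_score = refine_site((10.0, 10.0, 10.0), toy_score, rmin=5.0)
print(final_pos, final_score)
```

On this toy peak the site ends up within half of the finest grid spacing (rmin/24) of the true position along each axis. With real data the score surface is much bumpier, which is why starting at low resolution matters.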

The structure factor calculation is done with an old-fashioned summation loop, using structure factor look-up tables to speed the calculations. As it turns out, for solutions with fewer than about 15 atoms this is faster than an FFT. For more on scattering factors, see the XtalView documentation.
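
For readers who have not seen one, a direct summation looks roughly like the Python sketch below. It makes several assumptions that are not from xheavy: space group P1 (no symmetry operators applied), no temperature factor, and a scattering factor `f` that is simply passed in per site rather than taken from a look-up table. The function name `fh_calc` is hypothetical.

```python
import cmath
import math

def fh_calc(hkl, sites):
    """Direct summation over heavy-atom sites for one reflection.

    sites: tuples (x, y, z, occupancy, f) with fractional coordinates; f is a
    hypothetical, already looked-up scattering factor for this reflection.
    Returns (amplitude, phase in degrees).
    """
    h, k, l = hkl
    total = 0j
    for x, y, z, occ, f in sites:
        total += occ * f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
    return abs(total), math.degrees(cmath.phase(total))

# Two invented sites.
sites = [(0.10, 0.20, 0.30, 1.0, 80.0), (0.45, 0.05, 0.70, 0.6, 80.0)]
amplitude, phase = fh_calc((1, 2, 3), sites)
print(amplitude, phase)
```

The cost grows with (number of reflections) x (number of atoms), so for a handful of heavy atoms the loop stays cheap.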

## Phase Calculations

Unlike the refinement routines, the phase calculations are standard MIR methods derived from the work of Blow, Crick, Rossmann, Hendrickson, Lattman, Wang, Furey and others. The basic structure of the calculations is borrowed from Bill Furey's phasit program from the University of Pittsburgh. The phases are calculated in two passes. The first pass does a rough job of scaling each derivative and calculates the SIR phase distributions of each derivative. These are then combined to produce a first set of protein phases. In the second pass these rough phases are used to compute better errors and scale factors for each derivative, and the phase distributions are recalculated and combined for the final phases. The phases are then output as a phase with an associated figure-of-merit and, optionally, the phase distribution is saved as Hendrickson-Lattman coefficients, Amir, Bmir, Cmir, Dmir.
##### Scaling
The first problem encountered is how to scale FPH to FP. For centrics this is straightforward: except for rare crossover cases, which can be ignored for the most part, the scale can be found by setting the sum of |FPH-FP| equal to the sum of fh. For acentrics the sum of |FPH-FP| is smaller than that of fh. In fact, as Crick showed in 1960, it will on average be smaller by the square root of 2 (1.414) due to the phase angle between fh and FP. Thus, before phases are available, the scale factor is set by the weighted sum of the centrics and the acentrics times 1.414. After phases are available scaling is straightforward, as the value of FPHcalc-FP can be directly computed.
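
One possible reading of the unphased scaling rule is sketched below in Python. It is an interpretation, not the xheavy code: the function name `unphased_scale` is hypothetical, and the scale is taken here as the ratio between the corrected sum of observed differences and the sum of calculated fh.

```python
SQRT2 = 1.414  # acentric correction quoted in the text

def unphased_scale(diffs, fh, centric):
    """Ratio of corrected observed differences to calculated fh.

    diffs: |FPH-FP| per reflection; fh: calculated heavy-atom amplitudes;
    centric: True for centric reflections. Acentric differences are
    multiplied by 1.414 before summing, centric ones are used as they are.
    """
    corrected = sum(d if c else SQRT2 * d for d, c in zip(diffs, centric))
    return corrected / sum(fh)

# Invented numbers: two centric and two acentric reflections.
diffs = [2.0, 4.0, 1.0, 1.0]
fh = [1.0, 2.0, 0.707, 0.707]
centric = [True, True, False, False]
scale = unphased_scale(diffs, fh, centric)
print(scale)
```

Multiplying fh by this value (or dividing the differences by it) puts the two sums on the same footing.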
##### Error Estimation
After scaling, another problematic area is the estimate of E, the error in fh due to all causes. If this value is underestimated, the phases will be given too much weight because the distributions will be too narrow; if E is too large, the phasing power will be underestimated. Rather than compute a single E, E is computed as a polynomial fit to the differences |FP-FPH|-fh as a function of the size of FP. First the reflections are sorted on FP, then they are binned into about 50 groups and a polynomial is fitted through the bins. E is then estimated individually when needed by plugging the value of FP into this polynomial. The exact equation for E is given as equation 3.17 in my book Practical Protein Crystallography.
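
The sort, bin and fit steps can be sketched as follows in Python. This is not equation 3.17: as assumptions for illustration, each bin is summarized by the root-mean-square of |FP-FPH|-fh, and the "polynomial" is a straight line fitted by least squares. The name `fit_error_model` and the test data are invented.

```python
import math

def fit_error_model(fp, fph, fh, n_bins=50):
    """Return a function E(FP) from a straight-line fit through binned errors."""
    rows = sorted(zip(fp, fph, fh))                  # sort reflections on FP
    size = max(1, len(rows) // n_bins)               # reflections per bin
    xs, ys = [], []
    for i in range(0, len(rows), size):
        chunk = rows[i:i + size]
        xs.append(sum(r[0] for r in chunk) / len(chunk))          # mean FP of the bin
        ys.append(math.sqrt(sum((abs(r[0] - r[1]) - r[2]) ** 2    # rms of |FP-FPH|-fh
                                for r in chunk) / len(chunk)))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda f: intercept + slope * f

# Invented data whose error grows as about 1% of FP.
fp = [10.0 + i for i in range(200)]
fh = [5.0] * 200
fph = [f + 5.0 + 0.01 * f * (1 if i % 2 else -1) for i, f in enumerate(fp)]
estimate_e = fit_error_model(fp, fph, fh)
print(estimate_e(100.0))
```

Fitting a smooth curve instead of using one global E lets strong and weak reflections carry different error estimates.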
##### Phasing equations
Once an estimate of E is available and fh is calculated and scaled, the coefficients Amir, Bmir, Cmir, Dmir are calculated for each reflection from the equations given in Hendrickson and Lattman, 1970. All of the common reflections are then combined by summing the A's together, the B's together, and so forth, and then the figure of merit and the phase are calculated as given in the same paper (or see Eqs. 3.18, 3.19, 3.20 in Practical Protein Crystallography). The phases are saved either in the form h,k,l,FP,f.o.m.,phi or h,k,l,FP,f.o.m.,phi,A,B,C,D depending on the method chosen. To view the map, use xfft or xfit to calculate a map with the coefficients Fo*Fc, which in this case will be FP*(figure-of-merit), phicalc.
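
The combination step can be illustrated with a short Python sketch. It assumes the usual Hendrickson-Lattman form of the phase probability, P(phi) proportional to exp(A cos phi + B sin phi + C cos 2phi + D sin 2phi), and takes the phase and figure of merit from the centroid of that distribution by numerical integration. The function name `combine_hl` and the coefficient values are invented; how A, B, C, D are obtained for each derivative is not shown.

```python
import math

def combine_hl(coeff_sets, n_steps=360):
    """Sum (A, B, C, D) over derivatives; return (phase in degrees, f.o.m.)."""
    a, b, c, d = (sum(col) for col in zip(*coeff_sets))
    phis = [2.0 * math.pi * i / n_steps for i in range(n_steps)]
    exponents = [a * math.cos(p) + b * math.sin(p)
                 + c * math.cos(2 * p) + d * math.sin(2 * p) for p in phis]
    top = max(exponents)                       # subtract the maximum so exp() cannot overflow
    weights = [math.exp(e - top) for e in exponents]
    norm = sum(weights)
    # Centroid of the probability distribution on the unit circle.
    x = sum(w * math.cos(p) for w, p in zip(weights, phis)) / norm
    y = sum(w * math.sin(p) for w, p in zip(weights, phis)) / norm
    return math.degrees(math.atan2(y, x)) % 360.0, math.hypot(x, y)

# Two invented derivatives for one reflection.
phase, fom = combine_hl([(3.0, 0.0, 0.0, 0.0), (0.0, 3.0, 0.0, 0.0)])
print(round(phase, 1), round(fom, 3))
```

Because the coefficients simply add, a derivative with a flat distribution (all coefficients near zero) contributes almost nothing, while sharp distributions dominate the combined phase.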

## Strategies

Use xpatpred to start the first solution file. In xpatpred you can enter heavy atom positions and then compare these with the Patterson map. Save the predictions from xpatpred in a file and then load this file into xcontur as a label file while displaying the map. Labels will come up displaying the heavy atom vector positions in the volume 0-1, 0-1, 0-1. Open the View window in xcontur and turn on the "jacks" option for more precise positioning of the atoms.

Once you have a solution that explains the peaks in the Patterson map, save the solution file in xpatpred and read it into xheavy. Click on the derivative in the derivative list and then select Edit. You will need to enter the fin file that contains the heavy atom data merged with the native data. The native data should be in column 1 and the derivative in column 2 of this fin file. If you are going to use the anomalous scattering of the derivative you should use a .df file. Then enter the type of phasing you want to do - usually isomorphous. For the first run select a low resolution such as 5 Ångstroms. It's also a good idea to set the outlier filter to 100-130 for weak to strong derivatives respectively. Push apply to set this data and then go back to the main window. Select the refine all derivatives method and then apply. Refinement will begin.

After you have refined, calculate some phases to check the phasing stats. (You will need to enter a phase file name at this point or the program will complain.) After you calculate some phases, redo the refinement. If phases are available the refinement is done in a phased manner. However, if the phases are poor, don't do phased refinements! Good phases will have a phasing power of 1.5 or better and a figure of merit above 0.5 for a single derivative. Don't forget to save your solution!

If you have a derivative that you haven't solved, you can try cross phasing it with one or more solved derivatives. Take the phases from the solved derivative and combine them with the unsolved derivative merged with the native data in a fin file. Check the swap f1 and f2 flag so that when you make an Fo-Fc map with xfft the coefficients will be FPH-FP, phimir. Positions in this map can be found with xcontur. The quickest way to find the unique peaks is to contour the entire asymmetric unit by setting the bounds and slab. Then pick the peaks with the mouse. The interpolated peak position will be printed out in the message window. Enter these into xpatpred and check them against the Patterson map. You can add this solution to the one you already have by selecting the append option on the load menu button in xheavy.

Once you have several derivatives solved and refined you are ready to phase. Increase the resolution on all the derivatives to 3 or 2.7 Å, or whatever you have, and then refine them all. At this point you can do a phase calculation and look at the stats for each derivative. Look for derivatives where the phasing power falls below 1 or so and restrict their resolution to this point. You can then rephase. If your combined figure of merit is about 0.75 or better you are probably home free at this point. Don't forget to save your solution! If you are using anomalous data, calculate the map in both hands. You can do this most easily by inverting all of the heavy atom x,y,z's through the origin (e.g. .1, .2, -.3 becomes -.1, -.2, .3).
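
If you keep your sites in a script, inverting the hand is a one-liner. The Python sketch below uses a hypothetical helper name, `invert_sites`, and plain (x, y, z) tuples of fractional coordinates; it is not part of XtalView.

```python
def invert_sites(sites):
    """Invert fractional heavy-atom coordinates through the origin."""
    return [(-x, -y, -z) for x, y, z in sites]

sites = [(0.1, 0.2, -0.3), (0.25, 0.4, 0.05)]
other_hand = invert_sites(sites)
print(other_hand)
```

Inverting twice returns the original positions, so it is easy to flip back and forth while comparing the two maps.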

Take the phases you have saved and make a map with xfft using the Fo*Fc (i.e. FP*f.o.m., phimir) option and load this into xcontur. Look for the solvent channels and see if you can pick out a single molecule. If so, you can take this map on to xfit. Use the solvent bounds as input in mkskel and make some ridgelines to guide your examination of the map. Load the phases into xfit, make an Fo*Fc map and load the ridgelines. Look for a helix. If you can find one, check if it is right-handed. If it is, you are golden; if not, you need to invert the heavy atom positions as described in the previous paragraph and rephase.

Good Luck!