So, in a slight change of events, instead of the typical personal blog posts I’ve been doing lately, I feel a need to make a more technical one. Either for others’ benefit or for my own, in the event I happen to lose said notes on how to do this stuff. In any case, these are the summation of online searches, trial & error, and a bit of ingenuity in other random spots, all directed towards the generation, editing, and mapping of surface electrostatics maps on structural models of crystallized proteins. I’ll drop in cited links when I can, pending whether or not I can track some of them down again!
As I was trying to pull down a ton of structures to look at, I wanted to get into looking at the electrostatics between these molecules to see just how structurally conserved some of these molecules were. The problem with doing that when you have dozens & dozens of PDB files is that for the modeling software that I use (PyMOL on OSX 10.8.5 via MacPorts), modeling these all through APBS takes forever.
APBS, the Adaptive Poisson-Boltzmann Solver, is a structural biology software package that lets you simulate & visualize the electrostatic potential at the surface of a protein crystal structure, assuming interactions with an ionic solvent (e.g. water with salt at physiological concentrations). With this, you can make a rough (or refined, if you’re versed in the technical constraints of the software) prediction of what the surface electrostatic charge distribution of the crystallized protein looks like.
However, I still had to get a hold of all those pesky PDB files for each and every structure I was aiming to visualize. Clicking through and manually downloading each & every file would be a ridiculously cumbersome use of my time. Plus, after my previous forays in bioinformatics processing, I’m always intrigued by the challenge of developing a script to aid mass processing! I’m more adept in Perl than in any other language right now, but that was overkill for what I needed here. A simple bash script was all I needed. Oh, and the wget utility that I swiped from the aforementioned MacPorts collection.
The following scripts (in bash, so you need to be using a *nix/Linux environment) were used to do essentially all of the heavy lifting for the acquired PDB files.
Rather than just listing all of the PDB identifiers in a text document for simplicity’s sake, I apparently had the notion to just create empty files in my download directory instead¹. Regardless, with a parseable list (my directory listing in the example here), you can throw the 4-character PDB identifiers into wget and let it do the brunt of the work for you!
for ids in *; do
    # the IDs are empty regular files (made with touch), so test with -f, not -d
    [[ -f $ids ]] && wget "http://www.rcsb.org/pdb/files/$ids.pdb"
done
Working through the list (in my case, *, the directory listing), wget will pull down each & every PDB file (named by the variable $ids) for you. Depending on your Internet connection speed, you’ll eventually download everything.
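If you would rather keep the identifiers in a text file (the saner option), the same idea can be sketched roughly as below. The helper names is_pdb_id, pdb_url, and fetch_pdbs are made up for illustration, as is the one-ID-per-line list file; the URL is the same one used in the loop above.

```shell
# Hypothetical helper: does the argument look like a 4-character PDB ID
# (a digit followed by three letters or digits)?
is_pdb_id() {
    case $1 in
        [0-9][A-Za-z0-9][A-Za-z0-9][A-Za-z0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: build the download URL used in the loop above.
# (RCSB also serves files from https://files.rcsb.org/download/, so swap
# the address here if the old one stops answering.)
pdb_url() {
    printf 'http://www.rcsb.org/pdb/files/%s.pdb\n' "$1"
}

# Read IDs from a list file (one per line), skipping malformed entries and
# anything that has already been downloaded.
fetch_pdbs() {
    while read -r id || [ -n "$id" ]; do
        is_pdb_id "$id" || { echo "skipping '$id'" >&2; continue; }
        [ -s "$id.pdb" ] && continue
        # remove the partial/empty file if the download fails
        wget -q -O "$id.pdb" "$(pdb_url "$id")" || rm -f "$id.pdb"
    done < "$1"
}
```

Calling fetch_pdbs ids.txt in the download directory would then fetch whatever is missing, so an interrupted run can simply be restarted.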
The raw PDB files aren’t enough for APBS to really make sense of, just yet. PDB2PQR is a Python script that does most of the preconditioning modifications to the PDB files so APBS knows how to properly interpret the structural data. Again, if you’re on a version of OS X like myself, you can procure this through the MacPorts collection; otherwise, you can obtain it from the link above. Time to scrape the files again (either from a text list, or better yet, directly from the directory structure as below), and batch process all these goodies.
for ids in *.pdb; do
    # writes XXXX.pqr plus a matching APBS input file
    pdb2pqr --ff=PARSE --apbs-input "$ids" "$(basename "$ids" .pdb).pqr"
done
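With dozens of structures it helps to know which ones misbehaved without scrolling back through the terminal. Here is a rough sketch that keeps a log per structure and collects the failures; pqr_name, convert_all, the .log files, and failed.txt are all names I made up for the example, and the pdb2pqr flags are simply the ones from the loop above.

```shell
# Hypothetical helper: derive the .pqr output name from a .pdb path.
pqr_name() {
    printf '%s.pqr\n' "$(basename "$1" .pdb)"
}

# Run pdb2pqr over every file given, keeping its output (warnings included)
# in XXXX.log and recording inputs that exit with an error in failed.txt.
convert_all() {
    for pdb in "$@"; do
        base=$(basename "$pdb" .pdb)
        if ! pdb2pqr --ff=PARSE --apbs-input "$pdb" "$(pqr_name "$pdb")" \
                > "$base.log" 2>&1; then
            echo "$pdb" >> failed.txt
        fi
    done
}
```

Running convert_all *.pdb and then grepping the .log files for "Warning" gives a quick shortlist of structures that need attention.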
Now, you’re likely to run into a few hiccups on the way. One in particular would be the following:
Warning: multiple occupancies found:blah blah listing of atomic identifiers here (pay attention to the residue numbers if you get them)
This indicates that you’ve got multiple atom assignments (alternate positions, each with partial occupancy) for a single residue, which will confound the energetics calculations at that residue. It often happens with selenomethionine-substituted derivative structures, I think. Unfortunately, I haven’t run across any simple means to fix this; you’ll have to manually load the PDB into PyMOL (or your PDB viewer of choice) and correct the offending residues. Typically, this just entails mutating each “multiple occupancy” residue into a single residue and exporting the modified structure as a new PDB file, or overwriting the old one (save xxxx.pdb). Once that’s done, plug it back through pdb2pqr² and get the last of them homogenized before we put them all into APBS.
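Hunting for the offending residues by eye gets old quickly, so here is a small sketch for listing them up front. It assumes standard fixed-column PDB records, where column 17 of an ATOM/HETATM line holds the alternate-location flag; list_altlocs is a hypothetical name, not part of any of the tools above.

```shell
# Hypothetical helper: list residues that carry alternate-location flags.
# Fixed PDB columns: 17 = altLoc, 18-20 = residue name, 22 = chain,
# 23-26 = residue number.
list_altlocs() {
    awk '/^(ATOM  |HETATM)/ && length($0) >= 26 && substr($0, 17, 1) != " " {
             print substr($0, 18, 3), substr($0, 22, 1), substr($0, 23, 4) + 0
         }' "$1" | sort -u
}
```

Running list_altlocs xxxx.pdb prints one line per affected residue (name, chain, number), which tells you where to look once the structure is open in PyMOL.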
Processing everything through APBS (again, installed from MacPorts) is much the same as before: parse the directory structure for the appropriate files, and pass them to apbs properly. Figuring out how to pass them to apbs took me a little while, but it was pretty simple when I found the (limited) documentation on the command line usage. The only extra trick that this script accounts for is migrating the potential map (pot.dx output) to a unique file before it gets overwritten by the next file in the script; it gets renamed to the PDB identifier/filename with .dx appended appropriately.
for ids in *.in; do
    apbs "$ids"
    mv pot.dx "$(basename "$ids" .in).dx"
done
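One thing the loop above can’t tell you is whether a given pot.dx really came from the current input or is left over from the previous one. A slightly more defensive sketch follows; dx_name, run_apbs_batch, and the .apbs.log files are my own made-up names, and it assumes (as above) that each input file writes its map to pot.dx.

```shell
# Hypothetical helper: derive the per-structure map name from an input file.
dx_name() {
    printf '%s.dx\n' "$(basename "$1" .in)"
}

# Run apbs on each input, renaming pot.dx only when apbs succeeded and
# actually wrote a fresh map.
run_apbs_batch() {
    for infile in "$@"; do
        rm -f pot.dx    # make sure a stale map is never mistaken for a new one
        if apbs "$infile" > "$(basename "$infile" .in).apbs.log" 2>&1 \
                && [ -f pot.dx ]; then
            mv pot.dx "$(dx_name "$infile")"
        else
            echo "APBS failed on $infile" >&2
        fi
    done
}
```

Invoked as run_apbs_batch *.in, it leaves one .dx and one log per structure, and complains on stderr about any input that produced no map.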
At this point, everything should go without a hitch. If not, you’ll have to chime in here and let me know otherwise, because I have yet to discover any errors in the batch APBS processing as I’ve highlighted above.
With all of those (enormous) potential maps created, now we can do something with them! I’m keeping that until the next post, however. This one has gone on long enough. I’ll discuss how to incorporate small non-peptide ligands into the energetics calculations and how to load the electrostatic potential maps onto your PDB structure of interest, dependent on or independent of APBS within PyMOL!
¹ Really, I have no idea what kind of crack I was on at the time. I manually entered touch XXXX with a carriage return for each 4-letter PDB identifier. Maybe I was drunk on a lack of sleep.
² It would probably be easier to invoke the script on a per-file basis here, unless you have an obnoxious number of edited PDB files, in which case maybe you’re just better off re-running the script on all of them again.