Monday, April 6, 2009

EX1

Hi class-

It was just pointed out to me that some of the protein structure files I posted contain complications that can produce multiple alpha carbon coordinates for the same amino acid. I have now preprocessed and cleaned up these files, so please download the new proteins.zip or proteins.tar.gz file from the website.

Also, here are some more programming hints:

When you are parsing a PDB file, be sure to take the exact columns that the PDB format specifies. The fields are not necessarily separated by spaces, so splitting a line on whitespace can fail. You can access a specific section of a string using slice notation. For example, if line is the variable holding a line read from a file, you can get columns 18-20 (inclusive) with line[17:20], because Python indices start at 0 and a slice stops just before its end index.
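Here is a minimal sketch of the column-slicing idea. The function name parse_atom_line and the sample record are made up for illustration and are not part of the assignment files. The column ranges are the standard ones for ATOM records, and the sample line is assembled piece by piece so you can see which columns each field occupies. Notice that the x and y coordinates run together with no space between them, which is exactly the case where splitting on whitespace breaks but slicing still works.

```python
def parse_atom_line(line):
    """Pull a few fields out of one ATOM record by fixed column position.

    Hypothetical helper for illustration only.
    """
    return {
        "atom_name": line[12:16].strip(),  # columns 13-16
        "res_name": line[17:20],           # columns 18-20
        "x": float(line[30:38]),           # columns 31-38
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
    }


# A made-up ATOM record, built field by field to make the columns explicit.
line = (
    "ATOM  "      # columns 1-6: record name
    "    2"       # columns 7-11: atom serial number
    " "           # column 12: unused
    " CA "        # columns 13-16: atom name
    " "           # column 17: alternate location indicator
    "ALA"         # columns 18-20: residue name
    " "           # column 21: unused
    "A"           # column 22: chain identifier
    "   1"        # columns 23-26: residue sequence number
    "    "        # columns 27-30: insertion code and unused columns
    "-123.456"    # columns 31-38: x coordinate
    "-234.567"    # columns 39-46: y coordinate (no space before it)
    "  12.000"    # columns 47-54: z coordinate
)

fields = parse_atom_line(line)
print(fields["res_name"], fields["atom_name"], fields["x"], fields["y"])

# Splitting on whitespace fuses x and y into a single token.
print(line.split())
```

If you only need the alpha carbons, you could check that fields["atom_name"] equals "CA" before keeping the coordinates.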

Also, you can use Python's list comprehensions to your advantage. For your function that returns the IsPhobic array, you can easily iterate over the residues in a form like:

[Res in Phobics for Res in ResNames]

where Phobics is a list containing hydrophobic amino acid names. The "in" operator here evaluates to either True or False, so the constructed list will contain only Booleans.
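As a quick illustration, here is that expression run on a short made-up sequence. The contents of Phobics below are only an assumed example set of three-letter codes, so use whatever list the assignment specifies.

```python
# Assumed example set of hydrophobic residue names (the assignment's list may differ).
Phobics = ["ALA", "CYS", "PHE", "ILE", "LEU", "MET", "PRO", "VAL", "TRP"]

# Made-up residue names, as they might be read from columns 18-20 of a PDB file.
ResNames = ["MET", "LYS", "VAL", "ASP", "LEU"]

# One Boolean per residue: True if its name appears in Phobics.
IsPhobic = [Res in Phobics for Res in ResNames]
print(IsPhobic)  # [True, False, True, False, True]
```

If you need a NumPy array rather than a plain list, you can pass the result to numpy.array.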

Cheers,
MSS
