In this hands-on session we will use AlphaFold to predict protein structure from sequence (Jumper et al. 2021).
Without such approaches, determining the structure of a single protein can take years of expensive laboratory work. With AlphaFold we can now accurately predict a typical protein structure in as little as ten minutes.
The PDB (the main repository of experimental structures) holds only about 250 thousand structures (we saw this in the last lab), while the main protein sequence database holds over 200 million sequences. This means only about 0.125% of known sequences have a known structure; this shortfall is called the “structure knowledge gap”.
(250000 / 200000000) * 100
[1] 0.125
The EBI hosts a database of pre-computed AlphaFold (AF) models called AFDB. It is growing all the time, so it is worth checking before running AF ourselves.
We can download AF and run it locally (on our own computers), but this requires a GPU. Alternatively, we can use “cloud” computing to run it on someone else's computer.
We will use ColabFold (https://github.com/sokrypton/ColabFold), which runs AlphaFold in Google Colab notebooks.
We previously found that there was no AFDB entry for our HIV protease dimer sequence:
>HIV-Pr-Dimer
PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYD
QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF:PQITLWQRPLVTIKIGGQLK
EALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPT
PVNIIGRNLLTQIGCTLNF
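The colon in the sequence above is how ColabFold marks a break between chains when modeling a complex, so this entry describes a homodimer rather than one long chain. As a quick sanity check before submitting it, here is a minimal base-R sketch (the variable names `dimer` and `chains` are our own choices, not part of ColabFold) that splits the sequence at the colon and confirms we have two identical chains of 99 residues each:

```r
# The HIV protease dimer sequence from above as one string
# (FASTA line breaks removed; the ":" marks the chain break)
dimer <- paste0(
  "PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYD",
  "QILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF:PQITLWQRPLVTIKIGGQLK",
  "EALLDTGADDTVLEEMSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPT",
  "PVNIIGRNLLTQIGCTLNF"
)

# Split on the literal ":" to recover the individual chains
chains <- strsplit(dimer, ":", fixed = TRUE)[[1]]

length(chains)                    # number of chains: [1] 2
nchar(chains)                     # residues per chain: [1] 99 99
identical(chains[1], chains[2])   # homodimer check: [1] TRUE
```

Keeping the colon in the input is what asks ColabFold to model both chains together as a complex.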
Here we will use the AlphaFold2_mmseqs2 notebook, which builds the input multiple sequence alignment with MMseqs2.
