Does Molecular Docking Need a Crystal Structure? (Holo, Apo, and Predicted Pockets)
If your assignment says “use a crystallized structure,” that almost never means you must grow crystals in your own lab. It means you need a defensible 3D receptor and a binding site you can justify — usually from the RCSB PDB, sometimes from AlphaFold or a homolog. This guide unpacks the six sub-questions students actually ask on forums and in office hours, with data on when apo structures fail and how to write honest methods.
Short answer (read this first)
No — you do not need your own crystal structure. For most coursework and early virtual screening you need:
- A receptor model with coordinates you can cite (PDB ID, AlphaFold accession, or homology template).
- A binding site definition (co-crystal ligand, literature residues, or a clearly labeled predicted pocket).
- A sanity check that your setup can reproduce a known pose when one exists (redock RMSD ≤ ~2 Å is the usual teaching cutoff).
What you choose — holo (ligand-bound), apo (ligand-free), or predicted — changes pose accuracy more than most students expect, because standard AutoDock Vina keeps the protein rigid.
What people really mean by “need a crystal structure”
On Reddit, Chemistry Stack Exchange, and course Slack channels, “do I need a crystal structure for docking?” collapses into several distinct questions. Answer the one your TA is actually asking:
| What they ask | What they worry about | Practical answer |
|---|---|---|
| “Must it be experimentally determined?” | Using AlphaFold feels like cheating | Many rubrics allow PDB or AF if you cite the source and state limitations; some require “X-ray only” — read the rubric. |
| “Do I need the ligand still in the PDB file?” | Downloading apo by mistake | Prefer a holo entry for the same target if available; use the co-crystal ligand to place the box, then remove it for analog docking if required. |
| “Can I use the apo structure?” | Only apo is deposited | Yes, with literature-defined or predicted pockets — but expect worse redock statistics; discuss induced fit in limitations. |
| “Is homology modeling OK?” | No PDB for their protein | Acceptable for hypothesis-building; template identity and binding-site alignment must be discussed; compare to a holo homolog when possible. |
| “My instructor said ‘use PDB’ — which file?” | Dozens of entries per target | Filter by holo, resolution, organism, ligand similarity, and missing residues in the pocket (checklist below). |
| “Redock failed — is my structure wrong?” | RMSD > 2 Å on the native ligand | Often wrong box, protonation, or apo pocket geometry — not always “wrong PDB,” but holo or refined receptor may be required. |
Holo vs apo: why it matters for rigid docking
Holo = protein structure determined with a bound ligand (co-crystal, soaked, or covalently linked). Apo = same or related protein without that ligand in the coordinate file. Binding is often accompanied by local conformational change — side-chain rotamers, loop shifts, even backbone moves. Vina does not sample those motions unless you use specialized flexible-receptor workflows.
Classic virtual-screening guidance: when holo conformers exist, prefer them; apo pockets often have side chains protruding into the cavity, which hurts docking and enrichment (Kontoyianni et al., 2008).
How much worse is apo? (published redock numbers)
Gunaydin and Atilgan systematically compared native-ligand redocking with AutoDock Vina on holo crystals, unrefined apo structures, and MD-refined apo binding sites (J. Chem. Inf. Model. 2022). Average ligand RMSD after aligning binding-site residues:
- Holo self-dock: ~1.34 Å (DUD-E), ~1.36 Å (Gunasekaran set, n=84) — your teaching-lab target zone.
- Apo without refinement: ~3.65 Å (DUD-E), ~2.90 Å (Gunasekaran) — many poses would fail a 2 Å cutoff.
- Apo after binding-site MD refinement: ~1.97 Å — approaches holo-like performance but adds work beyond a standard homework Vina run.
Takeaway for students: if a holo PDB exists for your target, use it unless the assignment forces apo analysis. If you must use apo, say explicitly that rigid docking may underestimate binding-site rearrangement and report redock results on a reference ligand when available.
Three structure scenarios (ranked for coursework)
1. Holo experimental structure (gold standard)
Use when: introductory labs, SAR series in one pocket, instructors who say “download from PDB with the inhibitor bound.”
- Center the grid on the co-crystal ligand (or a known allosteric ligand in the same file).
- Redock the native ligand before screening analogs — if top pose RMSD is consistently > 2 Å, fix box size, protonation pH, chain ID, or try another PDB conformer before docking 30 analogs.
- Keep box tight (roughly 20–25 Å per side for drug-like ligands); oversized boxes increase false positives (exhaustiveness & box-size study).
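Centering the grid on the co-crystal ligand can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: `ligand_box`, the 5 Å padding, the residue name `LIG`, and the demo coordinates are all illustrative assumptions. It takes the midpoint of the ligand's bounding box as the center (some tools use the centroid instead) and reports the values under the `center_*` / `size_*` names that Vina configuration files use.

```python
from typing import Dict, Iterable, Optional


def ligand_box(pdb_lines: Iterable[str], resname: str,
               chain: Optional[str] = None, padding: float = 5.0) -> Dict[str, float]:
    """Bounding box of one HETATM ligand, as Vina-style center_*/size_* values."""
    coords = []
    for line in pdb_lines:
        if not line.startswith("HETATM"):
            continue
        if line[17:20].strip() != resname:            # residue name, columns 18-20
            continue
        if chain is not None and line[21] != chain:   # chain ID, column 22
            continue
        # x, y, z sit in fixed columns 31-38, 39-46, 47-54 of a PDB record
        coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    if not coords:
        raise ValueError(f"no HETATM atoms found for ligand {resname!r}")

    box = {}
    for axis, values in zip("xyz", zip(*coords)):
        low, high = min(values), max(values)
        box[f"center_{axis}"] = round((low + high) / 2, 3)
        box[f"size_{axis}"] = round((high - low) + 2 * padding, 3)  # padding is an assumption
    return box


def hetatm(serial, name, resname, chain, resseq, x, y, z):
    """Format a minimal fixed-column HETATM record for the demo below."""
    return (f"HETATM{serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Hypothetical two-atom ligand plus one water that must be ignored.
demo = [
    hetatm(1, "C1", "LIG", "A", 901, 10.0, 20.0, 30.0),
    hetatm(2, "N1", "LIG", "A", 901, 14.0, 22.0, 36.0),
    hetatm(3, "O", "HOH", "A", 950, 50.0, 50.0, 50.0),
]
box = ligand_box(demo, "LIG", chain="A")
print(box)
```

On a real holo file, pass the lines of the downloaded PDB and the ligand's three-letter code from the RCSB entry page, then check that each side lands near the 20–25 Å range before docking.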
2. Apo crystal + literature-defined binding site
Use when: structural biology papers map the active site on an apo conformation (e.g. catalytic triad, cofactor groove, allosteric helix).
- Cite the residue list or figure from the primary paper — not a random PyMOL surface.
- Include a figure showing the box enclosing those residues.
- Limitations paragraph: rigid receptor; apo side chains may not represent ligand-bound rotamers.
3. Apo + predicted pocket (exploratory / hypothesis)
Use when: no holo for your exact construct, exploratory toxicology target assignment, or “predict binding site” rubric items.
- fpocket, P2Rank, or transfer box from a holo homolog (same family, similar ligand chemotype).
- Rank pockets by plausibility (conservation, druggability, literature) — not only by software score.
- Frame results as computational hypotheses, not validation of biological binding.
Which structure should I pick? (decision flowchart)
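The ranking of the three scenarios above can be written as a small decision function. This is a sketch reconstructed from that ranking, not a copy of any published flowchart; the function name `pick_scenario`, its flags, and the wording of the returned strings are assumptions made for illustration.

```python
def pick_scenario(has_holo: bool, has_apo: bool, has_literature_site: bool,
                  predicted_models_allowed: bool = True) -> str:
    """Return the highest-ranked structure scenario that is available."""
    if has_holo:
        return "1. holo: box from co-crystal ligand, redock before screening"
    if has_apo and has_literature_site:
        return "2. apo + literature-defined site: cite residues, state rigid-receptor limits"
    if has_apo:
        return "3. apo + predicted pocket: report as a computational hypothesis"
    if predicted_models_allowed:
        return "predicted model (AlphaFold or homology): check pocket confidence first"
    return "no usable structure: ask the instructor which source the rubric accepts"


# Example: only an apo entry is deposited, but a paper maps the active site.
choice = pick_scenario(has_holo=False, has_apo=True, has_literature_site=True)
print(choice)
```

The order of the `if` statements carries the ranking: a holo entry wins whenever one exists, and predicted models are reached only when no experimental structure is available and the rubric allows them.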
PDB selection checklist (before you dock)
Instructors deduct marks for “random PDB choice.” Work through this list on the RCSB structure summary page:
| Check | Why it matters | Rule of thumb |
|---|---|---|
| Holo vs apo | Pocket shape | Same target → pick holo with relevant ligand chemotype if available |
| Resolution | Side-chain trust in the site | X-ray < 2.5 Å preferred for coursework; inspect R-free / Ramachandran |
| Organism & sequence | Biological relevance | Human vs mouse vs bacterial — match your assignment story |
| Mutations / fusion tags | Artificial pocket | Avoid engineered constructs unless that is your system |
| Missing residues in pocket | Vina cannot invent loops | Gap in binding site → pick another conformer or state limitation |
| Biological assembly | Wrong oligomer | Download biological assembly if the active site is at an interface |
| Multiple models (NMR) | Arbitrary choice | Prefer X-ray holo when available; ensemble docking is advanced |
| Several holo structures | Conformational ensemble | Cross-dock or redock native ligands; pick structure with best RMSD at default exhaustiveness (8) |
When multiple holo PDBs exist, redocking the native ligand across conformers is a standard way to pick a receptor — structures with lower self-dock RMSD often perform better in pose prediction (CrossDocker / cross-docking literature).
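The checklist and the self-dock comparison can be combined into a simple shortlist step. The sketch below is illustrative only: the `Candidate` fields, the labels, and every number in the demo are hypothetical, and the 2.5 Å resolution filter is simply the rule of thumb from the table above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    label: str                               # placeholder name, not a real PDB ID
    is_holo: bool
    resolution: float                        # Å
    pocket_gaps: int                         # missing residues inside the binding site
    self_dock_rmsd: Optional[float] = None   # Å, native-ligand redock at exhaustiveness 8


def shortlist(candidates: List[Candidate], max_resolution: float = 2.5) -> List[Candidate]:
    """Keep holo entries that pass the checklist, best self-dock RMSD first."""
    kept = [
        c for c in candidates
        if c.is_holo
        and c.resolution < max_resolution
        and c.pocket_gaps == 0
        and c.self_dock_rmsd is not None
    ]
    return sorted(kept, key=lambda c: c.self_dock_rmsd)


# Hypothetical entries for one target.
candidates = [
    Candidate("entry_A", is_holo=True, resolution=1.9, pocket_gaps=0, self_dock_rmsd=1.6),
    Candidate("entry_B", is_holo=True, resolution=2.1, pocket_gaps=0, self_dock_rmsd=0.9),
    Candidate("entry_C", is_holo=False, resolution=1.5, pocket_gaps=0),                     # apo
    Candidate("entry_D", is_holo=True, resolution=3.2, pocket_gaps=0, self_dock_rmsd=0.7),  # low resolution
    Candidate("entry_E", is_holo=True, resolution=2.0, pocket_gaps=3, self_dock_rmsd=1.1),  # gap in pocket
]
ranked = shortlist(candidates)
print([c.label for c in ranked])
```

Recording the rejected entries and the reason for each rejection gives you the justification paragraph for your receptor choice.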
Worked example: HIV-1 protease (PDB 1HSG)
A concrete teaching example many labs reuse:
- Search RCSB for 1HSG: HIV-1 protease with indinavir (L-735,524; ligand code MK1), a holo structure and classic med-chem target.
- Note dimer assembly: assignments often use one chain or the biological dimer — match your rubric.
- Define the box from the co-crystal inhibitor; redock indinavir (using the deposited ligand coordinates or its SMILES) before docking your analog series.
- Expect good redock RMSD on holo protease if box and protonation are correct; if RMSD is poor, check whether you stripped waters/cofactors the rubric requires.
- Methods sentence: cite PDB 1HSG, resolution, ligand used for grid centering, Vina version, exhaustiveness, pH.
This pattern generalizes: holo PDB → box from co-crystal → redock → analog screen → discuss limitations.
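The redock step of this pattern is often driven by a small Vina configuration file. The helper below is a sketch under stated assumptions: `vina_config` is a hypothetical function, the file names are placeholders, and the box values are dummy numbers rather than the real 1HSG site. Only the key names (`receptor`, `ligand`, `out`, `center_*`, `size_*`, `exhaustiveness`) are standard Vina options.

```python
from typing import Dict


def vina_config(receptor: str, ligand: str, out: str, box: Dict[str, float],
                exhaustiveness: int = 8) -> str:
    """Build the text of a Vina config file from a box dictionary."""
    lines = [f"receptor = {receptor}", f"ligand = {ligand}", f"out = {out}"]
    for key in ("center_x", "center_y", "center_z", "size_x", "size_y", "size_z"):
        lines.append(f"{key} = {box[key]}")
    lines.append(f"exhaustiveness = {exhaustiveness}")
    return "\n".join(lines) + "\n"


# Placeholder box: replace with values measured on the co-crystal ligand.
placeholder_box = {
    "center_x": 12.0, "center_y": 21.0, "center_z": 33.0,
    "size_x": 22.0, "size_y": 22.0, "size_z": 22.0,
}
config_text = vina_config(
    receptor="receptor.pdbqt",        # hypothetical file names
    ligand="native_ligand.pdbqt",
    out="redock_poses.pdbqt",
    box=placeholder_box,
)
print(config_text)
```

Saved as, say, `redock.conf`, the file would be run with `vina --config redock.conf`. Keeping the file alongside your report makes the methods sentence easy to verify.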
Redock sanity check (non-negotiable on holo structures)
Redocking means docking the known co-crystal ligand back into its pocket. Community and textbook practice treats ligand heavy-atom RMSD ≤ 2 Å vs the crystal pose (after receptor alignment) as a successful pose recovery threshold.
- Passes redock: your receptor prep, box, and protonation are plausible for analogs in the same site.
- Fails redock on holo: do not batch-screen 40 analogs yet — troubleshoot box center/size, ionization, tautomers, chain, or try another PDB entry.
- Fails redock on apo only: may be physics (pocket closed), not just “user error” — document and consider holo homolog or pocket refinement literature.
On Dock, use Review setup (0 credits) to validate chains, pH, and the 3D box before spending credits on a full run.
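The 2 Å check itself is a short calculation. The sketch below assumes the crystal ligand and the docked pose list their atoms in the same order and already share the receptor's coordinate frame, so no fitting is applied. It also ignores ligand symmetry, which dedicated tools correct for. The function name and the demo coordinates are illustrative.

```python
import math
from typing import Sequence, Tuple

Atom = Tuple[str, float, float, float]  # (element, x, y, z)


def heavy_atom_rmsd(crystal: Sequence[Atom], pose: Sequence[Atom]) -> float:
    """RMSD over non-hydrogen atoms, assuming identical atom order and frame."""
    if len(crystal) != len(pose):
        raise ValueError("atom lists differ in length")
    squared = []
    for (elem_c, *xyz_c), (elem_p, *xyz_p) in zip(crystal, pose):
        if elem_c != elem_p:
            raise ValueError("atom order differs between crystal and pose")
        if elem_c.upper() == "H":
            continue  # hydrogens are excluded from the heavy-atom RMSD
        squared.append(sum((a - b) ** 2 for a, b in zip(xyz_c, xyz_p)))
    if not squared:
        raise ValueError("no heavy atoms to compare")
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical three-atom ligand: heavy atoms shifted by 1 Å along x.
crystal = [("C", 0.0, 0.0, 0.0), ("N", 1.5, 0.0, 0.0), ("H", 2.0, 1.0, 0.0)]
pose = [("C", 1.0, 0.0, 0.0), ("N", 2.5, 0.0, 0.0), ("H", 9.0, 9.0, 9.0)]

rmsd = heavy_atom_rmsd(crystal, pose)
passes_redock = rmsd <= 2.0  # teaching cutoff from the text
print(f"RMSD = {rmsd:.2f} Å, passes redock: {passes_redock}")
```

Report the number itself rather than only pass or fail, since a 1.9 Å and a 0.6 Å redock say different things about the setup.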
AlphaFold and homology models
Predicted structures are valid for many exploratory assignments, but a high global pLDDT can be misleading. Docking success correlates more with binding-site regional accuracy than with overall fold confidence (Staszic et al.; Koes et al.).
- Inspect per-residue pLDDT in the pocket — disordered loops blocking the site (reported for some AF models) invalidate naive docking.
- Compare AF pocket to a holo homolog: subtle side-chain differences can collapse virtual-screening enrichment even when backbone RMSD looks excellent.
- Methods: cite AlphaFold DB accession, model version, rigid Vina, and that induced fit was not modeled.
- If pocket pLDDT is weak, state that predictions are unreliable — do not claim “strong binding” from score alone.
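Inspecting per-residue pLDDT in the pocket can be scripted, because AlphaFold DB model files store pLDDT in the B-factor column of each ATOM record. The sketch below reads that column for the Cα atoms of a residue list. The function name, the residue numbers, and the demo values are hypothetical, and the cutoff of 70 follows AlphaFold's usual "low confidence" band rather than anything specific to docking.

```python
from statistics import mean
from typing import Dict, Iterable, Set


def pocket_plddt(pdb_lines: Iterable[str], pocket_residues: Set[int],
                 chain: str = "A") -> Dict[str, float]:
    """Min, max, and mean pLDDT over the C-alpha atoms of the listed residues."""
    scores = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA" or line[21] != chain:
            continue
        if int(line[22:26]) in pocket_residues:   # residue number, columns 23-26
            scores.append(float(line[60:66]))     # B-factor column holds pLDDT
    if not scores:
        raise ValueError("none of the pocket residues were found")
    return {"min": min(scores), "max": max(scores), "mean": round(mean(scores), 2)}


def atom(serial, name, resname, chain, resseq, plddt, occupancy=1.0):
    """Format a minimal fixed-column ATOM record for the demo below."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{occupancy:6.2f}{plddt:6.2f}")


# Hypothetical model: two pocket residues and one residue outside the pocket.
model = [
    atom(1, "CA", "ASP", "A", 25, 92.5),
    atom(2, "CA", "GLY", "A", 50, 45.0),
    atom(3, "CA", "LEU", "A", 80, 88.0),
]
summary = pocket_plddt(model, {25, 50})
weak_pocket = summary["min"] < 70.0  # assumed cutoff, see lead-in
print(summary, "weak pocket:", weak_pocket)
```

The minimum and maximum returned here are the two numbers to quote when a methods paragraph reports the pLDDT range in the binding site.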
When an experimental crystal structure is effectively required
- Rubric explicitly requires X-ray or cryo-EM coordinates (no AlphaFold).
- You are making publication-level structural claims, or presenting a ligand pose as validated, with no experimental structure to support it.
- Large induced-fit or domain motion is central to the hypothesis — rigid Vina on one static frame is insufficient; say so.
- Binding site sits in a low-confidence predicted or unresolved region.
- Regulatory or industrial QSAR where receptor provenance is audited.
Methods paragraphs you can adapt
The holo receptor was prepared from PDB entry 1HSG (HIV-1 protease, X-ray, 2.0 Å) with protonation at pH 7.4. The AutoDock Vina search space was centered on the co-crystallized inhibitor used to define the active site. The native ligand was redocked to verify pose recovery (top pose RMSD ≤ 2.0 Å vs crystal coordinates). Analogs were docked with a rigid receptor; exhaustiveness 8; top poses ranked by Vina affinity (kcal/mol).
No holo structure was available for [target]. The apo receptor PDB [ID] was used; the binding site was defined by residues [list] according to [Author, Year]. Pocket placement was verified by visual inspection. Results are presented as computational hypotheses because apo conformations may not represent ligand-bound side-chain conformations under a rigid receptor model.
The receptor model was obtained from the AlphaFold Protein Structure Database ([accession], model version [vN]). Per-residue pLDDT in the binding site ranged from [min] to [max]. Rigid AutoDock Vina docking was performed without induced-fit sampling.
Troubleshooting: structure choice vs software bugs
| Symptom | Likely structure issue | What to try |
|---|---|---|
| All poses outside pocket | Box off-target or apo pocket closed | Recenter on co-crystal or literature site; switch to holo PDB |
| Redock RMSD > 5 Å on holo | Wrong ligand protonation/tautomers or box too small/large | Review setup; raise exhaustiveness above the default of 8 (e.g. 16 or 32); verify ligand matches crystal chemistry |
| Good redock, nonsense analog poses | Analogs too large/branched for pocket | Check strain, 2D interactions, not just affinity rank |
| Only apo structures exist | Induced fit | Cite apo limitation; consider homolog holo for box definition |
| AF model “looks fine” but poor poses | Pocket loop or side-chain error | Compare binding-site pLDDT; overlay holo homolog in PyMOL/ChimeraX |
References & further reading
- Gunaydin H, Atilgan AR. Holo protein conformation generation from apo structures by ligand binding site refinement. J. Chem. Inf. Model. 2022. doi:10.1021/acs.jcim.2c00895
- Kontoyianni M, et al. Recipes for the selection of experimental protein conformations for virtual screening. J. Chem. Inf. Model. 2008. PMC2811216
- Staszic P, et al. How good are AlphaFold models for docking-based virtual screening? PMC9852548
- Koes DR, et al. Evaluation of AlphaFold2 structures as docking targets. PMC9794023
- AutoDock Vina FAQ on accuracy and decoys: official FAQ
Run the workflow online (after you pick a structure)
- Download the PDB (or upload your prepared file) to Dock.
- Holo: define the box from the co-crystal ligand; apo: from literature or predicted pocket.
- Review setup (0 credits) — validate chains, protonation, and box in 3D.
- Redock when a reference ligand exists; then dock your SMILES/SDF batch.
- Download ZIP + PDF for tables and figures in your report.