Malaria Box supporting information

Data from the 20,000 hits

The full list of the 20,000 hits from Plasmodium falciparum (P. falciparum) whole-cell screening originates from the GlaxoSmithKline Tres Cantos Antimalarial Set (TCAMS), the Novartis-GNF Malaria Box Dataset and the St. Jude Children’s Research Hospital Dataset available on the ChEMBL-NTD website. Additionally, a small set of active compounds from commercially available libraries was added; this latter set has not previously been published. The Public Data Set (Excel file) contains the following information for each compound: the index number, the ChEMBL compound identity, the batch number, the structure in Canonical SMILES format, a reference to the original source, and the EC50 in µM against P. falciparum 3D7 as reported in the ChEMBL-NTD repository.

Data on the 400 Malaria Box compounds 

All compounds in the Malaria Box have been screened in vitro against P. falciparum 3D7 (a chloroquine (CQ)-sensitive but sulfadoxine-resistant strain), and cytotoxicity assays were performed on a human embryonic kidney cell line (HEK-293). Antimalarial activity is expressed as EC50 in nM and cytotoxicity as CC50 in nM.

The method employed is described by:

- Duffy, S. & Avery, V. M. Development and Optimisation of a Novel Anti-malarial Imaging Assay Validated for High Throughput Screening. Am. J. Trop. Med. Hyg. 2012, 86, 84-92. DOI: 10.4269/ajtmh.2012.11-0302

Briefly, antimalarial assays were carried out on P. falciparum 3D7 (a CQ-sensitive but sulfadoxine-resistant strain), with CQ and artemisinin included as controls. Cytotoxicity assays were performed using human embryonic kidney cells (HEK-293), with reference to controls treated with puromycin. Malaria parasites incubated with the compounds for 72 h were stained with DAPI (4',6-diamidino-2-phenylindole) and imaged using a high-throughput confocal imaging system. To quantify viable parasites, the images were converted to a numerical output based on the number and intensity of fluorescent spots, with reference to control images obtained from P. falciparum treated with different concentrations of artemisinin and CQ.

The list of the 400 compounds in the Open Access Malaria Box (Excel file) contains the information pertaining to each compound:

  • HEOS_COMPOUND_ID (as an MMV identification number)
  • Batch_No_March2012 (batch number for 1st round of shipments)
  • Batch_No_May2012 (batch number for 2nd round of shipments)
  • Smiles (structure of the compound in a Canonical SMILES format)
  • percent_inh at 2 µM (% inhibition of 3D7 growth at 2 µM concentration of compound)
  • percent_inh at 5 µM (% inhibition of 3D7 growth at 5 µM concentration of compound)
  • EC50_nM (against P. falciparum 3D7, as reported by Prof. Avery)
  • ChEMBL_NTD_ID (compound identity as reported in the ChEMBL database)
  • source (GSK, GNF, StJude or Commercial libraries)
  • CHEMBL EC50 in µM (against P. falciparum 3D7, as reported in the ChEMBL database)
  • Set (Drug-like or Probe-like)
  • Ro5_ViolationCount (number of violations of Lipinski’s Rule of 5, e.g. 1 = one violation)
  • NplusO_Count (total number of nitrogen and oxygen atoms)
  • Molecular_Weight (in g/mol)
  • Num_H_Donors (number of hydrogen-bond donors in the structure as drawn)
  • ALogP (calculated partition coefficient)
  • Comment (if applicable)
  • Plate_March2012 (plate assignment of the compound for 1st round of shipments)
  • Well_March2012 (location on the designated plate for 1st round of shipments)
  • Plate_May2012 (plate assignment of the compound for 2nd round of shipments)
  • Well_May2012 (location on the designated plate for 2nd round of shipments)
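
The physicochemical columns above are enough to recount Rule of 5 violations and to put the two EC50 columns on the same unit scale. The following is a minimal sketch only: it assumes that NplusO_Count stands in for hydrogen-bond acceptors and that ALogP stands in for logP, which may differ from how Ro5_ViolationCount was actually computed for the spreadsheet. The class name, function names and example values are hypothetical and are not taken from the data set.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Subset of the per-compound columns listed above."""
    molecular_weight: float  # Molecular_Weight, in g/mol
    alogp: float             # ALogP
    num_h_donors: int        # Num_H_Donors
    nplus_o_count: int       # NplusO_Count


def ro5_violation_count(d: Descriptors) -> int:
    """Count violations of Lipinski's Rule of 5.

    Assumption: N + O count is used as the acceptor count and ALogP as
    logP; the spreadsheet's own column may use other definitions.
    """
    checks = (
        d.molecular_weight > 500,  # molecular weight above 500
        d.alogp > 5,               # calculated logP above 5
        d.num_h_donors > 5,        # more than 5 H-bond donors
        d.nplus_o_count > 10,      # more than 10 N and O atoms
    )
    return sum(checks)


def nm_to_um(value_nm: float) -> float:
    """Convert a concentration from nM (EC50_nM) to µM (ChEMBL EC50)."""
    return value_nm / 1000.0


# Hypothetical values for illustration only, not a Malaria Box compound.
example = Descriptors(molecular_weight=520.6, alogp=5.4,
                      num_h_donors=2, nplus_o_count=6)
print(ro5_violation_count(example))  # 2 (molecular weight and ALogP)
print(nm_to_um(850.0))               # 0.85
```

Converting EC50_nM to µM in this way allows a like-for-like comparison with the ChEMBL EC50 column, which is reported in µM.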

Important Information

MMV have attempted to include attractive starting points for drug discovery programmes in the drug-like set.

However, access to commercially available material on a sufficient scale and within a relevant time frame was a major challenge. Consequently, a pragmatic approach to chemical quality and diversity was taken to achieve an acceptable compromise between activity against P. falciparum and chemical diversity.

Computational and non-computational tools used to select compounds for the “Drug-like” set

Both the REOS (Rapid Elimination Of Swill) filters and the PAINS (Pan Assay Interference Compounds) filters were used.

- Designing Screens: How to Make Your Hits a Hit, Walters, W. P. and Namchuk, M. Nature Reviews Drug Discovery, 2003, 2, 259. DOI: 10.1038/nrd1063

- New Substructure Filters for Removal of Pan Assay Interference Compounds (PAINS) from Screening Libraries and for Their Exclusion in Bioassays, Baell, J. B. and Holloway, G. A. J. Med. Chem. 2010, 53, 2719–2740. DOI: 10.1021/jm901137j

In the present work, any compound from the ChEMBL-NTD dataset that matched one or more of these filters was eliminated from consideration.

In addition, ALogP values were calculated using Ghose-Crippen atom assignments.

- Viswanadhan, V. N., Ghose, A. K., Revankar, G. R. & Robins, R. K. J. Chem. Inf. Comput. Sci. 1989, 29, 163. DOI: 10.1021/ci00063a006

The final inclusion or exclusion of compounds to enhance chemical diversity (from those that were active and available) was reviewed using the “wisdom of crowds” methodology, drawing on the collective experience of MMV medicinal chemists.

- Library Enhancement through the Wisdom of Crowds, Hack, M. D.; Rassokhin, D. N.; Buyck, C.; Seierstad, M.; Skalkin, A.; Holte, P.; Jones, T. K.; Mirzadegan, T.; Agrafiotis, D. K. J. Chem. Inf. Model. 2011, 51, 3275. DOI: 10.1021/ci200446y

It is also important to note that whilst the activity reported accurately represents the data obtained in the said assays and, furthermore, the quality of each compound has been assessed, it is recommended that any active compound from a screen is re-synthesized/re-purified and retested to confirm that the activity truly resides with the stated structure and not a small percentage of an impurity.