Validation neutralization assays versus polyclonal fits

Compare the IC50s measured in validation neutralization assays for specific mutants to the values predicted by the polyclonal fits.

Import Python modules:

[1]:
import os
import pickle

import altair as alt

import pandas as pd

import yaml

Read configuration and validation assay measurements:

[2]:
with open("config.yaml") as f:
    config = yaml.safe_load(f)

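# na_filter=False keeps an empty aa_substitutions field (the unmutated
# antibody rows) as an empty string instead of converting it to NaN
validation_ic50s = pd.read_csv(config["validation_ic50s"], na_filter=False)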

validation_ic50s
[2]:
    antibody  aa_substitutions  measured IC50  lower_bound
 0      267C                         0.000877        False
 1      267C             S371Y       0.001423        False
 2      267C             R408S       0.000574        False
 3      267C             R452I       0.000678        False
 4      267C             F456L       0.001209        False
 5      267C             E484A       0.003011        False
 6      267C             F486I       0.002161        False
 7      279C                         0.000057        False
 8      279C             S371Y       0.000181        False
 9      279C             R408S       0.000045        False
10      279C             R452I       0.000038        False
11      279C             F456L       0.000074        False
12      279C             E484A       0.000112        False
13      279C             F486I       0.000125        False

Now get the predictions from the averaged polyclonal model fits:

[3]:
validation_vs_prediction = []
for antibody, antibody_df in validation_ic50s.groupby("antibody"):
    with open(os.path.join(config["escape_dir"], f"{antibody}.pickle"), "rb") as f:
        model = pickle.load(f)
    validation_vs_prediction.append(model.icXX(antibody_df))

validation_vs_prediction = pd.concat(validation_vs_prediction, ignore_index=True)

validation_vs_prediction
[3]:
    antibody  aa_substitutions  measured IC50  lower_bound  mean_IC50  median_IC50  std_IC50  n_models  frac_models
 0      267C                         0.000877        False   0.034641     0.034711  0.034816         4          1.0
 1      267C             E484A       0.003011        False   0.459317     0.387717  0.473481         4          1.0
 2      267C             F456L       0.001209        False   1.323086     0.695120  1.754687         4          1.0
 3      267C             F486I       0.002161        False   0.548088     0.264798  0.718249         4          1.0
 4      267C             R408S       0.000574        False   0.011290     0.011112  0.008520         4          1.0
 5      267C             R452I       0.000678        False   0.006932     0.006755  0.004558         4          1.0
 6      267C             S371Y       0.001423        False   0.934559     0.418290  1.359847         4          1.0
 7      279C                         0.000057        False   0.006658     0.006468  0.004207         4          1.0
 8      279C             E484A       0.000112        False   0.019545     0.013754  0.019474         4          1.0
 9      279C             F456L       0.000074        False   0.007652     0.007338  0.005839         4          1.0
10      279C             F486I       0.000125        False   0.007503     0.007076  0.004314         4          1.0
11      279C             R408S       0.000045        False   0.006452     0.005207  0.004716         4          1.0
12      279C             R452I       0.000038        False   0.000839     0.000766  0.000438         4          1.0
13      279C             S371Y       0.000181        False   0.019463     0.019499  0.012934         4          1.0
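
In every row of the table above, the predicted median IC50 is larger than the measured IC50, so a comparison on a log scale is more informative than one on absolute values. The following is a minimal sketch, not part of the original notebook, that computes a per-antibody Pearson correlation between log10 measured and log10 predicted median IC50. The values are copied from the table above; the `rows` list, the `pearson` helper, and the `log_r` dictionary are illustrative names introduced here. Only the standard library is used so the sketch runs without the notebook's environment.

```python
import math

# (antibody, aa_substitutions, measured IC50, median_IC50), copied from the table above
rows = [
    ("267C", "", 0.000877, 0.034711),
    ("267C", "E484A", 0.003011, 0.387717),
    ("267C", "F456L", 0.001209, 0.695120),
    ("267C", "F486I", 0.002161, 0.264798),
    ("267C", "R408S", 0.000574, 0.011112),
    ("267C", "R452I", 0.000678, 0.006755),
    ("267C", "S371Y", 0.001423, 0.418290),
    ("279C", "", 0.000057, 0.006468),
    ("279C", "E484A", 0.000112, 0.013754),
    ("279C", "F456L", 0.000074, 0.007338),
    ("279C", "F486I", 0.000125, 0.007076),
    ("279C", "R408S", 0.000045, 0.005207),
    ("279C", "R452I", 0.000038, 0.000766),
    ("279C", "S371Y", 0.000181, 0.019499),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Correlate log10 IC50s separately for each antibody
log_r = {}
for antibody in sorted({row[0] for row in rows}):
    subset = [row for row in rows if row[0] == antibody]
    log_measured = [math.log10(row[2]) for row in subset]
    log_predicted = [math.log10(row[3]) for row in subset]
    log_r[antibody] = pearson(log_measured, log_predicted)
    print(f"{antibody}: Pearson r on log10 IC50 = {log_r[antibody]:.2f}")
```

Because a constant fold difference becomes an additive shift on a log scale, this correlation is unaffected by a uniform offset between measured and predicted values. It reflects only whether the mutants are ranked and spaced similarly.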

Now plot the results. We plot the median prediction across the polyclonal models fit to the different deep mutational scanning replicates. The plot is interactive: mouse over a point for details.

[4]:
corr_chart = (
    alt.Chart(validation_vs_prediction)
    .encode(
        x=alt.X("measured IC50", scale=alt.Scale(type="log")),
        y=alt.Y(
            "median_IC50",
            title="predicted IC50 from DMS",
            scale=alt.Scale(type="log"),
        ),
        facet=alt.Facet("antibody", columns=4, title=None),
        color=alt.Color("lower_bound", title="lower_bound"),
        tooltip=[
            alt.Tooltip(c, format=".3g") if validation_vs_prediction[c].dtype == float
            else c
            for c in validation_vs_prediction.columns.tolist()
        ],
    )
    .mark_circle(filled=True, size=60, opacity=0.6)
    .configure_axis(grid=False)
    .resolve_scale(y="independent", x="independent")
    .properties(width=150, height=150)
)

corr_chart
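
The plot uses `median_IC50` rather than `mean_IC50`. The notebook does not state why. One plausible reason is that the median is less sensitive to a single replicate fit that predicts a much larger IC50 than the others. That would be consistent with rows in the predictions table such as 267C F456L, where the mean (1.323086) is well above the median (0.695120) and the standard deviation (1.754687) exceeds the mean. The sketch below illustrates the effect with hypothetical per-model values. These numbers are invented for illustration and are not taken from the fitted models.

```python
import statistics

# Hypothetical IC50 predictions from four replicate model fits
# (illustrative values only, not from the notebook's models)
per_model_ic50 = [0.2, 0.3, 0.5, 4.0]

mean_ic50 = statistics.mean(per_model_ic50)
median_ic50 = statistics.median(per_model_ic50)

# The single high value pulls the mean well above the typical prediction,
# while the median stays between the two middle values
print(f"mean = {mean_ic50:.3g}, median = {median_ic50:.3g}")
```

With an even number of models, as in the `n_models` column of the predictions table, `statistics.median` returns the average of the two middle values.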