Generate a file with all age cohort escape data from A/Hong Kong/45/2019 H3 HA
The output file keeps only mutations that pass the following filters, whose values are specified in data/polyclonal_config.yaml:
* Functional effect at or above the threshold (-1.38)
* Minimum times seen (n=3)
* Allowed amino acids (all except stop codons)
[1]:
import altair as alt
import pandas as pd
import polyclonal
import yaml
Read the filtering parameters from the config file
[2]:
with open('data/polyclonal_config.yaml') as f:
    config = yaml.safe_load(f)['overall_default']['plot_kwargs']
func_effect = config['addtl_slider_stats']['functional effect']
times_seen = config['addtl_slider_stats']['times_seen']
aa_list = config['alphabet']
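For readers without access to the config file, the lookups above imply a nested layout roughly like the one sketched below. The key names come from the cell above and the two thresholds from the introduction; the exact alphabet and any other keys in the real file are assumptions, and `example_config` is a hypothetical stand-in for the result of `yaml.safe_load`.

```python
# Hypothetical stand-in for yaml.safe_load(f) on data/polyclonal_config.yaml.
# Only the keys read by this notebook are shown; the real file may hold more.
example_config = {
    "overall_default": {
        "plot_kwargs": {
            "addtl_slider_stats": {
                "functional effect": -1.38,  # threshold stated in the introduction
                "times_seen": 3,             # minimum times seen stated in the introduction
            },
            # Assumed: the 20 standard amino acids, excluding the stop codon "*"
            "alphabet": list("ACDEFGHIKLMNPQRSTVWY"),
        }
    }
}

# Same lookups as in the notebook cell, under example_ names to avoid shadowing
example_plot_kwargs = example_config["overall_default"]["plot_kwargs"]
example_func_effect = example_plot_kwargs["addtl_slider_stats"]["functional effect"]
example_times_seen = example_plot_kwargs["addtl_slider_stats"]["times_seen"]
example_aa_list = example_plot_kwargs["alphabet"]

print(example_func_effect, example_times_seen, len(example_aa_list))
```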
Get functional effects
[3]:
muteffects_csv = "results/muteffects_functional/muteffects_observed.csv"
muteffects = pd.read_csv(muteffects_csv).rename(
    columns={"reference_site": "site", "effect": "functional effect"}
)[["site", "mutant", "functional effect"]]
Define samples in each age cohort
[4]:
cohort_dict = {
    '2-5_years': [
        '3944',
        '2389',
        '2323',
        '2388',
        '3973',
        '4299',
        '4584',
        '2367',
    ],
    '15-20_years': [
        '2350',
        '2365',
        '2382',
        '3866',
        '2380',
        '3856',
        '3857',
        '3862',
    ],
    '40-45_years': [
        '33C',
        '34C',
        '197C',
        '199C',
        '215C',
        '210C',
        '74C',
        '68C',
        '150C',
        '18C',
    ],
    'infant': [
        '2462',
    ],
    '68_years': [
        'AUSAB-13',
    ],
}
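Because each serum is later tagged with a single cohort label, it can help to confirm that no serum ID is listed under two cohorts. The following is a minimal sketch of such a check; `cohort_subset` is an abbreviated, illustrative copy of the dictionary above, and `serum_to_cohort` is a hypothetical helper name that does not appear in the original notebook.

```python
from collections import Counter

# Abbreviated copy of cohort_dict (at most two sera per cohort), for illustration only
cohort_subset = {
    "2-5_years": ["3944", "2389"],
    "15-20_years": ["2350", "2365"],
    "40-45_years": ["33C", "34C"],
    "infant": ["2462"],
    "68_years": ["AUSAB-13"],
}

# Count how many times each serum ID appears across all cohorts
counts = Counter(serum for sera in cohort_subset.values() for serum in sera)
duplicates = sorted(serum for serum, n in counts.items() if n > 1)
assert not duplicates, f"sera assigned to more than one cohort: {duplicates}"

# Inverted lookup: serum ID -> cohort label
serum_to_cohort = {
    serum: cohort for cohort, sera in cohort_subset.items() for serum in sera
}
print(serum_to_cohort["AUSAB-13"])
```

Running the same check against the full `cohort_dict` only requires swapping the dictionary name.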
Read the library-averaged escape DataFrame for each serum, filter it by the parameters defined above, and combine the results into one summary escape table.
[5]:
escape_df_list = []
for cohort, serum_list in cohort_dict.items():
    for serum in serum_list:
        df = (
            pd.read_csv(f'results/antibody_escape/{serum}_avg.csv')
            # keep mutations seen often enough and in the allowed alphabet
            .query("`times_seen` >= @times_seen")
            .query("`mutant` in @aa_list")
            # attach functional effects; mutations without one get NaN
            .merge(muteffects, how='left', on=['site', 'mutant'])
            # NaN fails this comparison, so unmeasured mutations are dropped too
            .query("`functional effect` >= @func_effect")
        )
        df['serum'] = serum
        df['cohort'] = cohort
        # drop extraneous columns
        df = df.drop(['epitope', 'escape_median', 'escape_min_magnitude'], axis=1)
        escape_df_list.append(df)

escape_df = pd.concat(escape_df_list)
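The filter chain in the cell above can be traced on a small in-memory table. The sketch below uses made-up values throughout: `toy_escape` and `toy_muteffects` are hypothetical stand-ins for one per-serum CSV and the functional effects table, and the alphabet is assumed to be the 20 standard amino acids.

```python
import pandas as pd

# Thresholds mirroring the values stated at the top of the notebook
times_seen = 3
func_effect = -1.38
aa_list = list("ACDEFGHIKLMNPQRSTVWY")  # assumed alphabet, no stop codon "*"

# Toy stand-in for one results/antibody_escape/{serum}_avg.csv file
toy_escape = pd.DataFrame(
    {
        "site": [1, 1, 2, 2, 3],
        "mutant": ["H", "R", "N", "*", "A"],
        "escape_mean": [0.01, -0.02, 0.30, 0.05, 0.10],
        "times_seen": [3, 2, 5, 4, 6],
    }
)

# Toy functional effects; the site 3 / A mutation is deliberately missing
toy_muteffects = pd.DataFrame(
    {
        "site": [1, 1, 2, 2],
        "mutant": ["H", "R", "N", "*"],
        "functional effect": [-0.17, -0.63, -2.00, -4.00],
    }
)

filtered = (
    toy_escape
    .query("`times_seen` >= @times_seen")  # drops 1R, seen only twice
    .query("`mutant` in @aa_list")         # drops the stop codon at site 2
    .merge(toy_muteffects, how="left", on=["site", "mutant"])
    # drops 2N (below threshold) and 3A (NaN after the left merge)
    .query("`functional effect` >= @func_effect")
)
print(filtered)
```

Only the site 1 / H row survives. Note that the left merge followed by the threshold comparison also removes any mutation with no measured functional effect, since a comparison against NaN evaluates to false.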
[6]:
output_csv = 'results/full_hk19_escape_scores.csv'
print(f'Writing to {output_csv}')
escape_df.to_csv(output_csv, index=False)
escape_df
Writing to results/full_hk19_escape_scores.csv
[6]:
| | site | wildtype | mutant | mutation | escape_mean | escape_std | n_models | times_seen | frac_models | functional effect | serum | cohort |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -2 | D | G | D-2G | 0.1278 | 0.0473 | 2 | 3.0 | 1.0 | -0.6452 | 3944 | 2-5_years |
| 1 | -2 | D | Y | D-2Y | 0.0338 | 0.0501 | 2 | 7.0 | 1.0 | -0.7111 | 3944 | 2-5_years |
| 2 | 1 | Q | H | Q1H | 0.0069 | 0.1994 | 2 | 3.0 | 1.0 | -0.1690 | 3944 | 2-5_years |
| 3 | 1 | Q | R | Q1R | -0.0235 | 0.1103 | 2 | 5.0 | 1.0 | -0.6300 | 3944 | 2-5_years |
| 4 | 2 | K | N | K2N | -0.0178 | 0.0990 | 2 | 5.0 | 1.0 | -0.1303 | 3944 | 2-5_years |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2765 | 538 | A | T | A538T | -0.0144 | 0.0011 | 2 | 3.0 | 1.0 | -0.0882 | AUSAB-13 | 68_years |
| 2766 | 538 | A | V | A538V | -0.0129 | 0.0038 | 2 | 9.5 | 1.0 | -0.4286 | AUSAB-13 | 68_years |
| 2767 | 540 | Q | H | Q540H | 0.0298 | 0.0467 | 2 | 5.0 | 1.0 | -0.1289 | AUSAB-13 | 68_years |
| 2768 | 540 | Q | K | Q540K | -0.0069 | 0.0640 | 2 | 8.5 | 1.0 | -0.6159 | AUSAB-13 | 68_years |
| 2769 | 540 | Q | R | Q540R | -0.0347 | 0.0355 | 2 | 8.5 | 1.0 | -1.3188 | AUSAB-13 | 68_years |
69633 rows × 12 columns