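The __init__ method of the OpenFoldDataset class in the AlphaFold3 data_modules module initializes an OpenFoldDataset, which manages multiple protein datasets and applies weighted sampling and stochastic filtering during training.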
Source code:
def __init__(self,
    datasets: Union[Sequence[OpenFoldSingleDataset], Sequence[OpenFoldSingleMultimerDataset]],
    probabilities: Sequence[float],
    epoch_len: int,
    generator: torch.Generator = None,
    _roll_at_init: bool = True,
):
    self.datasets = datasets
    self.probabilities = probabilities
    self.epoch_len = epoch_len
    self.generator = generator
    self.datapoints = None

    self._samples = [self.looped_samples(i) for i in range(len(self.datasets))]
    if _roll_at_init:
        self.reroll()
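To make the roles of probabilities and epoch_len more concrete, here is a minimal, self-contained sketch of weighted dataset selection. It is not the library's implementation: the function name pick_datapoints, the dataset_sizes argument, and the integer seed are hypothetical, Python's random module stands in for torch.Generator, and the stochastic filtering step is left out entirely.

```python
import random
from typing import List, Sequence, Tuple


def pick_datapoints(
    dataset_sizes: Sequence[int],
    probabilities: Sequence[float],
    epoch_len: int,
    seed: int = 0,
) -> List[Tuple[int, int]]:
    """Return epoch_len (dataset_idx, datapoint_idx) pairs.

    Hypothetical helper: each slot in the epoch first picks a dataset
    according to `probabilities`, then picks one item from that dataset.
    """
    rng = random.Random(seed)  # stand-in for a seeded torch.Generator

    # Choose one dataset per epoch slot, with replacement, weighted by probability.
    dataset_choices = rng.choices(
        range(len(dataset_sizes)), weights=probabilities, k=epoch_len
    )

    datapoints = []
    for dataset_idx in dataset_choices:
        # Uniformly pick an item from the chosen dataset (no filtering here).
        datapoint_idx = rng.randrange(dataset_sizes[dataset_idx])
        datapoints.append((dataset_idx, datapoint_idx))
    return datapoints


# Two toy datasets of 100 and 10 items; the first is sampled 90% of the time.
pairs = pick_datapoints(dataset_sizes=[100, 10], probabilities=[0.9, 0.1], epoch_len=8)
```

Under these assumptions, a dataset with a larger probability contributes more samples to an epoch regardless of its size, and fixing the seed makes the epoch reproducible.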
Source code walkthrough:
Method signature
def __init__(self,
    datasets: Union[Sequence[OpenFoldSingleDataset], Sequence[OpenFoldSingleMultimerDataset]],
    probabilities: Sequence[float],
    epoch_len: int,
    generator: torch.Generator = None,
    _roll_at_init: bool = True,
):
The main purposes of this constructor are:
- Accept multiple datasets (either single-chain or multi-chain protein datasets).
- For each dataset, set