Official Script for Batch-Downloading PDB Files

📅 2026/7/2 1:22:26
1. RCSB PDB has updated the ID numbering of its crystal structures, and the home page of the official site now provides a link for downloading archive data.

2. An official batch download script is provided. It supports batch downloads of many file types and can resume interrupted downloads, which makes it very convenient. The script's own documentation reads:

This script is for downloading all released PDB entries (of a single file type/format) from the PDB Beta Archive. It uses the asynchronous aiohttp library to download multiple files concurrently when performing bulk downloads. It requires Python 3.8 or higher and the aiofiles and aiohttp packages, which can be installed with the following commands:

    pip install aiofiles
    pip install aiohttp

The script requires two input arguments to run. The following example command line downloads all mmCIF files and stores the downloaded files under the directory /home/my_user_id/download:

    python BetaArchiveBatchDownloader.py --file_type mmcif --output_dir /home/my_user_id/download

(Run any of the following command lines to see all supported download file types:

    python BetaArchiveBatchDownloader.py
    python BetaArchiveBatchDownloader.py -h
    python BetaArchiveBatchDownloader.py --help

It shows:

    --file_type FILE_TYPE
        The supported file types for downloading are listed in the left column.
        The corresponding file naming conventions are listed in the right column.

        mmcif               : pdb_xxxxxxxx.cif.gz
        pdb                 : pdb_xxxxxxxx.pdb.gz
        assemblies          : pdb_xxxxxxxx-assembly#.cif.gz
        XML                 : pdb_xxxxxxxx.xml.gz
        XML-extatom         : pdb_xxxxxxxx-extatom.xml.gz
        XML-noatom          : pdb_xxxxxxxx-noatom.xml.gz
        structure_factors   : pdb_xxxxxxxx-sf.cif.gz
        nmr_data_str        : pdb_xxxxxxxx_nmr-data.str.gz
        nmr_data_nef        : pdb_xxxxxxxx_nmr-data.nef.gz
        nmr_chemical_shifts : pdb_xxxxxxxx_cs.str.gz
        nmr_restraints      : pdb_xxxxxxxx.mr.gz
        nmr_restraints_v2   : pdb_xxxxxxxx_mr.str.gz
        validation_cif      : pdb_xxxxxxxx_validation.cif.gz
        validation_xml      : pdb_xxxxxxxx_validation.xml.gz
        validation_pdf      : pdb_xxxxxxxx_validation.pdf.gz
        full_validation_pdf : pdb_xxxxxxxx_full_validation.pdf.gz
)

How the downloaded files are stored:

Since the current archive has more than 246,000 entries, it is not desirable to have a quarter million files under a single directory.
The script first creates a top-level subdirectory named after the file type (/home/my_user_id/download/mmcif), then creates hash directories based on the PDB IDs and stores the downloaded files in them. For the above example command, the downloaded files are stored as follows:

    /home/my_user_id/download/mmcif/00/pdb_0000100d.cif.gz
    /home/my_user_id/download/mmcif/00/pdb_0000200d.cif.gz
    /home/my_user_id/download/mmcif/00/pdb_0000200l.cif.gz
    /home/my_user_id/download/mmcif/00/pdb_0000300d.cif.gz
    /home/my_user_id/download/mmcif/00/pdb_0000400d.cif.gz
    /home/my_user_id/download/mmcif/01/pdb_0000101d.cif.gz
    /home/my_user_id/download/mmcif/01/pdb_0000101m.cif.gz
    /home/my_user_id/download/mmcif/01/pdb_0000201d.cif.gz
    /home/my_user_id/download/mmcif/01/pdb_0000201l.cif.gz
    /home/my_user_id/download/mmcif/01/pdb_0000301d.cif.gz
    /home/my_user_id/download/mmcif/01/pdb_0000401d.cif.gz

3. Downloading to a local machine requires a small modification to the script. With trust_env=True, aiohttp reads proxy settings from the environment (for example the HTTPS_PROXY variable), so the downloads go through the locally configured proxy.

Script download address: https://cdn.rcsb.org/wwpdb/docs/BetaArchiveBatchDownloader.py

    # row 99: add trust_env=True to aiohttp.ClientSession
    async with aiohttp.ClientSession(trust_env=True) as session:

    # row 270: add trust_env=True to aiohttp.ClientSession
    async with aiohttp.ClientSession(trust_env=True) as session:
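Row numbers can shift if the official script is updated, so the same change can also be applied by pattern rather than by position. The following is only a sketch under one assumption: that the unmodified script creates its sessions with an argument-free call, aiohttp.ClientSession(). The helper names add_trust_env and patch_file are hypothetical and are not part of the official script.

```python
import re
from pathlib import Path
from typing import Tuple

# Assumption: the original script calls aiohttp.ClientSession() with no arguments.
# Calls that already pass arguments are deliberately left untouched.
_SESSION_CALL = re.compile(r"aiohttp\.ClientSession\(\s*\)")


def add_trust_env(source: str) -> Tuple[str, int]:
    """Return the patched source text and the number of calls that were changed."""
    return _SESSION_CALL.subn("aiohttp.ClientSession(trust_env=True)", source)


def patch_file(script_path: str) -> int:
    """Patch a local copy of the script in place; return how many calls changed."""
    path = Path(script_path)
    patched, count = add_trust_env(path.read_text(encoding="utf-8"))
    if count:
        path.write_text(patched, encoding="utf-8")
    return count
```

Calling patch_file("BetaArchiveBatchDownloader.py") on a downloaded copy should report 2 if both session calls match the assumed form. A different count means the script differs from what is described here and should be edited by hand.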
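To find a given entry after a bulk download, it helps to be able to compute its location. The sketch below reproduces the hash directory layout from step 2. The rule it uses (the two characters just before the last character of the ID) is inferred from the example paths only, not taken from the script's source, and the function name hashed_path is hypothetical.

```python
from pathlib import PurePosixPath


def hashed_path(output_dir: str, file_type: str, file_name: str) -> PurePosixPath:
    """Build the expected storage path for one downloaded file."""
    # Every listed naming convention starts with "pdb_" plus 8 characters.
    entry_id = file_name[:12]       # e.g. "pdb_0000100d"
    # Assumption inferred from the examples: "pdb_0000100d" -> "00",
    # "pdb_0000101m" -> "01".
    hash_dir = entry_id[-3:-1]
    return PurePosixPath(output_dir) / file_type / hash_dir / file_name


print(hashed_path("/home/my_user_id/download", "mmcif", "pdb_0000101m.cif.gz"))
```

If the real script hashes differently for IDs outside the range shown in the examples, this helper will not match, so it is worth checking against a few downloaded files first.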