Source: https://blog.csdn.net/CY19980216/article/details/142441347

Preface

Today makes for an excellent opening entry, so it will run a little long. One reason is that today more or less raises the curtain on the second half-year's racing season (though I myself still haven't managed to register for a single individual race).

The event drawing the most attention was naturally the Hengshui Lake Marathon. It is a famous PB course, one flat loop around the lake with barely any climbing, so many strong runners had staked their personal bests on it; before the race some even predicted that two to four runners would break the national record.

The reality: the top domestic finisher, Feng Peiyou, ran 2:10:11, far off the 2:06 He Jie set in the first half of the year, and even the leading international runner was less than 20 seconds quicker. He Jie himself ran only 2:32 (reportedly as a private pacer for someone else), Shunzi and Li Zhixuan both withdrew, and with that the women's field had nothing left to watch. Most of the other contenders also fell short: Shanghai's Chen Long, after months of altitude training and hopes of reaching the national "Master of Sport" (健将) standard, came home a deflated 2:33, while He Jie sat at an off-season 2:32. Mou Zhenhua (half a schoolmate of mine), however, ran a stunning 2:18 and reached the Master of Sport standard in one leap.

Most of the runners I know personally had a rough day, though Xiao Yan ran 2:49 at an even 4:00/km pace (his PB from the first half of the year was 3:02, so this counts as a massive improvement). Jai finished in 2:53 and insists he wasn't racing seriously, stopping to film quite a few people; more likely he felt mid-race that his body wasn't up to a PB (currently 2:48) and pulled back early, because a serious runner like him, in a race this important, would never let a reachable PB slip by.

At root, the temperature was simply a bit too high. Besides, training well matters less than recovering well. Honestly, Xiao Yan breaking 2:50 needles me: back in the 129 workouts we were roughly even, and his training looks much like mine, mostly tempo runs and hard intervals on about 200 km a month. So I keep wondering whether I should also take a shot at 2:50, but that seems far too risky for a first marathon; if I go out that aggressively, I might not even break three hours in the end.

Over at Yushan, Fifth Brother (五哥) and his wife took men's runner-up and women's champion in the 35 km group (4:06 and 4:30; for him that was clearly cruising, considering he ran an astonishing 5:36 in last year's Chaigu 55 km group, a terrifying 10 km per hour). A model couple: both also finished top ten at last year's Hong Kong 100. Qincai was 17th among the women (5:53), not especially quick, while SXY (7:48) somehow beat Junshi (9:04) by more than an hour. To be fair, over long distances women's endurance genuinely can surpass men's; in this race the women's champion of the 20 km group (2:32) was, remarkably, placed ahead of the men's champion (2:30). No exaggeration: had I entered, I could have taken that title with ease.

Separately, in today's 515 positioning relay, Jiawei ran two 5000 m legs: 17:41 for the first and 18:15 for the second. Stringing together two splits like that in today's weather shows his form is still excellent. He has been very busy lately; in fact it was only because he passed on the Gaobai (college 100-mile relay) captaincy that I got to take it over, otherwise leading the team would never have come to me. Which makes next weekend's Nike elite relay feel like enormous pressure: Jiawei leads off and I anchor, so I can't drag him down too badly. Truth is, I don't even know where my 5000 m stands these days; maybe 18:30, or maybe I could sneak under 18 minutes.

I don't know, just as I don't know whether, in what remains of this year, I can still fulfill the wishes I made at its start: 16 km inside one hour at the Gaobai finals, a sub-three first marathon, maybe even 2:50. Hard, but not impossible. I have waited too long already, and there is no more time left to wait.

Last dance, I pray


Table of Contents

  • Preface
  • 20240922


20240922

easyqa (Extractive + Generative + MultipleChoice × Dataset + Model):
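Read the parenthetical as a grid: three task families (Extractive, Generative, MultipleChoice), each split into a Dataset side and a Model side. Judging purely from the import paths in the code below (`src.base`, `src.datasets`, and comments pointing at `src.models.extractive` and friends), a plausible but unverified layout would be:

src/
├── base.py              # BaseClass
├── datasets/            # BaseDataset + the three task-specific dataset classes
└── models/
    ├── extractive.py
    ├── generative.py
    └── multiple_choice.py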

Dataset

# -*- coding: utf-8 -*-
# @author : caoyang
# @email: caoyang@stu.sufe.edu.cn

import os
import torch
import logging

from src.base import BaseClass


class BaseDataset(BaseClass):
    dataset_name = None
    checked_data_dirs = []
    batch_data_keys = []

    def __init__(self, data_dir, **kwargs):
        super(BaseDataset, self).__init__(**kwargs)
        self.data_dir = data_dir
        self.check_data_dir()

    @classmethod
    def generate_model_inputs(cls, batch, tokenizer, **kwargs):
        raise NotImplementedError()

    # Generator to yield batch data
    def yield_batch(self, **kwargs):
        raise NotImplementedError()

    # Check files and directories of datasets
    def check_data_dir(self):
        logging.info(f"Check data directory: {self.data_dir}")
        if self.checked_data_dirs:
            for checked_data_dir in self.checked_data_dirs:
                if os.path.exists(os.path.join(self.data_dir, checked_data_dir)):
                    logging.info(f"√ {checked_data_dir}")
                else:
                    logging.warning(f"× {checked_data_dir}")
        else:
            logging.warning("- Nothing to check!")

    # Check data keys in yield batch
    # @param batch: @yield of function `yield_batch`
    def check_batch_data_keys(self, batch):
        for key in self.batch_data_keys:
            assert key in batch[0], f"{key} not found in yield batch"


class ExtractiveDataset(BaseDataset):
    dataset_name = "Extractive"
    batch_data_keys = [
        "context",        # List[Tuple[Str, List[Str]]], i.e. List of [title, article[sentence]]
        "question",       # Str
        "answers",        # List[Str]
        "answer_starts",  # List[Int]
        "answer_ends",    # List[Int]
    ]

    def __init__(self, data_dir, **kwargs):
        super(ExtractiveDataset, self).__init__(data_dir, **kwargs)

    # Generate inputs for different models
    # @param batch: @yield of function `yield_batch`
    # @param tokenizer: Tokenizer object
    # @param model_name: See `model_name` of CLASS defined in `src.models.extractive`
    @classmethod
    def generate_model_inputs(cls, batch, tokenizer, model_name, **kwargs):
        if model_name == "deepset/roberta-base-squad2":
            # Unpack keyword arguments
            max_length = kwargs.get("max_length", 512)
            # Generate batch inputs
            contexts = list()
            questions = list()
            for data in batch:
                context = str()
                for title, sentences in data["context"]:
                    # context += title + '\n'
                    context += '\n'.join(sentences) + '\n'
                contexts.append(context)
                questions.append(data["question"])
            # Note: it must be question-first here, which is determined by `tokenizer.padding_side` ("right" or "left", default "right")
            # See `QuestionAnsweringPipeline.preprocess` in ./site-packages/transformers/pipelines/question_answering.py for details
            model_inputs = tokenizer(
                questions,
                contexts,
                add_special_tokens = True,
                max_length = max_length,
                padding = "max_length",
                truncation = True,
                return_overflowing_tokens = False,
                return_tensors = "pt",
            )   # Dict[input_ids: Tensor(batch_size, max_length),
                #      attention_mask: Tensor(batch_size, max_length)]
        else:
            raise NotImplementedError(model_name)
        return model_inputs


class GenerativeDataset(BaseDataset):
    dataset_name = "Generative"
    batch_data_keys = [
        "context",   # List[Tuple[Str, List[Str]]], i.e. List of [title, article[sentence]]
        "question",  # Str
        "answers",   # List[Str]
    ]

    def __init__(self, data_dir, **kwargs):
        super(GenerativeDataset, self).__init__(data_dir, **kwargs)

    # Generate inputs for different models
    # @param batch: @yield of function `yield_batch`
    # @param tokenizer: Tokenizer object
    # @param model_name: See `model_name` of CLASS defined in `src.models.generative`
    @classmethod
    def generate_model_inputs(cls, batch, tokenizer, model_name, **kwargs):
        # TODO: generative model inputs are not implemented yet
        model_inputs = None
        return model_inputs


class MultipleChoiceDataset(BaseDataset):
    dataset_name = "Multiple-choice"
    batch_data_keys = [
        "article",   # Str, usually
        "question",  # Str
        "options",   # List[Str]
        "answer",    # Int
    ]

    def __init__(self, data_dir, **kwargs):
        super(MultipleChoiceDataset, self).__init__(data_dir, **kwargs)

    # Generate inputs for different models
    # @param batch: @yield of function `yield_batch`
    # @param tokenizer: Tokenizer object
    # @param model_name: See `model_name` of CLASS defined in `src.models.multiple_choice`
    @classmethod
    def generate_model_inputs(cls, batch, tokenizer, model_name, **kwargs):
        if model_name == "LIAMF-USP/roberta-large-finetuned-race":
            # Unpack keyword arguments
            max_length = kwargs.get("max_length", 512)
            # Generate batch inputs
            batch_inputs = list()
            for data in batch:
                # Unpack data
                article = data["article"]
                question = data["question"]
                option = data["options"]
                flag = question.find('_') == -1  # cloze-style questions contain a '_' placeholder
                choice_inputs = list()
                for choice in option:
                    question_choice = question + ' ' + choice if flag else question.replace('_', choice)
                    inputs = tokenizer(
                        article,
                        question_choice,
                        add_special_tokens = True,
                        max_length = max_length,
                        padding = "max_length",
                        truncation = True,
                        return_overflowing_tokens = False,
                        return_tensors = None,  # return list instead of pytorch tensor, for concatenation
                    )   # Dict[input_ids: List(max_length, ),
                        #      attention_mask: List(max_length, )]
                    choice_inputs.append(inputs)
                batch_inputs.append(choice_inputs)
            # InputIds and AttentionMask
            input_ids = torch.LongTensor([[inputs["input_ids"] for inputs in choice_inputs] for choice_inputs in batch_inputs])
            attention_mask = torch.LongTensor([[inputs["attention_mask"] for inputs in choice_inputs] for choice_inputs in batch_inputs])
            model_inputs = {
                "input_ids": input_ids,            # (batch_size, n_option, max_length)
                "attention_mask": attention_mask,  # (batch_size, n_option, max_length)
            }
        elif model_name == "potsawee/longformer-large-4096-answering-race":
            # Unpack keyword arguments
            max_length = kwargs["max_length"]
            # Generate batch inputs
            batch_inputs = list()
            for data in batch:
                # Unpack data
                article = data["article"]
                question = data["question"]
                option = data["options"]
                # Bug fix: interpolate the article text (the original had the literal word "article" inside the f-string)
                article_question = [f"{question} {tokenizer.bos_token} {article}"] * 4
                # Tokenization
                inputs = tokenizer(
                    article_question,
                    option,
                    max_length = max_length,
                    padding = "max_length",
                    truncation = True,
                    return_tensors = "pt",
                )   # Dict[input_ids: Tensor(n_option, max_length),
                    #      attention_mask: Tensor(n_option, max_length)]
                batch_inputs.append(inputs)
            # InputIds and AttentionMask
            input_ids = torch.cat([inputs["input_ids"].unsqueeze(0) for inputs in batch_inputs], dim = 0)
            attention_mask = torch.cat([inputs["attention_mask"].unsqueeze(0) for inputs in batch_inputs], dim = 0)
            model_inputs = {
                "input_ids": input_ids,            # (batch_size, n_option, max_length)
                "attention_mask": attention_mask,  # (batch_size, n_option, max_length)
            }
        else:
            raise NotImplementedError(model_name)
        return model_inputs
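Before moving on to the Model layer, a minimal usage sketch may help show how this classmethod is meant to be driven. This is my own illustration, not project code: the toy batch is invented (a real one would come from the `yield_batch` generator of a concrete dataset subclass), and I assume the `LIAMF-USP/roberta-large-finetuned-race` checkpoint named in the branch above.

# Illustrative only: toy batch is an assumption; a real batch would come
# from the `yield_batch` generator of a concrete dataset subclass.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LIAMF-USP/roberta-large-finetuned-race")
batch = [{
    "article": "The quick brown fox jumps over the lazy dog.",
    "question": "What does the fox jump over?",
    "options": ["A cat", "A lazy dog", "A fence", "A river"],
    "answer": 1,
}]
model_inputs = MultipleChoiceDataset.generate_model_inputs(
    batch = batch,
    tokenizer = tokenizer,
    model_name = "LIAMF-USP/roberta-large-finetuned-race",
    max_length = 512,
)
print(model_inputs["input_ids"].shape)  # torch.Size([1, 4, 512])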

Model

# -*- coding: utf-8 -*-
# @author : caoyang
# @email: caoyang@stu.sufe.edu.cn

import torch
import string
import logging

from src.base import BaseClass
from src.datasets import (ExtractiveDataset,
                          GenerativeDataset,
                          MultipleChoiceDataset,
                          RaceDataset,
                          DreamDataset,
                          SquadDataset,
                          HotpotqaDataset,
                          MusiqueDataset,
                          TriviaqaDataset,
                          )
from transformers import AutoTokenizer, AutoModel, pipeline  # `pipeline` added: it is used by `easy_pipeline` below


class BaseModel(BaseClass):
    Tokenizer = AutoTokenizer
    Model = AutoModel

    def __init__(self, model_path, device, **kwargs):
        super(BaseModel, self).__init__(**kwargs)
        self.model_path = model_path
        self.device = device
        # Load model and tokenizer
        self.load_tokenizer()
        self.load_vocab()
        self.load_model()

    # Load tokenizer
    def load_tokenizer(self):
        self.tokenizer = self.Tokenizer.from_pretrained(self.model_path)

    # Load pretrained model
    def load_model(self):
        self.model = self.Model.from_pretrained(self.model_path).to(self.device)

    # Load vocabulary (in format of Dict[id: token])
    def load_vocab(self):
        self.vocab = {token_id: token for token, token_id in self.tokenizer.get_vocab().items()}


class ExtractiveModel(BaseModel):

    def __init__(self, model_path, device, **kwargs):
        super(ExtractiveModel, self).__init__(model_path, device, **kwargs)

    # @param batch: @yield in function `yield_batch` of Dataset object
    # @return batch_start_logits: FloatTensor(batch_size, max_length)
    # @return batch_end_logits: FloatTensor(batch_size, max_length)
    # @return batch_predicts: List[List[Tuple[Int, Str]]] with length batch_size, the predicted (position, token) pairs
    # @return batch_input_tokens: List[List[Str]] with length batch_size
    def forward(self, batch, **kwargs):
        model_inputs = self.generate_model_inputs(batch, **kwargs)
        for key in model_inputs:
            model_inputs[key] = model_inputs[key].to(self.device)
        model_outputs = self.model(**model_inputs)
        # 2024/09/13 11:08:21
        # Note: Skip the first token <s> or [CLS] in most situations
        batch_start_logits = model_outputs.start_logits[:, 1:]
        batch_end_logits = model_outputs.end_logits[:, 1:]
        batch_input_ids = model_inputs["input_ids"][:, 1:]
        del model_inputs, model_outputs
        batch_size = batch_start_logits.size(0)
        batch_predicts = list()
        batch_input_tokens = list()
        for i in range(batch_size):
            start_index = batch_start_logits[i].argmax().item()
            end_index = batch_end_logits[i].argmax().item()
            input_ids = batch_input_ids[i]
            input_tokens = list(map(lambda _token_id: self.vocab[_token_id.item()], input_ids))
            predict_tokens = list()
            for index in range(start_index, end_index + 1):
                predict_tokens.append((index, self.vocab[input_ids[index].item()]))
                # predict_tokens.append(self.vocab[input_ids[index].item()])
            batch_predicts.append(predict_tokens)
            batch_input_tokens.append(input_tokens)
        return batch_start_logits, batch_end_logits, batch_predicts, batch_input_tokens

    # Generate model inputs
    # @param batch: @yield in function `yield_batch` of Dataset object
    def generate_model_inputs(self, batch, **kwargs):
        return ExtractiveDataset.generate_model_inputs(
            batch = batch,
            tokenizer = self.tokenizer,
            model_name = self.model_name,  # `model_name` is expected to be set on concrete subclasses
            **kwargs,
        )

    # Use question-answering pipeline provided by transformers
    # See `QuestionAnsweringPipeline.preprocess` in ./site-packages/transformers/pipelines/question_answering.py for details
    # @param context: Str / List[Str] (batch)
    # @param question: Str / List[Str] (batch)
    # @return pipeline_outputs: Dict[score: Float, start: Int, end: Int, answer: Str]
    def easy_pipeline(self, context, question):
        # context = """Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy"."""
        # question = """When did Beyonce start becoming popular?"""
        pipeline_inputs = {"context": context, "question": question}
        question_answering_pipeline = pipeline(
            task = "question-answering",
            model = self.model,
            tokenizer = self.tokenizer,  # Bug fix: was a bare `tokenizer`, which is undefined in this scope
        )
        pipeline_outputs = question_answering_pipeline(pipeline_inputs)
        return pipeline_outputs


class GenerativeModel(BaseModel):

    def __init__(self, model_path, device, **kwargs):
        super(GenerativeModel, self).__init__(model_path, device, **kwargs)

    # @param batch: @yield in function `yield_batch` of Dataset object
    def forward(self, batch, **kwargs):
        model_inputs = self.generate_model_inputs(batch, **kwargs)
        model_outputs = self.model(**model_inputs)
        # TODO: decoding of generative outputs is not implemented yet
        NotImplemented

    # Generate model inputs
    # @param batch: @yield in function `yield_batch` of Dataset object
    def generate_model_inputs(self, batch, **kwargs):
        return GenerativeDataset.generate_model_inputs(
            batch = batch,
            tokenizer = self.tokenizer,
            model_name = self.model_name,
            **kwargs,
        )


class MultipleChoiceModel(BaseModel):

    def __init__(self, model_path, device, **kwargs):
        super(MultipleChoiceModel, self).__init__(model_path, device, **kwargs)

    # @param batch: @yield in function `yield_batch` of Dataset object
    # @return batch_logits: FloatTensor(batch_size, n_option)
    # @return batch_predicts: List[Int] (batch_size, ), the predicted option indices
    def forward(self, batch, **kwargs):
        model_inputs = self.generate_model_inputs(batch, **kwargs)
        for key in model_inputs:
            model_inputs[key] = model_inputs[key].to(self.device)
        model_outputs = self.model(**model_inputs)
        batch_logits = model_outputs.logits
        del model_inputs, model_outputs
        batch_predicts = [torch.argmax(logits).item() for logits in batch_logits]
        return batch_logits, batch_predicts

    # Generate model inputs
    # @param batch: @yield in function `yield_batch` of Dataset object
    # @param max_length: Max length of input tokens
    def generate_model_inputs(self, batch, **kwargs):
        return MultipleChoiceDataset.generate_model_inputs(
            batch = batch,
            tokenizer = self.tokenizer,
            model_name = self.model_name,
            **kwargs,
        )
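To close the loop, here is a hedged end-to-end sketch. The original file defines only the base classes, so the subclass below, its `model_name` attribute, and the swap to `AutoModelForMultipleChoice` (needed so that `model_outputs.logits` holds per-option scores) are all my assumptions about how the project wires things up; `batch` is the toy batch from the dataset sketch above.

# Hypothetical subclass: the base classes never set `model_name`,
# so pinning it here is an assumption, not the project's own code.
import torch
from transformers import AutoModelForMultipleChoice

class RaceMultipleChoiceModel(MultipleChoiceModel):
    model_name = "LIAMF-USP/roberta-large-finetuned-race"
    Model = AutoModelForMultipleChoice  # head that returns per-option logits

model = RaceMultipleChoiceModel(
    model_path = "LIAMF-USP/roberta-large-finetuned-race",
    device = "cuda" if torch.cuda.is_available() else "cpu",
)
batch_logits, batch_predicts = model.forward(batch, max_length = 512)
print(batch_predicts)  # e.g. [1]: index of the option the model picked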