To do

  1. Organize source-code details

torch.inference_mode()

A faster variant of torch.no_grad(); see the reference docs.
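A minimal sketch of typical usage (the model and input here are illustrative, not from the notebook):

```
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # illustrative stand-in model
x = torch.randn(1, 4)

# Like no_grad, but also disables view tracking and version-counter
# updates, so it is slightly faster for pure inference.
with torch.inference_mode():
    y = model(x)

print(y.requires_grad)  # False: tensors created here cannot enter autograd
```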

nn.MarginRankingLoss()

Per the docs (default margin = 0): a target of y = 1 means x1 should be ranked higher than x2; y = -1 means the opposite.

*loss(x1, x2, y) = max(0, −y · (x1 − x2) + margin)*

The final loss is averaged over the batch (reduction='mean' by default).

```
import torch
import torch.nn as nn

loss = nn.MarginRankingLoss()
input1 = torch.randn(3, requires_grad=True)
input2 = torch.randn(3, requires_grad=True)
target = torch.randn(3).sign()  # random ±1 targets
output = loss(input1, input2, target)
output.backward()
```

```
input1, input2, target, output

(tensor([ 0.0277, -0.3806,  1.0405], requires_grad=True),
 tensor([-0.9075,  0.3271,  0.1156], requires_grad=True),
 tensor([ 1., -1., -1.]),
 tensor(0.3083, grad_fn=<MeanBackward0>))

input1 - input2, (input1 - input2) * (-target)

(tensor([ 0.9352, -0.7077,  0.9249], grad_fn=<SubBackward0>),
 tensor([-0.9352, -0.7077,  0.9249], grad_fn=<MulBackward0>))

# Only the third element survives max(0, ·), so loss = 0.9249 / 3 = 0.3083
```

gc.collect()

Triggers Python garbage collection to free memory.
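A minimal sketch of the usual cleanup pattern between training folds (the deleted objects are illustrative stand-ins, not from the source):

```
import gc
import torch

model = torch.nn.Linear(4, 2)       # stand-in for the real per-fold model
big_buffer = torch.randn(1000, 1000)  # stand-in for large intermediate data

del model, big_buffer  # drop the references...
gc.collect()           # ...then reclaim Python-side memory
if torch.cuda.is_available():
    torch.cuda.empty_cache()  # release cached GPU memory back to the driver
```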

defaultdict

Creates a dict that, when a missing key is accessed, inserts it with a default value instead of raising KeyError.

```
from collections import defaultdict

history = defaultdict(list)

history['Train Loss'].append(1.1)  # the key is created as an empty list on first access
```

StratifiedKFold()

```
from sklearn.model_selection import StratifiedKFold, KFold

skf = StratifiedKFold(n_splits=CONFIG['n_fold'], shuffle=True, random_state=CONFIG['seed'])

for fold, (_, val_) in enumerate(skf.split(X=df, y=df.worker)):
    df.loc[val_, "kfold"] = int(fold)

df["kfold"] = df["kfold"].astype(int)
```

The for loop over skf.split partitions X into k folds stratified on y (here df.worker); val_ holds that fold's validation indices, and fold runs from 0 to n_fold − 1.

The resulting df["kfold"] column records, for each row, the fold in which it serves as validation data.

The function below then directly selects all rows outside the current fold as train; the rows inside it are valid:


```
from torch.utils.data import DataLoader

def prepare_loaders(fold):
    df_train = df[df.kfold != fold].reset_index(drop=True)
    df_valid = df[df.kfold == fold].reset_index(drop=True)

    train_dataset = JigsawDataset(df_train, tokenizer=CONFIG['tokenizer'], max_length=CONFIG['max_length'])
    valid_dataset = JigsawDataset(df_valid, tokenizer=CONFIG['tokenizer'], max_length=CONFIG['max_length'])

    train_loader = DataLoader(train_dataset, batch_size=CONFIG['train_batch_size'],
                              num_workers=2, shuffle=True, pin_memory=True, drop_last=True)
    valid_loader = DataLoader(valid_dataset, batch_size=CONFIG['valid_batch_size'],
                              num_workers=2, shuffle=False, pin_memory=True)

    return train_loader, valid_loader
```

tqdm

```
from tqdm import tqdm

bar = tqdm(enumerate(dataloader), total=len(dataloader))
```

Within a single epoch, update the bar's postfix like this:

```
bar.set_postfix(Epoch=epoch, Valid_Loss=epoch_loss,
                LR=optimizer.param_groups[0]['lr'])
```
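Putting the two together, a minimal sketch of a one-epoch validation loop driven by the bar; the dataloader is assumed to yield (inputs, targets) pairs, and model, criterion, and optimizer are illustrative arguments:

```
import torch
from tqdm import tqdm

@torch.no_grad()
def valid_one_epoch(model, dataloader, criterion, optimizer, epoch):
    running_loss, seen, epoch_loss = 0.0, 0, 0.0
    bar = tqdm(enumerate(dataloader), total=len(dataloader))
    for step, (inputs, targets) in bar:
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        running_loss += loss.item() * len(targets)
        seen += len(targets)
        epoch_loss = running_loss / seen
        # surface live metrics on the progress bar
        bar.set_postfix(Epoch=epoch, Valid_Loss=epoch_loss,
                        LR=optimizer.param_groups[0]['lr'])
    return epoch_loss
```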

Weights & Biases (W&B)

  • A hash serves as the run id (HASH_NAME below)

  • Define a one-epoch function for train and for valid, each returning its loss

  • Record the losses with wandb.log, e.g. wandb.log({"Train Loss": train_epoch_loss})

  • run = wandb.init(project='Jigsaw', 
                         config=CONFIG,
                         job_type='Train',
                         group=CONFIG['group'],
                         tags=['roberta-base', f'{HASH_NAME}', 'margin-loss'],
                         name=f'{HASH_NAME}-fold-{fold}',
                         anonymous='must')
    
    The training part for the fold runs after the init call; close the run with:

    run.finish()

    The console then shows the run name in the '{HASH_NAME}-fold-{fold}' pattern:

    Syncing run k5nu8k69390a-fold-0 to Weights & Biases (docs).

Distilled training flow

  • for fold in range(0, CONFIG['n_fold'])
  • wandb.init
  • prepare_loaders, fetch_scheduler
  • run_training
    • train_one_epoch, valid_one_epoch → to get the model and the losses for wandb

W&B's real-time data logging and analysis is interleaved throughout; a skeleton of the flow follows below.
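A minimal skeleton of that flow, assuming the notebook's helpers (prepare_loaders, fetch_scheduler, run_training) and globals (CONFIG, HASH_NAME, model, optimizer) are defined as above; the run_training signature and return values are assumptions:

```
import gc
import wandb

for fold in range(0, CONFIG['n_fold']):
    run = wandb.init(project='Jigsaw', config=CONFIG, job_type='Train',
                     group=CONFIG['group'], name=f'{HASH_NAME}-fold-{fold}',
                     anonymous='must')

    train_loader, valid_loader = prepare_loaders(fold)
    scheduler = fetch_scheduler(optimizer)  # signature assumed; builds the LR schedule

    # run_training is assumed to loop over epochs, call train_one_epoch /
    # valid_one_epoch, and wandb.log the returned losses.
    model, history = run_training(model, optimizer, scheduler, fold=fold)

    run.finish()
    del train_loader, valid_loader
    gc.collect()
```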

df['y'].value_counts(normalize=True) gets the percentage of each value.
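A quick toy illustration (the column name is arbitrary):

```
import pandas as pd

df = pd.DataFrame({"y": [0, 0, 0, 1]})
print(df["y"].value_counts(normalize=True))
# 0 -> 0.75, 1 -> 0.25 (proportions rather than raw counts)
```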

Original article link