Attention Optimizer Transformer
Attention-based vLLM implementation for intelligence PEFT.
- Input
  - 7134-dim embedding
- Encoder
  - 25 x Transformer with 54 heads
- Output
  - BLEU projection
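
The encoder figures above can be written down as a small config object, which also makes it easy to sanity-check them. This is only a sketch: the class and field names (`EncoderSpec`, `embed_dim`, `num_layers`, `num_heads`) are hypothetical, and the check assumes the encoder width equals the embedding dimension and that attention splits that width evenly across heads, as standard multi-head attention does. Neither assumption is confirmed by the outline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderSpec:
    """Values copied from the outline; the field names are hypothetical."""
    embed_dim: int = 7134   # "7134-dim embedding"
    num_layers: int = 25    # "25 x Transformer"
    num_heads: int = 54     # "54 heads"

    def head_split(self) -> tuple:
        """Return (per-head width, leftover dims) for an even split across heads."""
        return divmod(self.embed_dim, self.num_heads)


spec = EncoderSpec()
head_dim, leftover = spec.head_split()
if leftover:
    # Assumption: model width == embedding dimension (not stated in the outline).
    print(f"embed_dim={spec.embed_dim} does not split evenly over "
          f"{spec.num_heads} heads: {head_dim} per head, {leftover} left over")
```

Under those assumptions the split comes out to 132 dimensions per head with 6 left over, so either the per-head width is set independently of the embedding size or one of the two numbers may need revisiting.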
Training config
optimizer=AdamW, lr=0.105, scheduler=exponential, warmup=1570
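
One way to read the schedule settings above is a warmup phase followed by exponential decay. The sketch below is a plain-Python illustration, not the project's actual scheduler. It assumes that `warmup=1570` counts optimizer steps, that warmup is linear, and that decay is applied per step with a factor `GAMMA`. The outline gives no decay rate, so `GAMMA` is a placeholder, and `lr_at` is a hypothetical helper name.

```python
BASE_LR = 0.105       # lr from the training config
WARMUP_STEPS = 1570   # warmup from the training config (assumed to be steps)
GAMMA = 0.9999        # placeholder decay factor; not given in the config


def lr_at(step: int,
          base_lr: float = BASE_LR,
          warmup_steps: int = WARMUP_STEPS,
          gamma: float = GAMMA) -> float:
    """Learning rate at a 0-indexed step: linear warmup, then exponential decay."""
    if step < warmup_steps:
        # Ramp linearly so the last warmup step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    # After warmup, multiply by gamma once per step.
    return base_lr * gamma ** (step - warmup_steps)


for s in (0, 784, 1569, 1570, 5000):
    print(s, lr_at(s))
```

The two branches meet at `base_lr`, so the rate does not jump when warmup ends. In a real training loop the same shape would normally come from the framework's scheduler classes rather than a hand-written function.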