concepts · week 1
Test what stuck from this story.
story quiz
In self-attention, what does a token's Query vector represent?
Why are attention scores passed through a softmax before they weight the Value vectors?
Why does self-attention scale poorly as the input gets longer?