### Papers I Read: Notes and Summaries

• The paper hypothesizes that the main optimization challenges in multi-task learning arise from negative interference between the gradients of different tasks.

• It hypothesizes that negative interference happens when:

• The gradients are conflicting (i.e., have a negative cosine similarity).

• The gradients coincide with high positive curvature.

• The difference in gradient magnitude is quite large.
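As a rough illustration of the first and third conditions, the sketch below checks whether two task gradients conflict and how different their magnitudes are. The helper names (`cosine_similarity`, `is_conflicting`, `magnitude_ratio`) and the toy gradients are assumptions made for this example, not definitions from the paper, and the gradients are assumed to be non-zero flat lists of floats.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm of a vector."""
    return math.sqrt(dot(u, u))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (both assumed non-zero)."""
    return dot(u, v) / (norm(u) * norm(v))


def is_conflicting(u, v):
    """Two gradients conflict when their cosine similarity is negative."""
    return cosine_similarity(u, v) < 0.0


def magnitude_ratio(u, v):
    """Ratio of the larger gradient norm to the smaller one (illustrative measure)."""
    nu, nv = norm(u), norm(v)
    return max(nu, nv) / min(nu, nv)


# Toy gradients for two hypothetical tasks.
g_task_a = [1.0, 0.0]
g_task_b = [-1.0, 1.0]

print(is_conflicting(g_task_a, g_task_b))   # the two gradients point in opposing directions
print(magnitude_ratio(g_task_a, g_task_b))  # how much larger one gradient is than the other
```

The curvature condition is not covered here, since checking it needs second-order information about the loss.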

• The paper proposes to work around this problem by performing “gradient surgery.”

• If two gradients are conflicting, modify the gradients by projecting each onto the other’s normal plane.

• This modification is equivalent to removing the conflicting component of the gradient.
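The projection step can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: gradients are assumed to be flat lists of floats, the function names `project_away` and `pcgrad` are hypothetical, and visiting the other tasks in random order and summing the projected gradients are assumptions about the full procedure that these notes do not spell out.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_away(g_i, g_j):
    """Project g_i onto the normal plane of g_j when the two conflict.

    If the inner product is non-negative, g_i is returned unchanged.
    """
    d = dot(g_i, g_j)
    if d >= 0.0:
        return list(g_i)
    scale = d / dot(g_j, g_j)  # g_j is non-zero here because d < 0
    return [a - scale * b for a, b in zip(g_i, g_j)]


def pcgrad(task_grads, rng=None):
    """Combine per-task gradients after removing pairwise conflicts.

    Assumption: other tasks are visited in random order and the projected
    gradients are summed into a single update direction.
    """
    rng = rng or random.Random(0)
    projected = []
    for i, g in enumerate(task_grads):
        g_pc = list(g)
        others = [j for j in range(len(task_grads)) if j != i]
        rng.shuffle(others)
        for j in others:
            # Always project against the ORIGINAL gradient of task j.
            g_pc = project_away(g_pc, task_grads[j])
        projected.append(g_pc)
    return [sum(components) for components in zip(*projected)]


g_task_a = [1.0, 0.0]
g_task_b = [-1.0, 1.0]

g_a_fixed = project_away(g_task_a, g_task_b)
print(g_a_fixed)                    # [0.5, 0.5]
print(dot(g_a_fixed, g_task_b))     # 0.0, so the conflicting component is gone
print(pcgrad([g_task_a, g_task_b])) # [0.5, 1.5]
```

In the toy example, the projected gradient for task A is orthogonal to task B's gradient, which is what "removing the conflicting component" means in practice.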

• Theoretical Analysis

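• The paper gives local conditions, for the two-task setting, under which PCGrad improves on standard multi-task gradient descent. The conditions are: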

• Difference in the magnitude of the gradients is sufficiently large.

• The learning rate is large enough.

• Experimental Setup

• For Multi-task CIFAR-100, PCGrad is used with the shared parameters of the routing networks.

• For NYUv2, PCGrad is combined with MTAN.

• In all cases, using PCGrad improves performance.

• In the context of Soft Actor-Critic (SAC), the paper suggests learning the temperature $\alpha$ on a per-task basis.