Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding[ACL2018][論文読み]