PolySAE: Unveiling Hidden Interactions in Neural Networks
PolySAE challenges the assumptions behind sparse autoencoders, offering a nuanced approach to feature interactions. It not only improves performance metrics but also reveals the complex layers within language models.