This direction studies how large language models and other deep learning models internally represent knowledge, perform reasoning, and produce outputs, with the aim of explaining why a model makes a particular prediction or generates a particular output. Topics include neurons, attention, activation representations, causal intervention, and model behavior analysis.
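As a rough illustration of what causal intervention on activations can look like, the sketch below applies activation patching to a toy network. Everything in it is an assumption made for illustration: the two-layer ReLU model, its weights `W1` and `W2`, the inputs, and the helper names `hidden`, `output`, and `patched_output` are hypothetical and are not taken from any particular paper or library.

```python
# Hypothetical toy model: 2 inputs -> 2 hidden ReLU units -> 1 linear output.
W1 = [[1.0, -1.0], [0.5, 2.0]]  # W1[j][i]: weight from input i to hidden unit j
W2 = [2.0, -1.0]                # W2[j]: weight from hidden unit j to the output


def hidden(x):
    """Hidden activations (ReLU of a linear map) for input x."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]


def output(h):
    """Scalar model output computed from hidden activations h."""
    return sum(w * hj for w, hj in zip(W2, h))


def patched_output(x_corrupt, x_clean, unit):
    """Run on x_corrupt, but overwrite one hidden unit with its clean-run value."""
    h = hidden(x_corrupt)
    h[unit] = hidden(x_clean)[unit]
    return output(h)


x_clean = [1.0, 0.0]    # illustrative "clean" input
x_corrupt = [0.0, 1.0]  # illustrative "corrupted" input

clean_out = output(hidden(x_clean))
corrupt_out = output(hidden(x_corrupt))

# Effect of restoring each hidden unit, measured against the corrupted run.
effects = [patched_output(x_corrupt, x_clean, j) - corrupt_out for j in range(2)]

print("clean:", clean_out, "corrupt:", corrupt_out, "effects:", effects)
```

A larger effect suggests that the patched unit carries more of the difference between the two runs. In this toy model the output is linear in the hidden layer, so the per-unit effects add up to the full clean-minus-corrupt gap; in real networks with deeper nonlinear computation that additivity generally does not hold.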