A curated list of resources (papers, tutorials, code, and more) on document similarity measures.
The goal of this repository is to provide a comprehensive overview for students and researchers.
Dimensions of Similarity
From word to sentence level
BERT and other Transformer Language Models
Similarity / Distance Measures
Cosine similarity: the cosine of the angle between two vectors; it treats all dimensions equally.
Manhattan distance = L1 norm of the difference between two vectors (see also Manhattan LSTM)
Supervised Word Mover's Distance (S-WMD)
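As a quick illustration of the two simplest measures in this list, here is a minimal sketch in plain Python. The function names and the toy vectors are assumptions made for this example, not taken from any listed resource. In practice the vectors would come from an embedding model or a bag-of-words representation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def manhattan_distance(a, b):
    """L1 norm of the difference a - b (0.0 = identical vectors)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


# Toy document vectors (hypothetical values, for illustration only).
doc_a = [1.0, 2.0, 0.0]
doc_b = [2.0, 4.0, 0.0]  # same direction as doc_a, twice the magnitude
doc_c = [0.0, 0.0, 3.0]  # orthogonal to doc_a

print(cosine_similarity(doc_a, doc_b))
print(manhattan_distance(doc_a, doc_b))
print(cosine_similarity(doc_a, doc_c))
print(manhattan_distance(doc_a, doc_c))
```

Note that cosine similarity ignores vector magnitude while Manhattan distance does not: doc_a and doc_b point in the same direction, so their cosine similarity is maximal even though their Manhattan distance is nonzero.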
Benchmarks & Datasets