News story deduplication: semantic similarity detection, story clustering, canonical version selection, syndication trac