Parse any public HTML or PDF URL into clean, structured JSON. Auto-detects the document type (article, invoice, research, generic).