NLP PDF document similarity
Project detail
Hi,
I have some PDF contracts that have already been embedded.
I need visualizations in Streamlit or similar.
The process is:
1. Upload a new document or text (PDF).
2. Document-level (full document) similarity visualization, including a slider for adjusting the similarity metric.
2a. Show the top 10 most similar documents (in a table or similar).
3. If the document is not very similar to the existing documents, flag it with text, a red flag, or similar.
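The brief does not say which similarity metric should be used. Here is a minimal sketch of steps 2 to 3, assuming cosine similarity over one stored embedding vector per contract (a NumPy array of shape `(n, d)`) and an uploaded document embedded with the same model. The function names and the 0.8 default threshold are hypothetical placeholders, not part of any existing code.

```python
import numpy as np

def cosine_similarities(query: np.ndarray, corpus: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector (d,) and each row of corpus (n, d)."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    return c @ q

def top_similar(query: np.ndarray, corpus: np.ndarray, names: list[str], k: int = 10):
    """Return the k most similar (name, score) pairs, best match first."""
    scores = cosine_similarities(query, corpus)
    order = np.argsort(scores)[::-1][:k]  # indices sorted by descending score
    return [(names[i], float(scores[i])) for i in order]

def is_outlier(query: np.ndarray, corpus: np.ndarray, threshold: float = 0.8) -> bool:
    """Red-flag the upload if even its closest existing document scores below the threshold."""
    return float(cosine_similarities(query, corpus).max()) < threshold
```

In a Streamlit app, the threshold could be bound to `st.slider`, the top-10 list rendered with `st.dataframe`, and the red flag shown with `st.error`.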
Extra
4. Section level (i.e. the sections under each headline) within the document:
4a. Show a visualization in a format that makes sense.
4b. Show the top 10 most similar sections (in a table or similar).
4c. Show the top 10 least similar sections (in a table or similar).
4d. If a section is not very similar to the existing sections, flag it with text, a red flag, or similar.
5. Highlight sentences in the uploaded text/PDF/sections that are, e.g., uncommon, rare, or not seen very often in
I can send a few demos for inspiration. The items are listed in order of priority.
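For the extra items (4 and 5), one possible approach is to split the text at headlines, embed each section, and score it by its closest match among the sections of the existing contracts. The sketch below assumes the existing sections are already embedded with the same model into a NumPy matrix. The heading regex is a hypothetical heuristic (numbered or ALL-CAPS lines) that would need tuning to the real contracts, and `split_sections`, `max_similarity`, `rank_units`, and the 0.7 threshold are illustrative names and values only.

```python
import re
import numpy as np

# Hypothetical heading heuristic: numbered headings ("3.", "3.1 Term") or ALL-CAPS lines.
HEADING = re.compile(r"^(\d+(\.\d+)*\.?\s+\S.*|[A-Z][A-Z0-9 ,&-]{3,})$")

def split_sections(text: str) -> list[tuple[str, str]]:
    """Split plain text into (heading, body) pairs; text before the first heading goes under 'Preamble'."""
    sections, heading, body = [], "Preamble", []
    for line in text.splitlines():
        stripped = line.strip()
        if HEADING.match(stripped):
            if body:  # headings with no body text are skipped
                sections.append((heading, "\n".join(body)))
            heading, body = stripped, []
        elif stripped:
            body.append(stripped)
    if body:
        sections.append((heading, "\n".join(body)))
    return sections

def max_similarity(units: np.ndarray, corpus: np.ndarray) -> np.ndarray:
    """For each row of units (m, d), the cosine similarity to its closest row in corpus (n, d)."""
    u = units / np.linalg.norm(units, axis=1, keepdims=True)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    return (u @ c.T).max(axis=1)

def rank_units(labels: list[str], scores: np.ndarray, k: int = 10, threshold: float = 0.7):
    """Return the most similar, least similar, and red-flagged units (sections or sentences)."""
    order = np.argsort(scores)  # ascending: least similar first
    least = [(labels[i], float(scores[i])) for i in order[:k]]
    most = [(labels[i], float(scores[i])) for i in order[::-1][:k]]
    flagged = [labels[i] for i in order if scores[i] < threshold]
    return most, least, flagged
```

The same `max_similarity` score could serve item 5: embed each sentence instead of each section, and treat sentences whose closest match in the existing contracts scores low as uncommon candidates for highlighting.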