Development of a multi-modal system for text and images using the dataset I provide
Project details
I am looking for a developer who can help me build a multi-modal system for text and images using the dataset I provide. The ideal candidate should be proficient in Python (IPython notebook).
The dataset I am providing includes both text and images, so the developer should be comfortable handling both types of data.
I am looking for a high-accuracy model, so the developer should have experience building models that achieve high accuracy. Details of computational cost and performance metrics should be provided.
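To illustrate the kind of reporting I have in mind, here is a minimal sketch in plain Python. The function names (`accuracy`, `macro_f1`, `timed_predict`), the choice of metrics, and the toy labels are only assumptions for illustration; the actual metrics and cost measurements should be chosen to fit the dataset and the model.

```python
import time


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def timed_predict(predict_fn, inputs):
    """Run predict_fn on each input; return predictions and mean seconds per sample."""
    start = time.perf_counter()
    preds = [predict_fn(x) for x in inputs]
    elapsed = time.perf_counter() - start
    return preds, elapsed / len(inputs)


# Toy labels, purely hypothetical (not taken from the dataset).
y_true = ["cat", "dog", "dog", "cat"]
y_pred = ["cat", "dog", "cat", "cat"]

print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", round(macro_f1(y_true, y_pred), 4))
```

In the notebook, `timed_predict` could wrap the model's own prediction call to report average inference time per sample alongside the accuracy figures. Training time and hardware used would also be useful to include.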
Overall, the developer should have experience in developing multi-modal systems, handling both text and image data, and achieving high accuracy in their models.