How to Build Multimodal Document RAG with Llama 3.2 Vision and ColQwen2