Text and Typography

Examples of text, typography, math equations, diagrams, flowcharts, pictures, videos, and more.

Aug 8, 2019 Text, Demo

Deployment and Principles of LLaVA

Reference: Principle and Deployment; a video on the principles of LLaVA. Good to know: project structure and config.json settings. Download the LLaVA framework, download the weight and v...

Apr 18, 2025 LLM, Multi-modal

Huggingface Cli

Install: pip install -U huggingface_hub (Python >= 3.8). Login: huggingface-cli login (get the token from the website). Download models: huggingface-cli download --resume-download {model name from ...

Apr 16, 2025 LLM, Memo

Blip in details

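BLIP architecture (1) What does the Image Encoder do? It turns an image into a sequence of numerical representations, much as we turn each character into meaning when reading a book. It uses a ViT-style model that cuts the image into many small patches, like slicing tofu into cubes, and turns each patch into an "image token". The result has a shape like (number of images, number of tokens, dimension per token), analogous to NLP's (batch...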
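A minimal sketch of the patch-cutting step and the resulting token shape, using plain Python lists rather than a real ViT. The function names `patchify` and `token_shape` are hypothetical, and the optional extra [CLS] token is an assumption about the encoder, not something stated in the excerpt.

```python
def patchify(image, patch):
    """Split an H x W grid (a list of rows) into flattened patches, in row-major order."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one patch x patch block into a single vector.
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


def token_shape(batch, height, width, patch, dim, cls_token=True):
    """Return (number of images, number of tokens, dimension per token).

    cls_token=True assumes one extra [CLS] token is prepended (an assumption).
    """
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = (height // patch) * (width // patch)
    return (batch, tokens + (1 if cls_token else 0), dim)


# A 4 x 4 toy "image" cut into 2 x 2 patches gives 4 patches of 4 values each.
toy = [[row * 4 + col for col in range(4)] for row in range(4)]
toy_patches = patchify(toy, 2)

# Assumed ViT-B/16-style settings: 224 x 224 input, 16 x 16 patches, width 768.
shape = token_shape(batch=8, height=224, width=224, patch=16, dim=768)
```

In a real encoder each flattened patch is additionally passed through a learned linear projection to reach the token dimension; the sketch only illustrates the slicing and the shape bookkeeping.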

Apr 15, 2025 LLM, Multi-modal

ALBEF in details

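Reference: https://zhuanlan.zhihu.com/p/619501914 Rules of thumb: the fusion encoder should not be too simple; the image encoder should be somewhat larger than the text encoder. 🔧 The core idea of ALBEF: Align Before Fuse. Traditional image-text models (such as UNITER) "fuse first, then align": they feed the image and text into a single Transformer and then train the model to learn the relationship between them. ALBEF's...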

Apr 15, 2025 LLM, Multi-modal

Various Model Memo

ViT: ViT Principle, ViT Code, ViT Position Encoding - Video. CLIP 🧠 Scenario: CLIP processing a sentence. Suppose we have the sentence "a cute cat". CLIP processes it as follows: 1. Tokenization + encoding: the sentence becomes a token sequence: ["<...

Apr 15, 2025 LLM, Multi-modal

MoCo in details

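All generated by chatgpt-4o 🎯 Restating the question: suppose a batch has 64 images and MoCo generates a query and a key for each image; how are they trained together, and how is the loss computed jointly? ✅ Core answer: MoCo performs the "contrastive task" on every image in parallel, then averages the loss over all samples and backpropagates once. This is the common modern deep-learning practice of "mini-batch train...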
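The "one loss per image, then average" idea can be sketched in plain Python as below. This is an illustrative sketch, not the MoCo reference implementation: the names `info_nce` and `batch_loss` are hypothetical, the vectors are assumed to be L2-normalized already, the queue of negative keys is passed in as a plain list, and the temperature default of 0.07 is an assumed value.

```python
import math


def info_nce(query, positive_key, queue, temperature=0.07):
    """Contrastive loss for one sample: the positive key against all queued negatives."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Logit 0 is the positive pair; the remaining logits are the negatives.
    logits = [dot(query, positive_key) / temperature]
    logits += [dot(query, negative) / temperature for negative in queue]

    # Numerically stable log-sum-exp for the softmax denominator.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


def batch_loss(queries, keys, queue, temperature=0.07):
    """Average the per-sample losses so one backward pass covers the whole batch."""
    losses = [info_nce(q, k, queue, temperature) for q, k in zip(queries, keys)]
    return sum(losses) / len(losses)


# Toy batch of two samples with two negatives in the queue.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
queue = [[-1.0, 0.0], [0.0, -1.0]]
loss = batch_loss(queries, keys, queue, temperature=1.0)
```

In a deep-learning framework the same computation is done for all 64 images at once with matrix multiplications, and the averaged scalar is what gets backpropagated.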

Apr 15, 2025 LLM, Multi-modal

VRET

1. Overall Goal & Context Thesis Title: Feasibility Study on Using ‘Behind the Ear’-EEG to Detect Arousal in Virtual Reality Exposure Therapy Primary Objective: To investigate whether ‘Be...

Apr 14, 2025 LLM, Bio-Engineering

Towards Multimodal In-Context Learning for Vision & Language Models

Paper Link Paper Link 1. The Core Problem: VLMs Struggle with Learning from Examples Current Strength: Today’s Vision-Language Models (VLMs), like LLaVA, excel at zero-shot tasks – understandi...

Apr 13, 2025 LLM, Bio-Engineering

EEG Based Multimodal Emotion Recognition

Paper Link Paper Link I. 🌟 Core Focus and Contributions Topic: A systematic review of EEG-based Multimodal Emotion Recognition (EMER) Focus: Centers on EEG as the primary modality, combined ...

Apr 13, 2025 LLM, Bio-Engineering
