<!doctype html><html lang=en dir=auto><head><meta name=generator content="Hugo 0.143.1"><meta charset=utf-8><meta http-equiv=X-UA-Compatible content="IE=edge"><meta name=viewport content="width=device-width,initial-scale=1,shrink-to-fit=no"><meta name=robots content="index, follow"><title>Human MLLM</title>
<meta name=keywords content="multi-modal,machine learning,blog"><meta name=description content="Make AI Know Humans Better"><meta name=author content="Human MLLM"><link rel=canonical href=https://humanmllm.github.io/><link crossorigin=anonymous href=/assets/css/stylesheet.687a7bdf1e62c1a0328405768b1a35f45c643e1fb80606ee01a132832cb2ceea.css integrity="sha256-aHp73x5iwaAyhAV2ixo19FxkPh+4BgbuAaEygyyyzuo=" rel="preload stylesheet" as=style><link rel=icon href=https://humanmllm.github.io/favicon.png><link rel=apple-touch-icon href=https://humanmllm.github.io/favicon.png><link rel=manifest href=https://humanmllm.github.io/site.webmanifest><meta name=theme-color content="#615CED"><link rel=alternate type=application/json href=https://humanmllm.github.io/index.json><link rel=alternate hreflang=en href=https://humanmllm.github.io/><noscript><style>#theme-toggle,.top-link{display:none}</style></noscript><script defer crossorigin=anonymous src=/js/custom.df2a5734071a3a99040f5e88e6d16d78358fbdef9a5e7389874ac5f2aa2ca86f.js integrity="sha256-3ypXNAcaOpkED16I5tFteDWPve+aXnOJh0rF8qosqG8="></script><script async src="https://www.googletagmanager.com/gtag/js?id=G-NMEMBZ8R90"></script><script>var dnt,doNotTrack=!1;if(!1&&(dnt=navigator.doNotTrack||window.doNotTrack||navigator.msDoNotTrack,doNotTrack=dnt=="1"||dnt=="yes"),!doNotTrack){window.dataLayer=window.dataLayer||[];function gtag(){dataLayer.push(arguments)}gtag("js",new Date),gtag("config","G-NMEMBZ8R90")}</script><meta property="og:title" content="Human MLLM"><meta property="og:description" content="Make AI Know Humans Better"><meta property="og:type" content="website"><meta property="og:url" content="https://humanmllm.github.io/"><meta property="og:image" content="https://humanmllm.github.io/%3Clink%20or%20path%20of%20image%20for%20opengraph,%20twitter-cards%3E"><meta property="og:site_name" content="Human MLLM"><meta name=twitter:card content="summary_large_image"><meta name=twitter:image 
content="https://humanmllm.github.io/%3Clink%20or%20path%20of%20image%20for%20opengraph,%20twitter-cards%3E"><meta name=twitter:title content="Human MLLM"><meta name=twitter:description content="Make AI Know Humans Better"><script type=application/ld+json>{"@context":"https://schema.org","@type":"Organization","name":"Human MLLM","url":"https://humanmllm.github.io/","description":"Human MLLM","thumbnailUrl":"https://humanmllm.github.io/favicon.ico","sameAs":[]}</script></head><body class=list id=top><script>const hasHeaderBg=!0</script><header class=header><div class="nav-container nav-background"><nav class=nav><div class=logo><a href=/ accesskey=h title="HumanMLLM (Alt + H)"></a></div><ul id=menu><li><a href=/blog/ title=Blog><span>Blog</span></a></li><li><a href=/search title="SEARCH (Alt + /)" accesskey=/><span>SEARCH <svg width="13" height="13" viewBox="0 0 24 24" fill="none" stroke="currentcolor" stroke-width="4" stroke-linecap="round" stroke-linejoin="round"><circle cx="11" cy="11" r="8"/><line x1="21" y1="21" x2="16.65" y2="16.65"/></svg></span></a></li></ul></nav></div></header><div class=hero-container style=min-height:100vh;display:flex;justify-content:center;background-color:#000><img class=hero-background style=opacity:0 onload="this.style.opacity=1" src=/img/background.png width=100%><div class=hero-gradient></div><div class=mouse-hint><div class=mouse-point></div></div><style>body{-ms-overflow-style:none;scrollbar-width:none}body::-webkit-scrollbar{display:none}.mouse-hint{position:absolute;height:36px;width:24px;border:1px solid #fff;border-radius:12px;bottom:20%;left:calc(50% - 12px);opacity:1;transition:opacity .3s;animation:1s ease-out 0s 1 slideBelow}.mouse-hint .mouse-point{height:4px;width:4px;background-color:#fff;position:absolute;left:50%;bottom:40%;border-radius:4px;transform-origin:50% 100%;transform:translate(-50%);animation:2.2s ease-in-out infinite jump;will-change:transform}@keyframes 
slideBelow{0%{transform:translateY(50px);opacity:0}100%{transform:translateY(0);opacity:1}}@keyframes jump{0%,20%,60%,to{transform:translate(-50%)translateY(0);height:4px;animation-timing-function:ease-in}40%,80%{transform:translate(-50%)translateY(8px);height:8px;animation-timing-function:ease-out}}</style><div class="hero text-light text-fade-in"><div class=hero-header><h1>Human MLLM</h1></div><div class=hero-content>Make AI Know Humans Better</div><div class=hero-footer><div class=social-icons></div></div></div></div><main class="main home"><article class=post-entry><figure class=entry-cover><img loading=lazy src=https://intranetproxy.alipay.com/skylark/lark/0/2025/png/155356495/1737694579583-b3bb81f5-0533-4ceb-9e70-cbbda9c1fe43.png alt></figure><header class=entry-header><h2>LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding</h2></header><div class=entry-content><p>PAPER CODE Checkpoints Demo
Introduction We present LLaVA-Octopus, a novel video multimodal large language model. LLaVA-Octopus adaptively weights features from different visual projectors based on user instructions, allowing it to leverage the complementary strengths of each projector. We observe that different visual projectors exhibit distinct characteristics when handling specific tasks: for instance, some excel at capturing static details, while others are more effective at processing temporal information or at tasks requiring temporal coherence. By adjusting feature weights according to user instructions, LLaVA-Octopus dynamically selects and combines the most suitable features, significantly enhancing its performance on multimodal tasks. LLaVA-Octopus achieves strong results across multiple benchmarks, especially in multimodal understanding, visual question answering, and video understanding, highlighting its broad application potential.
...</p></div><footer class=entry-footer><span title='2025-01-25 00:00:03 +0800 CST'>January 25, 2025</span> · 3 min · 467 words · Human MLLM</footer><a class=entry-link aria-label="post link to LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding" href=https://humanmllm.github.io/blog/llava-octopus/></a></article></main><footer class=footer><span>© 2025 <a href=https://humanmllm.github.io/>Human MLLM</a></span>
<span>Powered by
<a href=https://gohugo.io/ rel="noopener noreferrer" target=_blank>Hugo</a></span></footer><a href=#top aria-label="go to top" title="Go to Top (Alt + G)" class=top-link id=top-link accesskey=g><svg viewBox="0 0 12 8" fill="currentcolor"><path d="M12 8H0l6-8z"/></svg>
</a><script>let menu=document.getElementById("menu");menu&&(menu.scrollLeft=localStorage.getItem("menu-scroll-position"),menu.onscroll=function(){localStorage.setItem("menu-scroll-position",menu.scrollLeft)}),document.querySelectorAll('a[href^="#"]').forEach(e=>{e.addEventListener("click",function(e){e.preventDefault();var t=this.getAttribute("href").substr(1);window.matchMedia("(prefers-reduced-motion: reduce)").matches?document.querySelector(`[id='${decodeURIComponent(t)}']`).scrollIntoView():document.querySelector(`[id='${decodeURIComponent(t)}']`).scrollIntoView({behavior:"smooth"}),t==="top"?history.replaceState(null,null," "):history.pushState(null,null,`#${t}`)})})</script><script>var mybutton=document.getElementById("top-link");window.onscroll=function(){document.body.scrollTop>800||document.documentElement.scrollTop>800?(mybutton.style.visibility="visible",mybutton.style.opacity="1"):(mybutton.style.visibility="hidden",mybutton.style.opacity="0")},mybutton.oncontextmenu=e=>{e.preventDefault(),document.querySelectorAll(".example-container").forEach(e=>{e.style.backgroundColor="unset"}),document.querySelectorAll(".example-content").forEach(e=>{e.style.display="block",e.style.backgroundColor="var(--code-bg)",e.style.marginBottom="var(--modal-gap)"}),document.querySelectorAll(".next-button").forEach(e=>{e.style.display="none"})}</script></body></html>