- Efficient MoE architecture: 28B total parameters with ~3B active per token, a ViT vision encoder, and a dedicated multimodal balancing loss.
- Strong multimodal reasoning: RL (GSPO, IcePop), reliable grounding, and "Thinking with Images" for fine details and long-tail knowledge.
- Flexible deployment: Baidu's platform, compatible APIs, ERNIEKit, vLLM, and quantization down to 2 bits with varying VRAM requirements.

The "Thinking" label appeared quietly in Baidu's ERNIE-4.5-VL family and has stirred up debate. Among the talking points: the launch was entirely low-key, there is a small chart comparing it with rivals Gemini 2.5 Pro and a hypothetical "high" GPT-5, and there is the promise of a "thinking with images" workflow. Because all of this is thinly documented, many people wonder whether the model is really as good as the marketing suggests. The truth is that earlier ERNIE versions were already capable, so it is worth looking under the hood and separating the hype from the facts.
In short, ERNIE-4.5-VL-28B-A3B-Thinking is a multimodal vision-language model with a Mixture-of-Experts (MoE) architecture that activates only ~3B parameters per token out of 28B in total. This allows a very attractive balance between capability and efficiency. The "Thinking" variant adds mid-training focused on multimodal reasoning, strengthens the semantic alignment between text and image, and layers on reinforcement-learning schemes such as GSPO and IcePop to stabilize the MoE on verifiable tasks, in addition to its much-publicized "thinking with images" feature, which combines zooming and visual search to recover fine details and long-tail knowledge.
What is ERNIE-4.5-VL-28B-A3B-Thinking and why does it matter?
Within the ERNIE 4.5 family, the VL-28B-A3B-Thinking version is positioned as a lightweight yet ambitious model for multimodal reasoning. It uses a MoE design with 28 billion total parameters and an active set of about 3 billion per token, cutting inference costs while keeping performance competitive with larger, dense models.
Its technical description mentions up to 130 experts with 14 active per layer, a routing arrangement tuned per input type to control compute usage and latency. The idea is that the router picks the "right experts" when images, text, or a combination of both arrive, increasing representational diversity and computational efficiency.
On the visual side, the backbone is a Vision Transformer (ViT) that splits the image into patches and treats them as tokens. Projecting these into the same embedding space as the text enables a fluid "conversation" between modalities, supported by training tricks such as an orthogonal router loss (so experts do not overlap excessively) and a token-balanced multimodal loss that prevents one modality from drowning out the other.
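To give a feel for the numbers, here is a small sketch of how patch tokenization scales with resolution. The patch size and the assumption that the image is resized to a multiple of it are ours for illustration, not ERNIE's exact preprocessing pipeline:

```python
def vit_token_count(height: int, width: int, patch_size: int = 14) -> int:
    """Number of patch tokens a ViT produces for an image.

    Assumes the image has been resized so both sides are divisible by
    the patch size; real preprocessors (including ERNIE's) may differ.
    """
    return (height // patch_size) * (width // patch_size)

# A 448x448 image with 14x14 patches yields 32*32 = 1024 visual tokens.
print(vit_token_count(448, 448))  # 1024
```

This is why limits such as FastDeploy's `image_max_pixels` matter: token count, and therefore cost, grows linearly with pixel area.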
With the "Thinking" tag, Baidu claims major improvements in visual reasoning, chart analysis, causality, grounding, and visual instruction following. On top of that, the ability to call tools and produce structured JSON output, plus built-in content moderation, makes it a solid building block for multimodal agents.

Architecture, training, and capabilities: what it brings
The MoE philosophy lets only a fraction of the parameters fire per token, which translates into efficient compute without giving up the model's full scale. Each "expert" can specialize in certain patterns or tasks (e.g., OCR, diagrams, numerical reasoning), and the router learns to combine them according to context.
In practice this is reinforced by two key training ideas: an orthogonal router loss, which encourages diversity among experts, and a token-balanced multimodal loss function, which keeps text and image in balance during training. This prevents the model from becoming unusually good at text while weak at vision (or vice versa). In VL-28B-A3B-Thinking, moreover, dedicated mid-training on reasoning over image-text pairs expands the representational capacity and hardens the multimodal semantic alignment.
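To make the routing idea concrete, here is a minimal sketch of top-k expert selection. The function name, shapes, and scores are ours for illustration; Baidu's actual router (and its orthogonality regularizer) is considerably more elaborate:

```python
import numpy as np

def route_top_k(logits: np.ndarray, k: int):
    """Pick the k highest-scoring experts per token and renormalize.

    logits: (num_tokens, num_experts) raw router scores.
    Returns (indices, weights), each of shape (num_tokens, k).
    """
    idx = np.argsort(-logits, axis=-1)[:, :k]          # top-k expert ids
    top = np.take_along_axis(logits, idx, axis=-1)     # their scores
    w = np.exp(top - top.max(axis=-1, keepdims=True))  # stable softmax
    w /= w.sum(axis=-1, keepdims=True)
    return idx, w

logits = np.array([[0.1, 2.0, -1.0, 0.5]])  # 1 token, 4 experts
idx, w = route_top_k(logits, k=2)
print(idx[0])  # the two highest-scoring experts: [1 3]
```

Only the selected experts' FFNs run for that token, which is where the "~3B active out of 28B total" economy comes from.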
As for benchmarks, independent comparative analyses (e.g., Galaxy.AI) place ERNIE-4.5-VL-28B-A3B on par with, or ahead of, models such as Qwen2.5-VL-7B and Qwen2.5-VL-32B in visual perception, multimodal reasoning, and document comprehension. This sits alongside the small marketing chart (yes, very hard to read) suggesting it keeps pace with or surpasses heavyweights like Gemini 2.5 Pro or a "high" GPT-5. Some are skeptical of those comparisons, but the fact remains that, with reinforcement-learning aids (GSPO, IcePop) and dynamic difficulty sampling, it is plausible that the model has improved its robustness on verifiable tasks.
The "Thinking with Images" feature deserves special mention: it is not magic, but a workflow that combines image zooming with visual search tools to capture very fine details (license plates, small labels, iconography) and reach long-tail knowledge when internal knowledge is insufficient. That capability, together with reliable grounding (making localization tasks work from simple instructions), makes the model a strong candidate for production applications and scenarios with difficult images.
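The workflow can be pictured as a small tool loop: the model either requests a tool (zoom, image search) or returns a final answer. Everything here, including the tool names, the `ask` callable, and the action schema, is a hypothetical sketch of the pattern, not Baidu's actual agent API:

```python
def run_agent(ask, tools, question, image, max_steps=4):
    """Loop until the model returns a final answer instead of a tool call."""
    context = [("question", question), ("image", image)]
    for _ in range(max_steps):
        action = ask(context)                 # model decides the next step
        if action["type"] == "final":
            return action["answer"]
        tool = tools[action["tool"]]          # e.g. "zoom", "image_search"
        context.append(("tool_result", tool(**action["args"])))
    return None

# Stub model: zoom once, then answer with the detail the zoom revealed.
def fake_model(context):
    if any(kind == "tool_result" for kind, _ in context):
        return {"type": "final", "answer": context[-1][1]}
    return {"type": "tool_call", "tool": "zoom",
            "args": {"region": (10, 10, 50, 30)}}

tools = {"zoom": lambda region: f"text in {region}: 'HOTEL BUZA'"}
print(run_agent(fake_model, tools, "What does the sign say?", "img.jpg"))
```

The real model emits these decisions as structured tool calls; the loop and stubs above only illustrate the control flow.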
On the multilingual front, the ERNIE 4.5 series maintains high performance without sacrificing visual understanding, a key point for global deployments. Furthermore, structured output (JSON) and function calls open the door to use cases where the model does not merely look and answer, but acts on tools (for example, detecting objects and returning their bounding boxes and coordinates).
Verified use cases
Visual reasoning over dense charts: the model can cross-reference dates and weekdays, interpret the chart's structure, spot low-crowd windows (e.g., 12:00-14:00), and produce a clear recommendation of the best times to visit. Here we see multi-step reasoning that combines calendar knowledge, visual reading, and logic.
STEM problems from photos: faced with a bridge circuit that cannot be solved by simple series-parallel reduction, the model applies Ohm's and Kirchhoff's laws, sets up the node equations, and reaches the correct analytical result (e.g., R = 7/5 Ω). This demonstrates its ability to read diagrams technically and reason symbolically.
Visual grounding with structured output: asked to "Identify everyone wearing a suit and return their bounding boxes in JSON", it detects the people and delivers precise numerical coordinates. The key is combining grounding with instruction following and a structured output format.
"Ukucinga ngemifanekiso" ye-OCR eneenkcukacha: ukuba umsebenzisi ucela okubhaliweyo kuphawu oluluhlaza ngasemva, isixhobo sokusondeza siyakhaba, sivumela ukuchongwa kweelebhile ezincinci (ezifana ne "HOTEL BUZA") ngeenkcukacha ezithe kratya. ukuthembekaIngumzekelo we ingqwalasela enamandla kwiindawo ezintle.
Tool use for long-tail knowledge: confronted with a round yellow toy, the model decides to request an external image search, compares features, and concludes it is "Dundun", associated with MINISO. This pipeline shows its capacity to orchestrate multi-step tool workflows.
Video comprehension: it extracts subtitles with timestamps and localizes specific scenes (e.g., segments around 17s, 37s, and 47s shot on a bridge). Here it mixes text extraction, temporal reasoning, and spatial analysis of the content.
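On the consumer side, the structured output from the suit-detection case above is only useful if you validate it. A few lines suffice; the JSON schema here (a list of `label`/`bbox` objects in pixel coordinates) is an assumption for illustration, not a documented ERNIE format:

```python
import json

def parse_detections(raw: str, img_w: int, img_h: int):
    """Parse model JSON detections, keeping only boxes inside the image.

    Assumed schema (ours): [{"label": str, "bbox": [x1, y1, x2, y2]}, ...]
    with pixel coordinates.
    """
    out = []
    for det in json.loads(raw):
        x1, y1, x2, y2 = det["bbox"]
        if 0 <= x1 < x2 <= img_w and 0 <= y1 < y2 <= img_h:
            out.append((det["label"], (x1, y1, x2, y2)))
    return out

raw = '[{"label": "person in suit", "bbox": [40, 20, 180, 300]}]'
print(parse_detections(raw, img_w=640, img_h=480))
```

Dropping out-of-bounds or degenerate boxes early keeps downstream cropping and annotation code simple.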
Another notable variant: ERNIE-4.5-21B-A3B-Thinking
Alongside the VL-28B edition, there is a variant focused on text/code reasoning with 21B total parameters and 3B active per token. Built on a "smart, not big" philosophy, it shows impressive performance in logic, mathematics, programming, and long reasoning chains. Published under Apache-2.0 and with an extended context window (in the 128K-131K range), it is very attractive for long-form tasks and comparative analysis of many documents.
One of its selling points is price: indicative rates have been published on some platforms with extremely aggressive per-million-token costs (e.g., $0.07 input and $0.28 output, and even "$0/$0" for some 21B configurations), although it is advisable to verify actual availability and terms, since the ecosystem and commercial deployment agreements can vary.
Market comparisons and the noise around them
About that famous little chart comparing it with Gemini 2.5 Pro and a "high" GPT-5: it is marketing, not an independent audit. Still, when compared against publicly available benchmark suites (Qwen2.5-VL-7B/32B, etc.), the model holds its own. As usual, it is best to test it on your own target data and metrics, since overall performance varies with the domain, prompt quality, available tools, and the mix of inputs (text/image/video).
Quantization and memory requirements
For local deployment, quantization helps. In FP16, the estimate is around ~56 GB of VRAM; at 4-bit, about ~14 GB; and at 2-bit, ~7 GB. Caveat: these numbers depend on the runtime and weight packing. For example, some FastDeploy guides mention a minimum of 24 GB per card, while in other setups (e.g., demanding vLLM configurations) 80 GB is cited for certain settings. Depending on the stack (PaddlePaddle, PyTorch, kernels, sequence length, batch, KV cache), the practical figure can move.
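The weight-only arithmetic behind those estimates is straightforward; this sketch deliberately ignores the KV cache, activations, and runtime buffers, which is exactly why real deployments need the extra headroom mentioned above:

```python
def vram_estimate_gb(params_billion: float, bits: int) -> float:
    """Rough VRAM needed just for the model weights, in GB.

    Ignores KV cache, activations and framework overhead, so treat the
    result as a lower bound.
    """
    return params_billion * 1e9 * bits / 8 / 1e9

# 28B parameters: FP16 -> 56 GB, 4-bit -> 14 GB, 2-bit -> 7 GB
for bits in (16, 4, 2):
    print(bits, round(vram_estimate_gb(28, bits)))
```

The output matches the figures quoted in this section, which confirms they are weight-only estimates rather than end-to-end serving budgets.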
Multilingual support and moderation
Multilingual support without sacrificing vision is another strength. And for user-facing products, the built-in moderation adds a safety layer that reduces deployment risk. Structured outputs and function calls let the model be integrated as an "engine" inside pipelines with external tools, not just as a chatbot.
An advanced example of document comprehension
The model can handle complex historical texts, such as passages about the "Five Kings of Wō" in Chinese sources, cross-references from the "Book of Songs", inscriptions on the Gwanggaeto Stele, or footnotes with years (e.g., 478) and places (Ji'an, Jilin). This kind of input mixes translations, explanatory notes, and archaeological context (burial mounds, inscribed swords such as the "Daio" sword associated with Bu/Yūryaku). A system like ERNIE-4.5-VL-28B-Thinking can untangle all of this, recognize the relevant names (Yomi, Mí, Sei, Ō, Bu), link them to Japanese imperial figures, and deliver a coherent summary of the facts: tribute to the Southern Chinese courts, the conflict on the Korean peninsula, the Kara/Imna foothold in the epigraphic sources, and so on.
Deployment, access, and frequently asked questions
There are several ways to try and use ERNIE 4.5. Baidu offers web access to get started without any installation. Integrations with third-party platforms (e.g., the Novita API Playground) make it easy to evaluate the model in development environments and estimate costs. For local deployment, the typical recommended stack is Linux with PaddlePaddle (ERNIEKit), plus cross-compatibility with Transformers on PyTorch using trust_remote_code where applicable.

Deployment with Transformers (PyTorch)
The usual flow involves loading the model with AutoModelForCausalLM, attaching the image preprocessor via AutoProcessor, and building multimodal messages that combine text and image/video. Generation then runs with an appropriate token limit and the output is decoded. The key point is that the processor manages both the chat template and the preparation of the visual inputs.
# Illustrative example (paraphrased)
from transformers import AutoModelForCausalLM, AutoProcessor
import torch

name = "baidu/ERNIE-4.5-VL-28B-A3B-Thinking"
model = AutoModelForCausalLM.from_pretrained(
    name, device_map="auto", dtype=torch.bfloat16, trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(name, trust_remote_code=True)
model.add_image_preprocess(processor)
messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "What color are the girl's clothes?"},
        {"type": "image_url", "image_url": {"url": "https://.../example1.jpg"}}
    ]
}]
text = processor.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = processor.process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs, padding=True, return_tensors="pt")
out_ids = model.generate(**{k: v.to(model.device) for k, v in inputs.items()}, max_new_tokens=256)
print(processor.decode(out_ids[0][len(inputs["input_ids"][0]):]))
Serving with vLLM
vLLM speeds up inference and adds options such as dedicated parsers for reasoning and tool calls. Remember to enable --trust-remote-code when serving the model if the repository requires it.
# Install the nightly build (illustrative)
pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly
# Serve the model
vllm serve baidu/ERNIE-4.5-VL-28B-A3B-Thinking --trust-remote-code
# With reasoning and tool-call parsers
vllm serve baidu/ERNIE-4.5-VL-28B-A3B-Thinking \
    --trust-remote-code \
    --reasoning-parser ernie45 \
    --tool-call-parser ernie45 \
    --enable-auto-tool-choice
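Once served, vLLM exposes an OpenAI-compatible chat-completions endpoint. This sketch only builds the multimodal request payload; the prompt, image URL, and the default port mentioned in the comment are placeholders, not values from this article:

```python
import json

def build_request(prompt: str, image_url: str) -> dict:
    """Build an OpenAI-style multimodal chat request for the served model."""
    return {
        "model": "baidu/ERNIE-4.5-VL-28B-A3B-Thinking",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
        "max_tokens": 256,
    }

payload = build_request("Describe the chart.", "https://example.com/chart.png")
print(json.dumps(payload, indent=2))
# POST this JSON to the server's /v1/chat/completions endpoint.
```

Because the payload shape matches the OpenAI API, existing clients can be pointed at the vLLM server with only a base-URL change.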
FastDeploy and ERNIEKit
FastDeploy lets you expose fast services with parameters to control maximum length, number of sequences, quantization (wint8/INT4), reasoning parsers, and multimodal processor settings (e.g., image_max_pixels). The VRAM requirements cited vary: notes range from 24 GB per card up to scenarios needing 80 GB in some guides; it depends on the combination of model, precision, batch, and length.
# Illustrative example
fastdeploy serve --model baidu/ERNIE-4.5-VL-28B-A3B-Thinking \
    --max-model-len 131072 \
    --max-num-seqs 32 \
    --port 8180 \
    --quantization wint8 \
    --reasoning-parser ernie-45-vl-thinking \
    --tool-call-parser ernie-45-vl-thinking \
    --mm-processor-kwargs '{"image_max_pixels": 12845056}'
Fine-tuning (SFT/LoRA) and alignment (DPO)
ERNIEKit, built on PaddlePaddle, provides ready-made configurations for SFT with and without LoRA, as well as DPO. It is useful for adapting the model to specific domains (e.g., business documents, visual inspection, forms) while preserving its multimodal robustness. You can download the model repository and use the training templates bundled with the tooling examples.
Access via APIs and platforms
Beyond Baidu's own platform, there are integrations compatible with the OpenAI APIs. This eases migration from existing tooling (e.g., command-line clients or compatible editors) by avoiding the need to rebuild integrations. Some GPU clouds (such as Novita AI) advertise instances with sufficient VRAM and hourly pricing, along with multi-GPU scaling, which is handy if you want to test heavy configurations without investing in hardware yourself.
License and commercial use
The ERNIE 4.5 family is released under Apache 2.0, a permissive license that allows commercial use as long as its terms and notices are respected. This makes it easy to build paid products that integrate the model and its derivatives, provided you maintain license compliance and the corresponding attribution (e.g., citing the technical report).
Pricing and context
Highly competitive pricing references have been shared. For example, for the 300B A47B edition, the quoted context is 123k, with costs around $0.28/M input and $1.10/M output; for the 21B A3B, published figures as low as $0/$0 have been observed. It is advisable to check availability and exact terms on the relevant platform, since prices depend on the provider, usage tier, region, and SLA.
Performance on real-world tasks
Beyond the paper, what matters is where it shines: reading documents that mix text and visual elements (stamps, tables, signatures), data extraction with grounding (coordinates), solving STEM problems from photos or whiteboards, video summaries with temporal localization of events, and tool use for long-tail knowledge. If your application matches this profile, "Thinking" adds genuinely useful pieces.
Quick FAQ
- What does "Thinking with Images" mean? It is a workflow that combines zooming and visual search to capture details and consult external knowledge when internal knowledge falls short, improving fine-grained reasoning.
- How much VRAM do I need? It depends. As a rough guide: FP16 ~56 GB; INT4 ~14 GB; 2-bit ~7 GB. But the runtime and context size can raise the bar, especially with vLLM.
- Does it integrate with tools? Yes, it supports function calls and JSON output, enabling multimodal agents that chain grounding, OCR, search, and similar capabilities into verifiable steps.
- Is there a strong "text-only" alternative? ERNIE-4.5-21B-A3B-Thinking excels at logic, math, and code, with a good cost-performance ratio and a large context window.
If you are looking for a multimodal model that balances efficiency and capability, ERNIE-4.5-VL-28B-A3B-Thinking is very interesting. Its pillars are a well-tuned MoE (130 experts with 14 active), a ViT that shares the text embedding space, an orthogonal router loss plus a token-balanced multimodal loss, reasoning reinforced through mid-training, RL with GSPO/IcePop, and "thinking with images". Its demos show multi-step visual reasoning, precise grounding, STEM from photos, tool use, and temporal video comprehension. Flexible access (Baidu, compatible APIs, local deployment with Paddle/Transformers), the Apache 2.0 license, and quantization options round out the package: marketing aside, there is a technical foundation here that competes very well.
An author passionate about the world of bytes and technology in general. I love sharing my knowledge through writing, and that is what I will do on this blog: show you all the most interesting things about gadgets, software, hardware, tech trends, and more. My goal is to help you navigate the digital world in a simple and entertaining way.
