一战到底 chenzhicheng
  • Joined on Nov 13, 2023
  • Organization

chenzhicheng synced commits to pull-request/5143 at chenzhicheng/Megatron-LM from mirror

  • 2948edef62 refactor: Update ENABLE_LIGHTWEIGHT_MODE handling in run_ci_test.sh Replaced the direct usage of ENABLE_LIGHTWEIGHT_MODE with a new approach that checks both the environment variable and the configuration file for its value. This change ensures that the lightweight mode is set correctly based on the provided parameters.

33 minutes ago

chenzhicheng synced commits to pull-request/4967 at chenzhicheng/Megatron-LM from mirror

  • 0906db0c4d fix formating Signed-off-by: Shiqing Fan <shiqingf@nvidia.com>
  • 9dca05a9a8 Fix GTP DDP bucket alignment for distributed optimizer; add corresponding UT Signed-off-by: Shiqing Fan <shiqingf@nvidia.com>
  • 107c0772b8 fix UTs Signed-off-by: Shiqing Fan <shiqingf@nvidia.com>
  • Compare 3 commits »

33 minutes ago

chenzhicheng synced commits to pull-request/4832 at chenzhicheng/Megatron-LM from mirror

  • d1b1e9930f fix ut Signed-off-by: xiaoyao0115 <1804647152@qq.com>

33 minutes ago

chenzhicheng synced commits to pull-request/4329 at chenzhicheng/Megatron-LM from mirror

  • ff60510bb3 Use hybrid Mamba MoE model in FSDP E2E tests

33 minutes ago

chenzhicheng synced commits to v4 at chenzhicheng/facefusion from mirror

40 minutes ago

chenzhicheng synced commits to main at chenzhicheng/ms-swift from mirror

56 minutes ago

chenzhicheng synced commits to update_imgs at chenzhicheng/xtuner from mirror

2 hours ago

chenzhicheng synced commits to offline-cache at chenzhicheng/xtuner from mirror

  • ceabe74215 [Fix] adapt clusterx brainpp breaking change

2 hours ago

chenzhicheng synced commits to main at chenzhicheng/xtuner from mirror

  • 0e40b0d7ba [Refactor] Refactor RL unittest part 1: add pr-fast unit test (#1865) * Add RL PR-fast test suite * Add RL PR-fast risk tests Add _prepare_train_data contract tests covering the text, VLM, invalid group, and fail-fast paths. Add SingleTurnAgentLoop batch judge/pause tests, and tighten batch judge so it triggers only when the whole group is COMPLETED. Add PR-fast coverage for the RolloutController fake routed branch and CPUResourceManager register/validate_or_raise. * Remove duplicated RL tests migrated to PR-fast Delete the old unit test entry points under the tests/rl root that have already been migrated to tests/rl/fast/pr_fast. This covers the same-named PR-fast tests as well as the old rollout worker/utils tests already merged into test_rollout_logic.py. * delete useless rollout_output.jsonl * fix ci * fix claude comments * fix ci

2 hours ago

chenzhicheng synced commits to gh-pages at chenzhicheng/xtuner from mirror

2 hours ago

chenzhicheng synced commits to agentic_branch at chenzhicheng/xtuner from mirror

  • 9a1190379c fix lint and enable routed API proxy (#1882) * fix lint and enable routed API proxy * fix doc

2 hours ago

chenzhicheng synced commits to v5.10-release at chenzhicheng/transformers from mirror

6 hours ago

chenzhicheng synced commits to update-docs at chenzhicheng/transformers from mirror

  • 95d70e2f45 address PR review
  • b1879ecb33 Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
  • 8970c1b6f6 Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
  • 2a2d08dcd5 Apply suggestion from @stevhliu Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
  • 8725d0d86d Apply suggestion from @stevhliu Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
  • Compare 16 commits »

6 hours ago

chenzhicheng synced commits to parakeet-rnnt-from-scratch at chenzhicheng/transformers from mirror

6 hours ago

chenzhicheng synced commits to nemotron-h-split-dense-sparse at chenzhicheng/transformers from mirror

  • 84a2be05c0 nemotron_h_sparse: support optional latent MoE projection Add `moe_latent_size`: when set, the routed experts run in a latent space (down-proj before, up-proj after; shared expert stays at hidden_size). Only Nemotron-3 Nano (A3B) leaves it unset — Super and Ultra both use it, so it lives on the shared sparse block. `None` is a no-op (Identity).

6 hours ago

chenzhicheng synced commits to main at chenzhicheng/transformers from mirror

  • effde20942 extend deepseek v4 test to xpu (#46366) * extend deepseek v4 test to xpu Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * update Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * update Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * make require_cuda_capability_at_least composable instead of incremental --------- Signed-off-by: Wang, Yi <yi.a.wang@intel.com> Co-authored-by: Ilyas Moutawwakil <ilyas@huggingface.co>
  • a921b4d886 Quantization for small models (#46449) * Quantization for small models Co-authored-by: Marc Sun <marc.sun@hotmail.fr> Co-authored-by: Phil Culliton <phillipculliton@gmail.com> Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * fix CI * let's ignore for now * ignore docs for now, later todo --------- Co-authored-by: Marc Sun <marc.sun@hotmail.fr> Co-authored-by: Phil Culliton <phillipculliton@gmail.com> Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> Co-authored-by: vasqu <antonprogamer@gmail.com>
  • f677e3a653 Added cosmos3 model (#46146) * Added cosmos3 model Signed-off-by: Maciej Bala <mbala@nvidia.com> * bugfix for processing_utils.py Signed-off-by: Maciej Bala <mbala@nvidia.com> * Add extra tests * Rename architecture; linter fixes Signed-off-by: Maciej Bala <mbala@nvidia.com> * Rename the architecture again for now Signed-off-by: Maciej Bala <mbala@nvidia.com> * Adapted to new Cosmos3 checkpoint format * Improve the checkpoint loading Signed-off-by: Maciej Bala <mbala@nvidia.com> * Simplified docs Signed-off-by: Maciej Bala <mbala@nvidia.com> * Renamed cosmos3 to cosmos3_reasoner Signed-off-by: Maciej Bala <mbala@nvidia.com> * Removed cosmos3 tests Signed-off-by: Maciej Bala <mbala@nvidia.com> * Revert processing fix Signed-off-by: Maciej Bala <mbala@nvidia.com> * Answered review comments Signed-off-by: Maciej Bala <mbala@nvidia.com> * Rename to Cosmos3ReasonerForConditionalGeneration Signed-off-by: Maciej Bala <mbala@nvidia.com> * Renamed config key Signed-off-by: Maciej Bala <mbala@nvidia.com> * Run make fix-repo Signed-off-by: Maciej Bala <mbala@nvidia.com> * Refactored naming Signed-off-by: Maciej Bala <mbala@nvidia.com> * Some more renaming; fix linter issues; Add Cosmos3 Processor Signed-off-by: MaciejBalaNV <mbala@nvidia.com> * Linter fixes Signed-off-by: MaciejBalaNV <mbala@nvidia.com> * Multiple fixes Signed-off-by: Maciej Bala <mbala@nvidia.com> * fix test and linter Signed-off-by: Maciej Bala <mbala@nvidia.com> * Make fix-repo fixes Signed-off-by: Maciej Bala <mbala@nvidia.com> * Linter fix Signed-off-by: Maciej Bala <mbala@nvidia.com> * Change date Signed-off-by: Maciej Bala <mbala@nvidia.com> * rename it all and load existing AutoModel * docs * fix repo * fix docs * ugly typo * adjust expectation * dont sample in test! * forgot this one * style --------- Signed-off-by: Maciej Bala <mbala@nvidia.com> Signed-off-by: MaciejBalaNV <mbala@nvidia.com> Co-authored-by: raushan <raushan@huggingface.co>
  • b1ac534932 ci: fix slack report job — use AWS runner and authenticated GitHub API calls (#46448) * Use group: aws-general-8-plus * Fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
  • dcf0c7c8c3 fbgemm_fp8:Keep the current device aligned with the input tensor (#46403) * fbgemm_fp8:Keep the current device aligned with the input tensor Signed-off-by: kaixuanliu <kaixuan.liu@intel.com> * update comment Signed-off-by: kaixuanliu <kaixuan.liu@intel.com> * move `on_device` func to common part Signed-off-by: kaixuanliu <kaixuan.liu@intel.com> * update Signed-off-by: kaixuanliu <kaixuan.liu@intel.com> * update code Signed-off-by: kaixuanliu <kaixuan.liu@intel.com> --------- Signed-off-by: kaixuanliu <kaixuan.liu@intel.com>
  • Compare 27 commits »

6 hours ago

chenzhicheng synced commits to improve_dinos at chenzhicheng/transformers from mirror

  • 1e29c49fcf fall back to usual hidden states routing for now
  • b512098c9e fixup merge
  • 6a0a80d97b Merge branch 'main' into improve_dinos
  • d3f05911ab Fix convert_tokens_to_ids performance regression for slow tokenizers (#46315) (#46323) * Fix convert_tokens_to_ids performance regression for slow tokenizers (#46315) Slow (PreTrainedTokenizer) tokenizers resolved added-vocabulary tokens through the added_tokens_encoder property, which rebuilds and re-sorts the whole added-token mapping on every access and is read twice per token. That made convert_tokens_to_ids roughly O(T * N * log N) for N added tokens, a regression from the v5 tokenizer refactor (#40936). Read the maintained _added_tokens_encoder cache instead, restoring the v4 behavior that every other method in the file already relies on. Adds a network-free regression test using CTRLTokenizer. * Remove regression test per reviewer feedback
  • ad0323529a [docs] Romanian translation of `fast_tokenizers.md`, `custom_tokenizers.md`, `tokenizer_summary.md`, `image_processors.md` and `video_processors.md`. (#46356) * added fast_tokenizers.md * added custom_tokenizers.md * added tokenizer_summary.md * added image_processors.md * added video_processors.md * small fixes * added backbones.md * added feature_extractors.md * added processors.md; closed the preprocessors section
  • Compare 75 commits »

6 hours ago

chenzhicheng synced commits to hf-exporters at chenzhicheng/transformers from mirror

6 hours ago

chenzhicheng synced commits to gguf-matmul-kernels at chenzhicheng/transformers from mirror

  • e0aae61bf9 GGUF: uniform meta-time swap, no rename, bind kernels post-load Replace the rename-based swap plan with a uniform meta-time swap and post-load binding. The gguf_name -> hf_name mapping is no longer computed twice. - GgufLinear / GgufExperts take an optional quant_type: the swap creates uint8 placeholders sized from the Q4_K_M average (good enough for device_map memory estimation), and the exact quant type + metal kernels are bound post-load from the bytes that arrive (bind_after_load). Forward stays branch-free (compile). - GGUFQuantizedTensor.__torch_function__ now propagates quant_type through every subclass-preserving op (the loader's .to, the Q/K permute, the gate/up cat) so bind_after_load can read it back; these subclasses only exist during load. - replace_with_gguf_linear is a uniform swap (modules_to_not_convert + tied output embeddings skipped); the on-the-fly path passes its single config quant_type. load_checkpoint_state drops per-tensor work: a coarse metadata-only "all kernel-supported?" check gates the byte path vs full dequant. - postprocess_model reconciles: bind each module's kernels (from the bytes, or module_quant_types on reload), revert float-source Linears (MoE routers) to nn.Linear, and record module_quant_types for the save round-trip. It also drops the GGUF load-renames so save_pretrained writes HF names (the swapped model is now standard safetensors, re-swapped on reload) — fixes the round-trip. Validated on MPS: dense (qwen2) and MoE (qwen2moe: router revert + experts) swap generation, gemma2/nemotron/gemma3 de-offset conversion, on-the-fly swap, dequant-on-load, and save_pretrained -> from_pretrained round-trip; 64 fast tests.
  • e09ea8793b GGUF: expose header metadata without materializing tensors load_gguf_checkpoint now computes `tensor_quant_types` ({name: quant_type}) and `weight_mapping` unconditionally — they are read straight off the GGUF header (no tensor data), so a `return_tensors=False` call returns them cheaply. Only the eager `np.copy` of tensor bytes stays behind `return_tensors=True`. This lets the module-swap plan be built from metadata + renamings alone (pure name resolution, no tensor load / no conversion). Verified: return_tensors=False yields 291 quant types + 12 rules with no `tensors`; full load and AutoConfig via gguf unchanged; 63 fast tests pass.
  • b11c4d7a96 GGUF: de-offset norms via keep-in-fp32 instead of pre-applying SubtractOne is a real fp32 op again. Rather than pre-subtracting on the fp32 source in load_checkpoint_state, the quantizer forces the `w + 1` norms to stay fp32 for the load via `_keep_in_fp32_modules_strict` (covers fp16 and bf16), keyed off the SubtractOne converter targets in weight_mapping — no arch list. The converter then subtracts in fp32 and the result is exact. Those norms end up fp32 on the GGUF load, so the weights-conversion tests compare at the original dtype. Validated on MPS: nemotron(fp16)/gemma2(fp16)/ gemma3_text(bf16) conversion + stablelm control + qwen2 generation.
  • Compare 3 commits »

6 hours ago

chenzhicheng synced commits to finegrained-fp8-v2 at chenzhicheng/transformers from mirror

  • beb0605168 Merge branch 'main' into finegrained-fp8-v2
  • effde20942 extend deepseek v4 test to xpu (#46366) * extend deepseek v4 test to xpu Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * update Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * update Signed-off-by: Wang, Yi <yi.a.wang@intel.com> * make require_cuda_capability_at_least composable instead of incremental --------- Signed-off-by: Wang, Yi <yi.a.wang@intel.com> Co-authored-by: Ilyas Moutawwakil <ilyas@huggingface.co>
  • a921b4d886 Quantization for small models (#46449) * Quantization for small models Co-authored-by: Marc Sun <marc.sun@hotmail.fr> Co-authored-by: Phil Culliton <phillipculliton@gmail.com> Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> * fix CI * let's ignore for now * ignore docs for now, later todo --------- Co-authored-by: Marc Sun <marc.sun@hotmail.fr> Co-authored-by: Phil Culliton <phillipculliton@gmail.com> Co-authored-by: Ryan Mullins <ryan@ryanmullins.org> Co-authored-by: vasqu <antonprogamer@gmail.com>
  • f677e3a653 Added cosmos3 model (#46146) * Added cosmos3 model Signed-off-by: Maciej Bala <mbala@nvidia.com> * bugfix for processing_utils.py Signed-off-by: Maciej Bala <mbala@nvidia.com> * Add extra tests * Rename architecture; linter fixes Signed-off-by: Maciej Bala <mbala@nvidia.com> * Rename the architecture again for now Signed-off-by: Maciej Bala <mbala@nvidia.com> * Adapted to new Cosmos3 checkpoint format * Improve the checkpoint loading Signed-off-by: Maciej Bala <mbala@nvidia.com> * Simplified docs Signed-off-by: Maciej Bala <mbala@nvidia.com> * Renamed cosmos3 to cosmos3_reasoner Signed-off-by: Maciej Bala <mbala@nvidia.com> * Removed cosmos3 tests Signed-off-by: Maciej Bala <mbala@nvidia.com> * Revert processing fix Signed-off-by: Maciej Bala <mbala@nvidia.com> * Answered review comments Signed-off-by: Maciej Bala <mbala@nvidia.com> * Rename to Cosmos3ReasonerForConditionalGeneration Signed-off-by: Maciej Bala <mbala@nvidia.com> * Renamed config key Signed-off-by: Maciej Bala <mbala@nvidia.com> * Run make fix-repo Signed-off-by: Maciej Bala <mbala@nvidia.com> * Refactored naming Signed-off-by: Maciej Bala <mbala@nvidia.com> * Some more renaming; fix linter issues; Add Cosmos3 Processor Signed-off-by: MaciejBalaNV <mbala@nvidia.com> * Linter fixes Signed-off-by: MaciejBalaNV <mbala@nvidia.com> * Multiple fixes Signed-off-by: Maciej Bala <mbala@nvidia.com> * fix test and linter Signed-off-by: Maciej Bala <mbala@nvidia.com> * Make fix-repo fixes Signed-off-by: Maciej Bala <mbala@nvidia.com> * Linter fix Signed-off-by: Maciej Bala <mbala@nvidia.com> * Change date Signed-off-by: Maciej Bala <mbala@nvidia.com> * rename it all and load existing AutoModel * docs * fix repo * fix docs * ugly typo * adjust expectation * dont sample in test! * forgot this one * style --------- Signed-off-by: Maciej Bala <mbala@nvidia.com> Signed-off-by: MaciejBalaNV <mbala@nvidia.com> Co-authored-by: raushan <raushan@huggingface.co>
  • b1ac534932 ci: fix slack report job — use AWS runner and authenticated GitHub API calls (#46448) * Use group: aws-general-8-plus * Fix --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
  • Compare 45 commits »

6 hours ago
