• afk_strats@lemmy.world
    4 months ago

    Working pruning techniques have been tested and seem at least decent at maintaining coherent transformer MoE models. https://doi.org/10.48550/arXiv.2510.13999

    There are several working examples of REAP-pruned models on Hugging Face, and that method seems very good.

    The OP’s paper suggests a technique which starts with an arbitrarily structured set of experts that are pruned during training. I’m not 100% sure I understand it, but I don’t think I’ve seen this exact technique before, and it might be even more efficient.
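    To make the general idea of expert pruning concrete, here’s a toy sketch of my own (not REAP’s actual criterion, and not the OP paper’s training-time method): score each expert by how often the router actually selects it on some calibration data, then drop the least-used experts. Everything here (the router, the counts, the keep budget) is illustrative.

    ```python
    import random

    random.seed(0)

    NUM_EXPERTS = 8
    TOP_K = 2   # router activates 2 experts per token
    KEEP = 6    # prune down to 6 experts

    # Stand-in for a learned gate: per-token logits over experts.
    def router_logits(token):
        return [random.random() for _ in range(NUM_EXPERTS)]

    # Saliency: count how often each expert lands in a token's top-k.
    counts = [0] * NUM_EXPERTS
    for tok in range(1000):  # fake calibration set
        logits = router_logits(tok)
        topk = sorted(range(NUM_EXPERTS), key=lambda e: logits[e], reverse=True)[:TOP_K]
        for e in topk:
            counts[e] += 1

    # Keep the KEEP most-used experts; the rest would be removed from the layer
    # (and the router's output dimension shrunk to match).
    kept = sorted(sorted(range(NUM_EXPERTS), key=lambda e: counts[e], reverse=True)[:KEEP])
    print("kept experts:", kept)
    ```

    Real methods use smarter saliency scores (REAP, for instance, weighs router gate values against expert activation norms rather than raw selection counts), but the keep/drop skeleton looks like this.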