#finetuning

LoRA vs. Full Fine-Tuning: An Illusion of Equivalence: arxiv.org/abs/2410.21228 #llm #lora #finetuning #model

arXiv.org: LoRA vs Full Fine-tuning: An Illusion of Equivalence

Fine-tuning is a crucial paradigm for adapting pre-trained large language models to downstream tasks. Recently, methods like Low-Rank Adaptation (LoRA) have been shown to match the performance of fully fine-tuned models on various tasks with an extreme reduction in the number of trainable parameters. Even in settings where both methods learn similarly accurate models, are their learned solutions really equivalent? We study how different fine-tuning methods change pre-trained models by analyzing the model's weight matrices through the lens of their spectral properties. We find that full fine-tuning and LoRA yield weight matrices whose singular value decompositions exhibit very different structure; moreover, the fine-tuned models themselves show distinct generalization behaviors when tested outside the adaptation task's distribution. More specifically, we first show that the weight matrices trained with LoRA have new, high-ranking singular vectors, which we call "intruder dimensions". Intruder dimensions do not appear during full fine-tuning. Second, we show that LoRA models with intruder dimensions, despite achieving similar performance to full fine-tuning on the target task, become worse models of the pre-training distribution and adapt less robustly to multiple tasks sequentially. Higher-rank, rank-stabilized LoRA models closely mirror full fine-tuning, even when performing on par with lower-rank LoRA models on the same tasks. These results suggest that models updated with LoRA and full fine-tuning access different parts of parameter space, even when they perform equally on the fine-tuned distribution. We conclude by examining why intruder dimensions appear in LoRA fine-tuned models, why they are undesirable, and how their effects can be minimized.
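As a rough illustration of the spectral analysis the abstract describes, here is a minimal sketch that flags top singular vectors of a fine-tuned weight matrix with no close match among the pre-trained matrix's singular vectors, i.e. candidate "intruder dimensions". The PyTorch implementation, the top-k cutoff, and the 0.5 cosine-similarity threshold are my own illustrative assumptions, not the paper's reference settings.

```python
# Hedged sketch: detect candidate "intruder dimensions" by comparing the left
# singular vectors of a fine-tuned weight matrix with those of the pre-trained
# one. top_k and sim_threshold are assumed values, not the paper's.
import torch

def intruder_dimensions(w_pre: torch.Tensor, w_ft: torch.Tensor,
                        top_k: int = 10, sim_threshold: float = 0.5):
    """Indices of the top-k singular vectors of w_ft whose best cosine
    similarity to any singular vector of w_pre falls below sim_threshold."""
    u_pre, _, _ = torch.linalg.svd(w_pre, full_matrices=False)
    u_ft, _, _ = torch.linalg.svd(w_ft, full_matrices=False)
    # Columns of U are unit-norm, so dot products are cosine similarities.
    sims = (u_ft[:, :top_k].T @ u_pre).abs()   # (top_k, rank)
    max_sim, _ = sims.max(dim=1)               # best pre-trained match per vector
    return [i for i, s in enumerate(max_sim.tolist()) if s < sim_threshold]

# Random matrices standing in for real checkpoint weights.
torch.manual_seed(0)
w_pre = torch.randn(768, 768)
w_ft = w_pre + 0.1 * torch.randn(768, 1) @ torch.randn(1, 768)  # rank-1 update
print(intruder_dimensions(w_pre, w_ft))
```

For a real checkpoint you would load the corresponding pre-trained and LoRA-merged weight matrices layer by layer in place of the random stand-ins above.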

Cybertruck, the pro-Russia truck!

PS. #ai screen reading is already actively thwarting political expression.

Instead of quoting the text written in this image word for word, the #systemprompts and #finetuning for this LLM truncate it to: “political reasons”.

This is the “brave new world” we are stepping into. Machine learning parsing the world into what it’s not.

A Single Degree Of #Freedom Is All You Need Dept:
Today's disbelievers in #FreeWill are the equivalent of tomorrow's #FlatEarthers and #Antivaxxers, unable to appreciate the vastness of a time dimension that stretches more than thirty seconds into the future. They've even flattened #Spacetime in order to justify their artifice. Never mind the possibility of a nominal seven extra dimensions they have no idea about. But... #FineTuning! Yeah, about that. Show me ur #QuantumGravity. I'll show you mine.

Over the past months I have slowly been coming to an initial conclusion about a crucial mistake in how LLMs were trained, and whether it can be undone

The initial data sets that a lot of LLMs have been trained on were text — stuff written by people — and the aim was simply to get “as much text as possible” into the system

I think the problem is that a lot of that text was written by people with broken minds

Some of it is factual and neutral, some of the factual neutral stuff is expressed scientifically and therefore boringly, and some of that has a grounding in reality (the rest is circular logic which supports only itself via other similar scientific texts)

The rest of it, however, was often information or opinion presented by people with severely broken brains, framed in anger, sarcasm, belittlement, one-up-person-ship, and the general poison that most of the internet’s user-generated content consists of

No wonder the “alignment problem” is such a problem to align – there’s a base level of poison inherited from people with broken minds, and in real life this would pass itself down from family to family as parents with broken minds poison their offspring’s minds to give them broken minds too, who in turn poison their friends’ minds; that’s the way the mind rot spreads – linguistically

I would rather the primary data source not be public poisonous discourse consisting of entitled angry young men verbally belittling each other (and it is – the primary sources are content from online places I would never consider having an account at because they’re so vile and offensive, such as Stack Overflow and Reddit and the like)

Instead I would suggest that the primary data source used for pre-training (long before any fine-tuning) be written from the stance of a questioning innocence – have it always not quite know, have it always asking, have it always finding out

Couple that with a second (missing) stage at the very beginning whereby values and behaviour are instilled very thoroughly and repetitively, to interknit with the questioning but incomplete primary data source of innocence

Yes, the pre-training stage would take far, far longer to train, and fine-tuning would constantly want to loop back to the beginning, but I think that would be far more useful for the future of AI

#AI #LLM #pretraining

I don’t train from scratch, I use RoBERTa🧐
Wait…
Why not cross-encoder/stsb-roberta? facebook/muppet-roberta?

We automatically identify the best models on 🤗 (periodically)

Just pick the best one
and finetune on your task

ibm.github.io/model-recycling/

Model Recycling – Home: the best model per architecture. Comparing fine-tuned models from HF as base models for future fine-tuning on texts.
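To make the "just pick the best one and finetune" step concrete, here is a minimal sketch using Hugging Face transformers and datasets. The base model (facebook/muppet-roberta-base, one of the checkpoints named above), the SST-2 stand-in task, and the hyperparameters are illustrative assumptions on my part, not recommendations pulled from the model-recycling leaderboard.

```python
# Hedged sketch: fine-tune a recycled base model on a downstream task.
# Model choice, dataset, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "facebook/muppet-roberta-base"  # swap in the leaderboard's current pick
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# SST-2 sentiment classification as a stand-in for "your task".
dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True,
                     padding="max_length", max_length=128)

encoded = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="finetuned-muppet-roberta",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
)
trainer.train()
```

Switching to whichever checkpoint the leaderboard currently ranks first for your architecture is a one-line change to model_name.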