photog.social · posts tagged #ml

A new technique for LLMs has just landed: Explainable training!

Let me *explain*.

Normal supervised training works like this: you show ground-truth inputs and outputs to a model, then backpropagate the error into the model weights. The whole process is an opaque black box. If you train on data which contains, for example, personally identifiable information (PII) or copyrighted content, it will plausibly be stored verbatim in the model weights.

What if we do it like this instead:

Let's write initial instructions for an LLM to generate synthetic data that resembles the real data.

Then we go through the real data and, one by one, show an LLM an example of the real data, an example of the synthetic data, and the instructions used to generate the synthetic data. Then we ask it to iteratively refine those instructions so that the synthetic data resembles the real data more closely, in the features and characteristics that matter.

You can also add reasoning steps, and instructions not to put PII as such into the synthetic-data-generation instructions.
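
Here's a minimal sketch of that refinement loop in Python. Everything named below is an assumption rather than something from the post: `call_llm` is a placeholder for whatever chat-completion API you use, `real_dataset` stands in for your ground-truth records, and the prompts and the customer-email domain are purely illustrative.

```python
# Explainable training as an instruction-refinement loop (sketch).

def call_llm(prompt: str) -> str:
    """Placeholder: wire this to your actual LLM / chat-completion API."""
    raise NotImplementedError("plug in your LLM API call here")

real_dataset = ["..."]  # placeholder for your ground-truth examples

# Initial, hand-written instructions for generating synthetic data.
instructions = (
    "Generate one synthetic customer-support email. Vary tone, length "
    "and topic. Never include real names, addresses or other PII."
)

for real_example in real_dataset:
    # Generate one synthetic example from the current instructions.
    synthetic_example = call_llm(
        "Follow these instructions and output exactly one example:\n"
        + instructions
    )
    # Show the LLM the real example, the synthetic example and the
    # current instructions, and ask for a refined version that closes
    # the gap without copying protected details into the instructions.
    instructions = call_llm(
        "Instructions for generating synthetic data:\n"
        f"{instructions}\n\n"
        f"A synthetic example produced from them:\n{synthetic_example}\n\n"
        f"A real example:\n{real_example}\n\n"
        "Reason about which features and characteristics of the real "
        "example the synthetic one fails to capture, then output a "
        "refined version of the instructions. Do not copy PII or any "
        "verbatim content from the real example into the instructions."
    )

# `instructions` is now a small, human-readable document: audit it for
# PII, then use it to generate synthetic training data at any volume.
```

The key design point: the auditable artifact is that final instruction document, not the model weights.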

This is just like supervised learning, but explainable! The result is a document of refined instructions for generating better synthetic data, informed by the real data, but now human-readable and explainable!

You can easily verify that this relatively small document doesn't contain, for example, PII, and you can use it to generate any volume of synthetic training data while guaranteeing that critical protected details in the real data do not leak into the trained model!

This is the next level of privacy protection for training AIs!

#AIs #LLMs #privacy

Here comes the copyright monster for #AI. 100% predictable, even without using #ML to figure it out. This is a very old #MLsec issue. When you build a WHAT machine, you need to pay attention to who owns the WHAT pile that the machine becomes.

arstechnica.com/tech-policy/20

Ars Technica · "AI industry horrified to face largest copyright class action ever certified" by Ashley Belanger

AI surveillance in schools + intense American carceral culture = "The teenage girl made an offensive joke while chatting online with her classmates, triggering the school’s surveillance software.

Before the morning was even over, the Tennessee eighth grader was under arrest. She was interrogated, strip-searched and spent the night in a jail cell, her mother says." - apnews.com/article/ai-school-s

#AI #ML #education

What I like about the @kagihq built-in LLM answers:

- It's only(!) triggered on request, i.e. when your query ends with a question mark.
- It always admits when it couldn't find good information or there is no clear answer.
- It always cites sources, usually word for word.

I find myself not so much "believing the AI" but rather using it as a guide as to which result has the information I need.

#AI #ML #LLM

At work, watching a presentation on #AI & #ML. Was presented with the attached slide. It really feels like the threshold for what's considered (at least by some) "AGI" has been lowered (in favor of adding "ASI").

Then again, if you're like OpenAI's CEO when defending his product, he considers humans to be little more than glorified versions of the same kind of #StochasticParrots that #LLMs are.

Personally, I think he's cheapening humanity and using the current inability to define "what is actual intelligence" to bolster his product. This smacks of "scientists don't know what they're doing because they're always coming up with new theories to replace old theories".

It'd be nice if there were a #Demucs model that could separate laugh tracks from sitcom episodes. I know an #AI laugh-track remover exists already, but to be honest, I wasn't impressed at all by the demo. It sounds like it just turned the episode all the way down whenever a laugh track came in. Unfortunately, I think the reason it can't happen easily yet is that there aren't many public-domain crowd sounds out there that you could just train an AI on, if any; or at least, not to my knowledge. #ML

@Ruth_Mottram The problem is that most laypeople don't know the difference between #ML (for example, for science) and the popular #LLMs, and identify #AI only with LLMs or even #GPT.

Where can they learn it? Why do the big corporations prefer cult language in marketing instead of precision (rhetorical question)? Would it help to specify it on social media instead of using the big slogan hashtags? When do writers set their own narratives instead of adopting those of OpenAI & Co?

It's time to stop calling LLMs "AI". This practice is tarnishing all the really powerful advances in machine learning that are accelerating drug discovery, making previously intractable optimization problems feasible, helping to elicit scientific models from humanly incomprehensible troves of data, etc.

Lumping all the "AI" under one term is doing a major disservice to humanity. At the moment, when I see a company is heavily investing in "AI" I can't tell if they're going to start having major technological breakthroughs, or start having major infrastructure failures from trusting vibe coding by minimal wage casuals.

It's time to taboo the term "AI" and be more specific.
#AI #ML #LLMs

What is 'AI'?

(the today kind, not the potential scifi/AGI/etc kind)