photog.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
A place for your photos and banter. "Photog first" is our motto. Please refer to the site rules before posting.


#benchmarks


#Benchmarks are meant to make AI models comparable. Companies use tests and results to showcase their models' capabilities, but the significance of those results is often unclear. Researchers: established benchmarks make #AI comparable, but they are only an indicator of real-world performance: sciencemediacenter.de/angebote

alojapan.com/1331021/japanese- Japanese-led XRISM makes first-ever direct detection of sulfur in two states #benchmarks #GraphicsCard #Japan #JapanNews #Japanese #JapaneseNews #laptop #nasa #netbook #news #notebook #processor #reports #review #reviews #test #tests #XRISMSatellite #XRISMSatelliteDetectsSulfurInTwoStates An international team of scientists has, for the first time, directly detected sulfur in both its gas and solid phases in the interstellar medium — the gas-

Okay, here it is. This is the unofficial official timeline of #AI. I'm going to tell you what to expect, and it's definitely not: this all goes away and we return to before.

Are you ready for this? Are you sure? Well, read on.

Before I continue, I'm going to lay out some AI #benchmarks that we'll use to define "how good / scary is this AI?" This is in rough order of difficulty.

#Lovelace #Test for #Emergence: "Can a system produce surprising and useful outputs that weren't explicitly programmed, via weak emergence?"

#Loebner Test: "Can a computer fool casual human judges in text conversations?" (#Modern #LLM AIs are close to this.)

#Turing Test (Original Imitation Game): "A man (or a computer) and a woman both answer text interrogations, each trying to convince the interrogator that they are the woman. Can the computer perform as well as the man?" (This was the actual original #TuringTest.)

Strengthened #Imitation Game: "A man or a #computer and a woman are both answering text interrogations. Can the computer perform as well as the woman?"

#Coffee Test: "Can a #system enter a stranger's house with no prior info and, using #perception, imitation, and #reasoning, figure out how to make a cup of coffee?"

#College #Student Test: "Can a robot enroll in college, attend classes like an actual student, learn from the instruction things it didn't know before, and graduate?"

#VoightKampff Test: "Can a machine withstand adversarial expert interrogation and still pass as #human?"

#Harnad's Total Turing Test: "Is the system indistinguishable from humans in every aspect?" (This is a #DuckTest.)

Non #Duck Test: "Even with full access to internals, can experts find no evidence that it isn't a genuine human mind?"

You know how sometimes a little hobby side-project can get a bit out of hand? An unexpected performance regression on speed.python.org that only showed up on GCC 5 (and 7) led me to set up more rigorous tracking of Python performance when using different compilers. I'm still backfilling data but I think it's pretty awesome to see how much, and how consistently, free-threaded Python performance has improved since 3.13:

github.com/Yhg1s/python-benchm

GitHub - Yhg1s/python-benchmarking-public: Curated results from personal bench_runner benchmarks
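For readers curious what this kind of tracking looks like at its smallest, here is a minimal sketch of timing one CPU-bound workload and labeling the result with the interpreter it ran on. This is my own illustration using only the standard library, not the bench_runner tool the post links to; the workload and the result keys are made up for demonstration.

```python
# Minimal sketch: time a fixed workload and tag the result with the
# interpreter version and whether it is a free-threaded build.
import sys
import sysconfig
import timeit

def workload():
    # A small CPU-bound task: sum of squares.
    return sum(i * i for i in range(10_000))

def run_benchmark(repeats=5, number=100):
    # Take the best of several repeats to reduce scheduling noise.
    times = timeit.repeat(workload, repeat=repeats, number=number)
    best = min(times) / number
    # Py_GIL_DISABLED is set on free-threaded CPython builds;
    # older builds return None here.
    free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED") or 0)
    return {
        "python": sys.version.split()[0],
        "free_threaded": free_threaded,
        "best_seconds_per_call": best,
    }

if __name__ == "__main__":
    print(run_benchmark())
```

Running the same script under several interpreter builds (e.g. 3.13 and 3.14, GIL and free-threaded) and collecting the printed dicts gives a crude version of the cross-build comparison described above.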

🔔 New Essay 🔔

"The Intelligent AI Coin: A Thought Experiment"

Open Access here: seanfobbe.com/posts/2025-02-21

Recent years have seen a concerning trend towards normalizing decisionmaking by Large Language Models (LLM), including in the adoption of legislation, the writing of judicial opinions and the routine administration of the rule of law. AI agents acting on behalf of human principals are supposed to lead us into a new age of productivity and convenience. The eloquence of AI-generated text and the narrative of super-human intelligence invite us to trust these systems more than we have trusted any human or algorithm ever before.

It is difficult to know whether a machine is actually intelligent because of problems with construct validity, plagiarism, reproducibility and transferability in AI benchmarks. Most people will either have to personally evaluate the usefulness of AI tools against the benchmark of their own lived experience or be forced to trust an expert.

To explain this conundrum I propose the Intelligent AI Coin Thought Experiment and discuss four objections: the restriction of agents to low-value decisions, making AI decisionmakers open source, adding a human-in-the-loop and the general limits of trust in human agents.

@histodons @politicalscience

seanfobbe.com · [Essay] The Intelligent AI Coin: A Thought Experiment

The #IntelArc B580 12GB is the card to beat, period.

I see Intel selling these cards like hotcakes. I don't think the "Buy NVIDIA" mindset exists at its price of $249 when there is no RTX at that price point.

youtube.com/watch?v=dboPZUcTAW4
youtube.com/watch?v=aV_xL88vcAQ
youtube.com/watch?v=yKMigkGU8vI
youtube.com/watch?v=JjdCkSsLYLk

#AMD #NVIDIA #Arc

I'm comparing the #Framework motherboards with the AMD Ryzen 7840U against the Intel Ultra 7 155H, which is $120 more expensive.

The two processors appear to perform similarly on these benchmarks, and single-core performance is about the same as that of my current #System76 Lemur Pro with an ultra low-power Intel CPU (i7-1355U).

I couldn't find results to compare the integrated GPUs.

cpubenchmark.net/compare/5322v
#benchmarks
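For a rough local sanity check of single-core performance, a hand-rolled loop like the following can produce a comparable number on two machines. This is my own sketch, not PassMark's methodology; the workload and the score's scale are arbitrary.

```python
# Rough single-core check: count iterations of a fixed integer
# workload completed on one thread within a time budget.
import time

def single_core_score(duration=1.0):
    # Higher is faster; only comparable across runs of this same script.
    deadline = time.perf_counter() + duration
    iterations = 0
    while time.perf_counter() < deadline:
        x = 0
        for i in range(1000):
            x += i * i
        iterations += 1
    return iterations

if __name__ == "__main__":
    print(f"single-core score: {single_core_score():,}")
```

Numbers from a toy loop like this won't match PassMark's single-thread ratings, but run on both machines they give a quick relative comparison.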

www.cpubenchmark.net · AMD Ryzen 5 5560U vs Intel N100 vs i5-2410M vs i5-2450P vs i5-2550K, by PassMark Software