#nlp

🤔 What is #NLP research 𝘳𝘦𝘢𝘭𝘭𝘺 about?
We analyzed 29k+ papers to find out! 📚🔍

📌 Our NLPContributions dataset, from the ACL Anthology, reveals what authors actually contribute—artifacts, insights, and more.

📈 Trends show a swing back towards language & society. Curious where you fit in?

🎁 Tools, data, and analysis await you:

📄 Paper: arxiv.org/abs/2409.19505
🌐 Project: ukplab.github.io/acl25-nlp-con
💻 Code: github.com/UKPLab/acl25-nlp-co
💾 Data: tudatalib.ulb.tu-darmstadt.de/

(1/🧵)

VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification.

This paper presents VLAI, a transformer-based model that predicts software vulnerability severity levels directly from text descriptions. Built on RoBERTa, VLAI is fine-tuned on over 600,000 real-world vulnerabilities and achieves over 82% accuracy in predicting severity categories, enabling faster and more consistent triage ahead of manual CVSS scoring. The model and dataset are open-source and integrated into the Vulnerability-Lookup service.

We (@cedric and I) decided to write a paper to better document how VLAI is implemented. We hope it will give others ideas for improvements in models like this.
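For a rough sense of what such a setup looks like (a sketch under assumptions, not the authors' actual training code), fine-tuning RoBERTa for severity classification with Hugging Face transformers amounts to attaching a sequence-classification head; the four-band label set and the base checkpoint used here are assumptions, not details from the paper:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed label set: four CVSS-style severity bands. The actual VLAI
# label space may differ; check the paper and repository.
LABELS = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base",
    num_labels=len(LABELS),
    id2label=dict(enumerate(LABELS)),
    label2id={l: i for i, l in enumerate(LABELS)},
)

# After fine-tuning on labeled vulnerability descriptions, severity
# prediction is a single forward pass over the raw text:
inputs = tokenizer(
    "Heap buffer overflow in the parser allows remote code execution.",
    return_tensors="pt",
    truncation=True,
)
severity = LABELS[model(**inputs).logits.argmax(-1).item()]
print(severity)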

#vulnerability #cybersecurity #vulnerabilitymanagement #ai #nlp #opensource

@circl

🔗 arxiv.org/abs/2507.03607


How big of a deal would it be if someone developed a language model (kind of like ChatGPT) which didn't hallucinate, didn't use prodigious amounts of electricity/water/compute/memory, which ran locally or on a distributed user mesh instead of corporate server farms, and which remembered and learned from what you say if you want it to? Something which was reliable and testable and even interpretable -- meaning you could pop the hood and see what it's really doing. Would you be inclined to use a system like this? Are there other things you'd still take issue with?

#LLM
#ChatGPT
#NLP
#NLU

Continued thread

The only actual machine learning the system uses, aside from its purely emergent ability to learn language from context, is in the parser, where I adjust the probability of matching certain patterns based on prior success in understanding the user.
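Purely to illustrate that idea (this is my sketch, not the author's code), success-weighted pattern scoring can be as simple as a smoothed hit-rate per pattern; the class and method names below are hypothetical:

from collections import defaultdict

class PatternScorer:
    # Illustrative sketch: weight each parse pattern by its prior
    # success rate, with Laplace smoothing so unseen patterns start
    # at 0.5 instead of 0.
    def __init__(self):
        self.successes = defaultdict(int)
        self.attempts = defaultdict(int)

    def weight(self, pattern):
        # Smoothed probability that this pattern leads to understanding.
        return (self.successes[pattern] + 1) / (self.attempts[pattern] + 2)

    def record(self, pattern, understood):
        # Called after each parse attempt with the outcome.
        self.attempts[pattern] += 1
        if understood:
            self.successes[pattern] += 1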

#NLP
#NLU

Continued thread

I wrote this code in my spare time, over about a decade. I wrote the parser, the knowledge graph, the conversational engine, everything from scratch in pure Python. It starts out literally knowing nothing except English grammar, and learns everything from interactions with the user. In the tests I posted, it is literally learning the words from context as I use them. It has no idea what a dog or an apple or the color red is until I tell it things about them.

#NLP
#NLU

Continued thread

Here's another passing unit test. Again, this is stuff I wrote almost a decade ago, before LLMs were even invented.

> The brown dog bites me.
> The black dog barks at me.
> The dog that bites me hates me.
> Which dog hates me?
< The brown dog.
> Does the brown dog bite me?
< Yes.
> Does the brown dog hate me?
< Yes.
> Does the black dog hate me?
< I don't know.
> Does the dog that barks at me hate me?
< I don't know.
> Is the brown dog the dog that bites me?
< Yes.

#NLP
#NLU

Just dug out some of my code from almost a decade ago, pre-LLM, and got it working again. Some examples of what it could do, taken from actual passing unit tests, are below. Lines starting with > are what the user says, and lines starting with < are what the system says.

> The apple is red.
> The worm is in the apple.
> The apple is in the basket.
> Where is the worm?
< In the apple.
> Where is the apple?
< In the basket.
> Is the worm in the apple?
< Yes.
> Is the apple in the basket?
< Yes.
> Is the worm in the basket?
< I don't know.
> What is in the apple?
< The worm.
> Is the apple red?
< Yes.
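For readers curious how answers like these can fall out of a plain fact store, here is a toy sketch (mine, not the original system's code): store subject-relation-object triples and answer only what has been explicitly asserted, so containment is not chained transitively and "Is the worm in the basket?" stays unknown, just as in the transcript above.

class FactStore:
    # Toy illustration: facts are (subject, relation, object) triples,
    # with no transitive inference over the "in" relation.
    def __init__(self):
        self.facts = set()

    def tell(self, subj, rel, obj):
        self.facts.add((subj, rel, obj))

    def ask(self, subj, rel, obj):
        return "Yes." if (subj, rel, obj) in self.facts else "I don't know."

kb = FactStore()
kb.tell("worm", "in", "apple")
kb.tell("apple", "in", "basket")
print(kb.ask("worm", "in", "apple"))   # -> Yes.
print(kb.ask("worm", "in", "basket"))  # -> I don't know.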

#NLP
#NLU

🔔 NEW 🔔

All 4,566 plenary protocols of the German Bundestag from 1949 to 2025 (cutoff date: 24 May) are now available in the 'Corpus der Plenarprotokolle des Deutschen Bundestages' (CPP-BT).

Individual speeches are included too, with each speaker's name, ID, and parliamentary group!

🔶 Download 🔶

💾 Dataset - doi.org/10.5281/zenodo.4542661

📒 Codebook - zenodo.org/records/15462956/fi

💻 #RStats Source Code - doi.org/10.5281/zenodo.4542665

🔶 Features 🔶

+ Up to 35 variables in the CSV variant
+ Plenary protocols from the 1st through the latest, 21st legislative period as of the cutoff date
+ Split into individual speeches with, among other things, the speaker's ID, name, parliamentary group, and office (from the 18th legislative period onward)
+ Split into protocol components: table of contents, session proceedings, annexes, speaker list (from the 18th legislative period onward)
+ Continuous updates (the dataset can additionally be refreshed daily via a pipeline)
+ Free of copyright restrictions
+ Open, platform-independent formats (PDF, TXT, CSV, XML, Parquet); see the loading sketch after this list
+ Linguistic metrics
+ Extensive codebook
+ Compilation report explaining the build process
+ Dozens of diagrams and tables for all purposes
+ Diagrams in formats optimized for print (PDF) and the web (PNG)
+ Cryptographic signatures
+ Published source code (open source)
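As a quick-start illustration (not part of the official release), loading the Parquet variant in Python is a one-liner with pandas; the file name and the column names "wahlperiode" and "fraktion" are assumptions based on the feature list above and should be checked against the codebook:

import pandas as pd

# Hypothetical file name; the actual Parquet files are listed on the
# Zenodo record (doi.org/10.5281/zenodo.4542661).
speeches = pd.read_parquet("cpp-bt_speeches.parquet")

# Example: count speeches per parliamentary group in the 20th term.
print(
    speeches.loc[speeches["wahlperiode"] == 20]
            .groupby("fraktion")
            .size()
            .sort_values(ascending=False)
)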

@rstats @politicalscience @histodons #OpenAccess #OpenSource #OpenScience #Parliament #Bundestag #Plenarprotokoll #Histodons #HistodonsDE #NLP #Dataviz #Legislative #Debate