#nlproc


🚀 SOTA 0-shot models growing? Don't break the compute-bank! ⚡💡

Discover Statement-Tuning in our #NAACL2025 paper: we transform NLU tasks into natural language statements, letting small models like RoBERTa shine ✨ in zero & few-shot settings at a fraction of the cost. 🔥
(1/🧵)
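As a rough illustration of the statement-tuning recipe described above (verbalize each candidate label as a natural-language statement and let a small encoder judge whether it is true), here is a minimal sketch. The checkpoint name, templates, and helper function are illustrative assumptions, not the paper's released artifacts; a statement-tuned binary classifier would replace the plain roberta-base placeholder.

```python
# Hedged sketch of the statement-tuning idea as described in the post:
# an NLU example is verbalized into natural-language statements, and a small
# encoder (e.g. RoBERTa) scores each statement as true/false.
# The checkpoint and templates below are illustrative assumptions only.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "roberta-base"  # assumption: stand-in for a statement-tuned true/false checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)
model.eval()

def zero_shot_classify(text: str, label_templates: dict[str, str]) -> str:
    """Score one verbalized statement per candidate label; return the label
    whose statement the model finds most likely to be true (class index 1)."""
    scores = {}
    for label, template in label_templates.items():
        statement = template.format(text=text)
        inputs = tokenizer(statement, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits
        scores[label] = torch.softmax(logits, dim=-1)[0, 1].item()  # P(statement is true)
    return max(scores, key=scores.get)

# Illustrative usage on a toy sentiment example
templates = {
    "positive": 'The review "{text}" expresses a positive sentiment.',
    "negative": 'The review "{text}" expresses a negative sentiment.',
}
print(zero_shot_classify("A delightful, warm little film.", templates))
```

The appeal argued in the post is that the scoring model stays small, so zero- and few-shot inference costs a fraction of prompting a large decoder.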

Someone from U Zurich did an undisclosed persuasion experiment on Reddit users in r/ChangeMyView using #LLM bots. This kind of social media research is absolutely unethical and the "results" should not be published.
Additional shame on the ethics committee for arguing *for* publication. In my view, this is outrageous scientific misconduct. #nlproc #academia #ethics #socialMedia
reddit.com/r/changemyview/comm


🌍 We welcome applicants from all backgrounds and nationalities.

📅 Application deadline: May 25th, 2025.
After that, the position will remain open until filled. We will consider applications as soon as they are submitted.

(4/4)

#NLProc #NLP #Postdoc

#PhD job in the Dept. of Language and Information Sciences at the University of Lausanne: my colleague Davide Picca has an open PhD position starting on October 1, 2025 in an SNSF-funded project focused on the computational analysis of Charles S. #Peirce’s manuscripts.

Deadline for application: May 19, 2025

career5.successfactors.eu/care

career5.successfactors.eu
Career Opportunities: Doctoral Student SNSF in Digital Humanities and Computational Semiotic Studies (22226)

We’ve redesigned mobile email replies - with & without AI. Tap sentences while reading to enter local responses (or get suggestions). Then connect them on the usual draft screen (or let AI do just that). The result: flexible workflows with varying speed and control. #CHI2025 preprint in comments.

#HCI #AI #NLP

7/

REFERENCES

[1] Yifu Qiu, Varun Embar, Yizhe Zhang, Navdeep Jaitly, Shay B. Cohen, and Benjamin Han. 2025. Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models. arxiv.org/abs/2501.08248

arXiv.org: Eliciting In-context Retrieval and Reasoning for Long-context Large Language Models
Recent advancements in long-context language models (LCLMs) promise to transform Retrieval-Augmented Generation (RAG) by simplifying pipelines. With their expanded context windows, LCLMs can process entire knowledge bases and perform retrieval and reasoning directly -- a capability we define as In-Context Retrieval and Reasoning (ICR^2). However, existing benchmarks like LOFT often overestimate LCLM performance by providing overly simplified contexts. To address this, we introduce ICR^2, a benchmark that evaluates LCLMs in more realistic scenarios by including confounding passages retrieved with strong retrievers. We then propose three methods to enhance LCLM performance: (1) retrieve-then-generate fine-tuning, (2) retrieval-attention-probing, which uses attention heads to filter and de-noise long contexts during decoding, and (3) joint retrieval head training alongside the generation head. Our evaluation of five well-known LCLMs on LOFT and ICR^2 demonstrates significant gains with our best approach applied to Mistral-7B: +17 and +15 points by Exact Match on LOFT, and +13 and +2 points on ICR^2, compared to vanilla RAG and supervised fine-tuning, respectively. It even outperforms GPT-4-Turbo on most tasks despite being a much smaller model.
#NLP #NLProc #RAG

6/

Through extensive experiments on five LCLMs using both the LOFT and ICR² benchmarks, our best approach on Mistral-7B with a 32K token limit outperformed Vanilla RAG and SFT baselines by an average of +17 and +15 points (Exact Match) on LOFT, and by +13 and +2 points on ICR², respectively (picture). It even achieved performance comparable to the state-of-the-art GPT-4, despite having only 7B parameters.

#NLP #NLProc #RAG

4/

With a more realistic benchmark in hand, we systematically explored three approaches to enhance model performance:

1. Retrieve-then-generate supervised fine-tuning (picture): we train LCLMs to first retrieve relevant information from the context and then generate the final responses (a minimal sketch of this training format follows after this post).

2. Retrieval-attention-probing: During inference, we probe attention heads activated for in-context retrieval, and use their top predictions to filter out confounders.

#NLP #NLProc #RAG
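To make item 1 above concrete, here is a minimal, hedged sketch of what a retrieve-then-generate training pair could look like: the target first names the supporting passages, then gives the answer. Field names, prompt wording, and the toy example are assumptions for illustration, not the format used in the paper.

```python
# Hedged sketch of a retrieve-then-generate SFT pair: the target first names the
# relevant passages, then answers conditioned on them. All field names, prompt
# wording, and the toy example below are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class QAExample:
    question: str
    passages: list[str]   # long context: gold passages plus confounders
    gold_ids: list[int]   # indices of the passages that support the answer
    answer: str

def build_sft_pair(ex: QAExample) -> dict:
    """Turn one QA example into a (prompt, target) pair for supervised fine-tuning."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(ex.passages))
    prompt = (
        f"Context:\n{context}\n\n"
        f"Question: {ex.question}\n"
        "First list the IDs of the relevant passages, then answer."
    )
    retrieved = ", ".join(f"[{i}]" for i in ex.gold_ids)
    target = f"Relevant passages: {retrieved}\nAnswer: {ex.answer}"
    return {"prompt": prompt, "target": target}

# Illustrative usage with a toy example
ex = QAExample(
    question="Which river flows through the city in the gold passage?",
    passages=["City A lies on the River X and hosts the national archive.",
              "City B, often confused with City A, lies on the River Y."],
    gold_ids=[0],
    answer="The River X",
)
print(build_sft_pair(ex)["target"])
```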

3/

This limitation often leads to inflated results. To address this, we created a more realistic dataset, ICR². It uses five retrievers to generate challenging negative documents (picture 1). Our results show a significant performance drop with standard RAG setups. For example, with GPT-4-Turbo, accuracy on NQ dropped from 0.85 to 0.67, and on HPQA, it fell from 0.78 to 0.64 (picture 2).

#NLP #NLProc #RAG
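As a rough sketch of the confounder-mining idea behind ICR² described above: rank the corpus against each query and keep the top-scoring passages that are not gold as "relevant but misleading" context. A TF-IDF retriever stands in here for the five strong retrievers used in the paper; the names and toy corpus are illustrative assumptions.

```python
# Hedged sketch of confounder mining: instead of random negatives, rank the corpus
# against the query and keep top-scoring non-gold passages as confounders.
# TF-IDF is a simplified stand-in for the strong retrievers used in the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def mine_confounders(query: str, corpus: list[str], gold_ids: set[int], k: int = 3) -> list[int]:
    """Return indices of the k highest-ranked non-gold passages for this query."""
    vectorizer = TfidfVectorizer().fit(corpus + [query])
    doc_vecs = vectorizer.transform(corpus)
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_vecs)[0]
    ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked if i not in gold_ids][:k]

# Illustrative usage: the highest-scoring non-gold passages become the confounding
# context placed alongside the gold passage in the prompt.
corpus = [
    "The Eiffel Tower was completed in 1889 for the World's Fair.",   # gold
    "The Eiffel Tower restaurant in Las Vegas opened in 1999.",       # confounder
    "Paris is the capital of France.",
]
print(mine_confounders("When was the Eiffel Tower built?", corpus, gold_ids={0}))
```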

2/

But are current LCLMs up to the task? If not, how can we improve their performance?

In our preprint [1], we evaluated five popular LCLMs using the LOFT benchmark [2], which involves answering questions paired with documents. However, LOFT relies on random sampling to create irrelevant (negative) documents for each query, failing to include confounding documents — those that are relevant but misleading — which are common in real-world scenarios.

#NLP #NLProc #RAG
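For contrast with the retrieved confounders sketched under post 3/ above, here is a minimal sketch of the random-negative sampling this post attributes to LOFT; the function name and sampling size are illustrative assumptions.

```python
# Minimal sketch of LOFT-style context construction as described in this post:
# negatives are sampled at random from the corpus, so they are rarely confusable
# with the gold passages. Names and sampling size are illustrative assumptions.
import random

def loft_style_context(gold: list[str], corpus: list[str], n_negatives: int = 5) -> list[str]:
    """Pad the gold passages with randomly sampled (usually clearly irrelevant) negatives."""
    pool = [p for p in corpus if p not in gold]
    negatives = random.sample(pool, min(n_negatives, len(pool)))
    context = gold + negatives
    random.shuffle(context)
    return context
```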

1/

What if #LLMs had context windows so large that an entire knowledge base could fit into a single prompt? This would revolutionize Retrieval-Augmented Generation (RAG) applications by enabling retrieval, re-ranking, reasoning, and generation all in one step. With a Long-Context Language Model (LCLM), we could simplify RAG architecture by leveraging the model’s capability for In-Context Retrieval and Reasoning (ICR²).

#NLP #NLProc #RAG
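To ground the single-prompt picture in this post, here is a minimal, hedged sketch of collapsing retrieval, re-ranking, reasoning, and generation into one LCLM call over the whole knowledge base. The prompt wording and the generate placeholder are assumptions; no specific model API is implied.

```python
# Hedged sketch of the "everything in one prompt" setup described in this post:
# the whole knowledge base goes into a single LCLM call that is asked to retrieve,
# reason, and answer in one step. `generate` is a placeholder for whatever LCLM
# call you use; the prompt wording is an illustrative assumption.
from typing import Callable

def icr_single_call(question: str, knowledge_base: list[str],
                    generate: Callable[[str], str]) -> str:
    """Build one prompt containing the entire knowledge base and ask the model to
    cite the passages it used before answering (in-context retrieval + reasoning)."""
    kb = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(knowledge_base))
    prompt = (
        f"Knowledge base:\n{kb}\n\n"
        f"Question: {question}\n"
        "Cite the IDs of the passages you rely on, explain your reasoning briefly, "
        "and then give the final answer."
    )
    return generate(prompt)
```

Whether this actually matches a classic RAG pipeline is exactly what the thread's benchmark probes; the sketch only shows the architectural simplification being argued for.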