Upcoming Talks

Nupur Dutta

JRA, IIT Patna
Date: 21-11-2025
Title: Mother Tongue Odia: Identity, Crisis, and a Call for Renaissance

Abstract – Mother Tongue Odia: Identity, Crisis, and a Call for Renaissance
This article highlights the rich heritage of the Odia language, the declining regard for it among the younger generation, and the crisis created by the impact of globalization. It calls on society, the government, and families to take joint responsibility for preserving, using, and reviving the mother tongue.

Previous Talks

Dr. Baban Gain

PhD, IIT Patna
Date: 14-11-2025
Title: The LLM Language Network: A Neuroscientific Approach for Identifying Causally Task-Relevant Units

Dr. Saroj Kumar Jha

SRA Language
Date: 07-11-2025
Title: BhashaVerse: Translation Ecosystem for Indian Subcontinent Languages

Dibyanayan Bandyopadhyay

PhD Student, IIT Patna
Date: 31-10-2025
Title: A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners
Venue: EMNLP 2024

Ayodhya Murmu

JRA Language
Date: 24-10-2025
Title: Santali Language and Script Related Issues




Anansa Roy

JRA Language
Date: 17-10-2025
Title: The Role of Pragmatics in Shaping Language and Meaning



Sandeep Kumar

Ph.D. Research Scholar, IIT Patna
Date: 10-10-2025
Title: FactLens: Benchmarking Fine-Grained Fact Verification
Venue: ACL 2025



Somenath Nag Choudhury

Assistant Professor, CSE, RCCIIT
Date: 26-09-2025
Title: MMed-RAG: Versatile Multimodal RAG System for Medical Vision-Language Models

Dr. Vineet Kumar Lal Das

JRA Language
Date: 12-09-2025
Topic: Analysis of 100 Sentences Translated from Maithili to Hindi by the Vidyapati Machine



Souravi Halder

JRA Language
Date: 29-08-2025
Title: Safe Use of Social Media in the Era of ChatGPT



Boynao Kshetrimayum

Senior Research Associate, IIT Patna
Date: 22-08-2025
Topic: ReFT: Reasoning with Reinforced Fine-Tuning (ACL 2024 Paper)



Prajwal Vijay Kajare

PhD Student, IIT Jodhpur
Date: 08-08-2025
Topic: Beyond the trade-off: Self-Supervised Reinforcement Learning for Reasoning Models’ Instruction Following

Dr. Saloka Sengupta

JRA Language

Date: 25-07-2025
Topic: "Toward cultural interpretability:
A linguistic anthropological framework for describing and evaluating large language models" by Graham M Jones, Shai Satran and Arvind Satyanarayan


Rochit Ranjan

IIT Jodhpur

Date: 18-07-2025

Topic: Can LLMs Obfuscate Code? A Systematic Analysis of Large Language Models into Assembly Code Obfuscation



Dr. Saroj Kumar Jha

SRA Language

Date: 11-07-2025
Topic: Linguistic Resources for Bhojpuri, Magahi, and Maithili: Statistics about Them, Their Similarity Estimates, and Baselines for Three Applications.




Soham Bhattacharjee

JRA Technical
Date: 04-07-2025
Topic: DRT: Deep Reasoning Translation via Long Chain-of-Thought


Dr. Pramod Singh

JRA Language
Date: 27-06-2025
Topic: Translation



Baban Gain

PhD Student, IIT Patna
Date: 13-06-2025
Topic: Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine Translation

Nitika Gupta

Junior Research Associate – Language
Date: 06-06-2025
Topic: Importance of Case Markers in Bengali Language


Dibyanayan Bandyopadhyay

PhD student
Date: 30-05-2025
Topic: Zero Shot Natural Language Explanation


Bigyan Ranjan Das

JRA Language (Hindi-Odia)
Date: 23-05-2025
Topic: Nuances of Hindi-Odia Translation: Exploring Linguistic, Cultural, and Grammatical Subtleties

Somenath Nag Choudhury

PhD Student, IIT Patna
Date: 02-05-2025
Title: MITER: Medical Image–Text joint adaptive pretraining with multi-level contrastive learning

Ayodhya Murmu

JRA (Language: Santali)
Date: 09-05-2025
Title: Issues and Solutions Faced in Hindi-to-Santali Validation


Priyanshu Priya

PhD Student, IIT Patna
Date: 25-04-2025
Topic: AAAI-25 Conference Chronicles: Unveiling the Highlights and Takeaways

Anansa Roy

JRA (Language)
Date: 18-04-2025
Title: Postposition and its Usage in Bangla

Dr. Vineet Kumar Lal Das

JRA Language
Date: 28-03-2025
Topic: Errors That Occur During Hindi-Maithili (Human) Translation


Abid Hossain

PhD Student
Date: 21-03-2025

Topic: Not All Languages Are Created Equal in LLMs: Improving Multilingual Capability by Cross-Lingual-Thought Prompting


Souravi Halder

JRA Language
Date: 07-03-2025
Topic: Web Development



Dr. Saroj Kumar Jha

SRA Language
Date: 14-02-2025
Topic: ILCI Corpora Guidelines for Annotation and Validation


Saurav Dudhate

B.Tech CSE (IIT Patna)
Date: 31-01-2025
Title: Direct Preference Optimization


Somenath Nag Choudhury

PhD Student, IIT Patna
Date: 07-01-2025
Topic: MAIRA-2: Grounded Radiology Report Generation



Abstract – Dr. Baban Gain
Large language models (LLMs) contain specialized “language-selective” units, identified using neuroscientific localization methods across 18 major models. Ablating these units leads to substantial drops in language performance, demonstrating their causal role. These units also align closely with human brain language-network activity. While some models show selective specialization for reasoning and social inference, the degree of specialization varies, revealing both parallels and divergences between LLM internal mechanisms and human neural organization.
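To make the ablation step concrete, here is a minimal Python sketch (illustrative only, not the study's code) that silences a chosen set of hidden units in one layer via a forward hook, so performance with and without those units can be compared; the toy layer, the helper name ablate_units, and the unit indices are assumptions for illustration.
# Illustrative sketch (not the study's code) of the ablation idea: zero out a
# chosen set of hidden units in one layer via a forward hook and compare task
# performance before and after. Layer and unit choices here are arbitrary.
import torch
import torch.nn as nn

def ablate_units(module: nn.Module, unit_indices):
    def hook(_module, _inputs, output):
        output = output.clone()
        output[..., unit_indices] = 0.0   # silence the selected units
        return output
    return module.register_forward_hook(hook)

# Toy layer standing in for one transformer sub-layer.
layer = nn.Linear(16, 16)
x = torch.randn(2, 16)
baseline = layer(x)
handle = ablate_units(layer, [0, 3, 7])
ablated = layer(x)
handle.remove()
print((baseline - ablated).abs().max().item())  # only the ablated units differ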
Abstract – Dr. Saroj Kumar Jha
This work presents a multilingual, multi-task translation model for 36 Indian languages, trained on roughly 10 billion parallel sentence pairs. It integrates translation, grammar correction, error detection, and post-editing to improve cross-lingual communication. By leveraging diverse and synthetic datasets, the model enhances translation quality and supports low-resource language development across multiple domains.
Abstract – Dibyanayan Bandyopadhyay
This study introduces a hypothesis-testing framework to assess whether large language models (LLMs) possess genuine reasoning abilities or primarily depend on token bias. We go beyond evaluating LLMs on accuracy; rather, we aim to investigate their token bias in solving logical reasoning tasks. Specifically, we develop carefully controlled synthetic datasets, featuring conjunction fallacy and syllogistic problems. Our framework outlines a list of hypotheses where token biases are readily identifiable, with all null hypotheses assuming genuine reasoning capabilities of LLMs. The findings in this study suggest, with statistical guarantee, that most LLMs still struggle with logical reasoning. While they may perform well on classic problems, their success largely depends on recognizing superficial patterns with strong token bias, thereby raising concerns about their actual reasoning and generalization abilities.

Link: https://arxiv.org/abs/2406.11050
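For readers who want a concrete feel for the hypothesis-testing setup, the following minimal Python sketch (not the authors' code) shows one way such a test can be run; the function name token_bias_test, the toy data, and the use of a McNemar-style exact test are illustrative assumptions rather than details taken from the paper.
# Minimal sketch of the token-bias test idea: if an LLM truly reasons,
# perturbing superficial tokens in a syllogism or conjunction-fallacy problem
# should not change its answers; a significant flip rate suggests token bias.
# Assumes `orig_correct` and `perturbed_correct` are parallel lists of booleans
# collected from some model on original vs. token-perturbed problems.
from scipy.stats import binomtest

def token_bias_test(orig_correct, perturbed_correct, alpha=0.05):
    # McNemar-style exact test on discordant pairs: items the model gets right
    # in only one of the two conditions.
    b = sum(o and not p for o, p in zip(orig_correct, perturbed_correct))
    c = sum(p and not o for o, p in zip(orig_correct, perturbed_correct))
    if b + c == 0:
        return False, 1.0  # the perturbation never changes the outcome
    result = binomtest(b, b + c, p=0.5, alternative="two-sided")
    return result.pvalue < alpha, result.pvalue

# Toy data: the null hypothesis is "genuine reasoning" (perturbation does not
# matter); rejecting it points to token bias.
reject, p = token_bias_test([True] * 18 + [False] * 2,
                            [True] * 7 + [False] * 13)
print(f"reject null (genuine reasoning): {reject}, p = {p:.4f}")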
Abstract – Ayodhya Murmu
This study discusses the Santali language and the various scripts used to write it, highlighting issues related to spelling, phonetic representation, and the lack of a standardized form. It examines the challenges arising from the use of multiple scripts and emphasizes the need for a unified and accurate script for the Santali language.
Abstract – Anansa Roy
Pragmatics shapes language and meaning by examining how context, intention, and social interaction influence interpretation beyond literal words. It reveals how meaning is created through use, connecting linguistic form with real-world communication.
Abstract – Sandeep Kumar
Large Language Models often generate factually incorrect content, highlighting the need for reliable verification. FactLens is introduced as a benchmark for fine-grained fact-checking by decomposing complex claims into smaller sub-claims. It includes curated data, automated evaluators, and shows strong alignment with human assessments.

Link: View Presentation
Abstract – Somenath Nag Choudhury
MMed-RAG is a versatile retrieval-augmented generation (RAG) system designed to improve factual accuracy in medical vision-language models. It introduces domain-aware retrieval, adaptive context selection, and RAG-based preference fine-tuning. Experiments show significant improvements in medical VQA and report generation across multiple datasets.
Abstract – Dr. Vineet Kumar Lal Das
An analysis of 100 sentences translated from Maithili to Hindi by the Vidyapati machine found that errors arise in the machine translation because the bilingual dictionary does not contain enough words. To remove these errors, the bilingual dictionary must receive attention first, and many dictionaries of Hindi, not just of Maithili, will have to be incorporated into it.
Abstract – Souravi Halder
As generative AI tools like ChatGPT transform online communication, the boundaries between authentic and AI-generated content are increasingly blurred. This talk explores the importance of digital literacy, responsible sharing, and critical thinking to ensure the safe and ethical use of social media in the AI era. It highlights practical strategies to navigate misinformation, privacy risks, and algorithmic influence.
Abstract – Boynao Kshetrimayum
ReFT improves LLM reasoning by combining fine-tuning with reinforcement learning, enabling learning from multiple reasoning paths.
Abstract – Prajwal Vijay Kajare
The paper presents a method that combines self-supervised learning and reinforcement learning to improve the instruction-following ability of reasoning models.
Abstract – Dr. Saloka Sengupta
In this talk, the speaker discusses the article, which proposes a new integration of linguistic anthropology and machine learning (ML) around convergent interests: understanding the underpinnings of language and making language technologies more socially responsible.
Abstract – Rochit Ranjan
The talk examines whether large language models can obfuscate code, walking through the paper's systematic analysis of LLM-generated assembly-code obfuscation.
Abstract – Dr. Saroj Kumar Jha
This work focuses on building linguistic resources for low-resource Purvanchal languages—Bhojpuri, Magahi, and Maithili—by collecting, cleaning, and annotating corpora from various domains. Basic statistical analyses were conducted at multiple linguistic levels, and comparisons were made with Hindi to assess similarities and complexities. POS tagging, chunking, and language identification tools were developed using BIS tagsets, along with bilingual dictionaries and synsets. The study also introduces a novel n-gram-based method for measuring language similarity. Overall, it lays foundational resources and baselines for future NLP research on these languages.
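The paper's exact similarity measure is not reproduced here, but as a rough illustration of the general idea, the sketch below compares two text samples by cosine similarity over character n-gram counts; the function names char_ngrams and ngram_cosine_similarity and the toy Bhojpuri/Hindi strings are illustrative assumptions.
# Illustrative sketch only: one common way to estimate similarity between two
# languages' corpora is cosine similarity over character n-gram frequencies.
from collections import Counter
from math import sqrt

def char_ngrams(text, n=3):
    text = " ".join(text.split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def ngram_cosine_similarity(corpus_a, corpus_b, n=3):
    a, b = char_ngrams(corpus_a, n), char_ngrams(corpus_b, n)
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Toy usage with placeholder strings standing in for Bhojpuri and Hindi text.
print(ngram_cosine_similarity("हम घर जात बानी", "हम घर जा रहे हैं"))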
Abstract – Soham Bhattacharjee
This paper tackles free translation (conveying the sense rather than a word-for-word rendering) using a deep-reasoning LLM and long chain-of-thought prompting.



Abstract – Dr. Pramod Singh
Dr. Pramod Singh shared his insights on the multifaceted nature of translation. He emphasized that translation plays a crucial role in the development of language, as it not only bridges linguistic gaps but also addresses inherent language-related challenges. Dr. Singh described translation as a long and evolving journey that requires a deep understanding of the nature of language. He highlighted that a translator must be able to faithfully convey the original author’s emotions and thoughts in another language. Therefore, for translation to be successful, it is essential to grasp the aesthetic and emotional nuances of the language.
Abstract – Baban Gain
This study finds that LLMs perform best at translation evaluation when using reference translations, while source sentences sometimes degrade their performance. Despite the source containing essential meaning for assessing fidelity, LLMs struggle to effectively utilize it—revealing a weakness in their cross-lingual reasoning.
Paper: Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine Translation (Findings of ACL 2024)
Abstract – Nitika Gupta
This talk explores how case markers in Bengali define grammatical relationships, affecting syntax and semantics. It highlights their vital role in accurate linguistic analysis and translation between Indian languages and Bengali.
Abstract – Dibyanayan Bandyopadhyay
Can we generate a faithful natural language explanation for a classifier in a zero-shot manner?
Can we express in natural language why a classifier predicted a class label, with access only to its inner representations?
Paper: https://openreview.net/forum?id=X6VVK8pIzZ (ICLR 2025)
Abstract – Bigyan Ranjan Das
Hindi-Odia translation presents unique challenges due to differences in phonology, grammar, vocabulary, and cultural expressions. While both languages stem from the Indo-Aryan family, variations in verb conjugations, levels of politeness, and idiomatic usage often hinder direct translation. Script differences and regional influences further complicate the process. To overcome these nuances, translators must apply contextual understanding, cultural sensitivity, and adaptive language strategies.
Abstract – Somenath Nag Choudhury
MITER presents a two-stage pretraining framework that first leverages uni-modal objectives to independently adapt image and text encoders to the medical domain, then employs a cross-modal contrastive loss to jointly align and fuse their representations, facilitating stronger vision-language learning. It introduces a dynamic hard-negative sampling strategy based on alignment and uniformity principles, improving model robustness and representation discrimination. As a result, MITER significantly outperforms state-of-the-art methods across four downstream medical tasks: image-report retrieval, multi-label classification, visual question answering, and report generation.
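As background for readers unfamiliar with such objectives, here is a generic cross-modal contrastive (InfoNCE/CLIP-style) loss sketch in PyTorch; it is not the MITER implementation and omits its hard-negative sampling, serving only to show the kind of alignment loss the second stage relies on. The function name and tensor shapes are assumptions.
# Generic cross-modal contrastive loss sketch (not the MITER code).
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    # Normalize so the dot product is a cosine similarity.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    # Matched image-report pairs sit on the diagonal; every other item in the
    # batch acts as a negative (MITER additionally mines hard negatives).
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Toy usage with random embeddings standing in for encoder outputs.
loss = cross_modal_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())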
Abstract – Ayodhya Murmu
The topic discusses the key challenges encountered during Hindi-to-Santali translation validation, focusing on issues like punctuation mismatches, improper transliteration of technical terms, and incorrect handling of named entities such as names of people or organizations. It also addresses problems related to verb tense, case markers, and lexical choices that affect sentence clarity. Additional concerns include errors in singular/plural usage, omissions or additions during translation, and direct translations from English that may not be culturally or linguistically suitable. The talk offers corrective measures to ensure accurate, consistent, and context-appropriate validation.
Abstract – Priyanshu Priya
This talk presents key highlights and insights from the AAAI-25 Conference, capturing breakthrough research trends, impactful sessions, and emerging directions in artificial intelligence.
Abstract – Anansa Roy
Bengali postpositions exhibit a dual nature, functioning both as independent lexical items with case markers and as inflectional elements denoting spatial relations. Their distribution is uneven across grammatical cases, with certain cases supporting a richer set of postpositions than others.
Abstract – Dr. Vineet Kumar Lal Das
The talk covered language-related errors, sentence-structure errors, word-omission errors, errors in rendering causative verbs as non-causative verbs, case-marker omission errors, auxiliary-verb errors, verb errors, errors with honorific and non-honorific words, typing errors, and spelling errors, along with a detailed discussion of how to remedy them.
Abstract – Abid Hossain
In this talk, I highlighted how large language models exhibit a strong English bias that hinders their performance on low-resource languages and introduced Cross-Lingual-Thought Prompting (XLT), a language-agnostic prompt template that leverages English chain-of-thought reasoning to boost accuracy across 27 diverse languages. The experiments show that XLT yields substantial gains in reasoning, inference, and generation tasks without updating model parameters.
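As a rough illustration of the idea (a paraphrase, not the exact XLT template from the paper), the sketch below wraps a non-English request in an English scaffold that asks the model to restate the task in English, reason step by step, and then answer; the function name xlt_prompt and the wording of the scaffold are assumptions.
# Hedged paraphrase of cross-lingual-thought prompting: the prompt itself stays
# in English even though the request is in another language.
def xlt_prompt(request: str, task_language: str, output_format: str = "a short answer") -> str:
    return (
        f"I want you to act as an expert in the task below, given in {task_language}.\n"
        f"Request: {request}\n"
        "First, retell the request in English.\n"
        "Then, solve it step by step in English.\n"
        f"Finally, give {output_format}.\n"
    )

# Toy usage with a Hindi arithmetic question as a placeholder request.
print(xlt_prompt("पाँच और सात का योग क्या है?", "Hindi"))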
Abstract – Souravi Halder
Web development is the process of creating and maintaining websites and web applications that run on the internet. It encompasses several disciplines, including front-end development, back-end development, and full-stack development. Front-end development focuses on the user interface and user experience, using technologies such as HTML, CSS, and JavaScript. Back-end development handles server-side logic, database interactions, and application functionality, often using languages like Python, PHP, Java, or Node.js. Full-stack development combines both front-end and back-end skills. Web development also involves understanding protocols like HTTP, managing data through databases (e.g., MySQL, MongoDB), and deploying sites using web servers and cloud platforms. This abstract introduces the foundational tools, concepts, and practices that are essential for building modern, responsive, and interactive websites.
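To make the front-end/back-end split concrete, here is a minimal back-end sketch using only the Python standard library: the browser acts as the front end, sends an HTTP GET request, and this handler returns HTML over HTTP. The port, handler name, and page content are arbitrary examples, not part of the talk.
# Minimal back-end sketch with Python's built-in HTTP server.
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Server-side logic would normally query a database here; this example
        # just returns a static HTML page for any path.
        body = b"<html><body><h1>Hello from the back end</h1></body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Serve on http://localhost:8000 until interrupted.
    HTTPServer(("localhost", 8000), HelloHandler).serve_forever()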
Abstract – Dr. Saroj Kumar Jha
This guideline outlines the annotation of Hindi text using the ILPOSTS (Indian Language POS Tag Set), a hierarchical and flexible tagset framework developed by Microsoft Research India. ILPOSTS enables cross-linguistic compatibility and allows derivation of language-specific tagsets for diverse Indian languages. It supports morphosyntactic annotation at varying levels of granularity to suit specific linguistic and NLP tasks.
Abstract – Saurav Dudhate
Presented a paper on Direct Preference Optimization (DPO), a novel approach to fine-tuning large language models using human preferences without complex reinforcement learning. Unlike traditional RLHF methods, DPO uses a simple classification loss, making the process more stable, efficient, and easier to implement. It outperforms or matches existing methods in tasks like sentiment control, summarization, and dialogue generation.
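For reference, the sketch below shows a minimal PyTorch version of the DPO objective described above (illustrative, not the authors' code): given summed log-probabilities of the preferred and dispreferred responses under the policy and a frozen reference model, the preference update reduces to a simple logistic loss. The function name dpo_loss and the toy numbers are assumptions.
# Minimal DPO loss sketch.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # -log sigmoid(margin): push the policy to prefer chosen over rejected
    # responses more strongly than the reference model does.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up log-probabilities for a batch of 4 preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5, -11.0, -8.0]),
                torch.tensor([-13.5, -10.0, -14.0, -9.0]),
                torch.tensor([-12.5, -9.8, -11.5, -8.2]),
                torch.tensor([-13.0, -9.9, -13.5, -8.8]))
print(loss.item())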
Abstract – Somenath Nag Choudhury
The paper presents MAIRA-2, a multimodal model that generates radiology reports from chest X-rays with optional spatial grounding, associating each finding with bounding-box annotations to enhance traceability and interpretability. By leveraging comprehensive inputs, including frontal/lateral current images, prior studies and reports, plus clinical context, MAIRA-2 outperforms prior models on report quality and reduces hallucinations, setting a new state-of-the-art on MIMIC-CXR. The authors introduce RadFact, an LLM-based evaluation framework that quantifies factual correctness and spatial localization accuracy at the sentence level, enabling rigorous assessment of the new grounded reporting task.
© Copyright AI-NLP-ML Group, Department of CSE, IIT Patna. All Rights Reserved