ICLR 2026 – Institutional Affiliations Dataset and Analysis

TL;DR

Researchers have developed an end-to-end pipeline to extract and analyze institutional affiliations from 5,356 ICLR 2026 accepted papers. The resulting dataset includes normalized affiliations and visualizations, offering insights into the landscape of AI research. The project aims to improve accuracy over author profiles and facilitate institutional impact analysis.

The pipeline generates a PDF-derived institutional affiliation dataset for ICLR 2026 accepted papers, which is now available for research and visualization. Because it reads affiliations from the papers themselves rather than from author profiles, which can be out of date, it gives a more accurate picture of the institutions shaping AI research.

The pipeline processes 5,356 accepted papers from ICLR 2026, extracting author affiliations directly from PDF title blocks to avoid issues like profile drift, where current jobs are incorrectly attributed to past papers. It normalizes institution names via approximately 250 rules, ensuring consistency across the dataset. The resulting data includes institution counts, country and region classifications, and detailed author affiliations, stored in multiple CSV formats for different analysis approaches.
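The rule-based normalization step can be illustrated with a minimal sketch. The three rules, the canonical names, and the function name `normalize_affiliation` below are hypothetical examples chosen for illustration; the article does not list the project's roughly 250 rules, and the real pipeline may match names differently.

```python
import re

# Hypothetical rules: (compiled pattern, canonical institution name).
# These are illustrative only; the project's actual rules are not shown in the article.
RULES = [
    (re.compile(r"\b(mit|massachusetts institute of technology)\b", re.I), "MIT"),
    (re.compile(r"\bstanford\b", re.I), "Stanford University"),
    (re.compile(r"\bdeepmind\b", re.I), "Google DeepMind"),
]


def normalize_affiliation(raw: str) -> str:
    """Map a raw affiliation string from a title block to a canonical name.

    The first matching rule wins; strings that match no rule are returned
    with whitespace collapsed so they can still be grouped consistently.
    """
    cleaned = re.sub(r"\s+", " ", raw).strip()
    for pattern, canonical in RULES:
        if pattern.search(cleaned):
            return canonical
    return cleaned


if __name__ == "__main__":
    for raw in [
        "Dept. of EECS,  Massachusetts Institute of Technology",
        "Stanford  University, CA",
        "Some Unlisted Lab",
    ]:
        print(f"{raw!r} -> {normalize_affiliation(raw)!r}")
```

With an ordered rule list like this, more specific patterns should come before broader ones, since the first match decides the canonical name.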

Key outputs include a publication-level dataset, a list of institutions ranked by unique affiliation counts, and visualizations such as treemaps illustrating the research landscape. The project also compares three counting methods (per paper, first-author only, and fractional) to check how robust the rankings are and to identify potential artifacts of any single method. The pipeline’s reported accuracy is approximately 96%; for the remaining 4%, where PDF parsing fails, it falls back to author profile data.
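The difference between the three counting methods can be shown with a small sketch. The sample papers are made up, and the exact definitions are assumptions: here "per paper" credits each institution once per paper it appears on, "first-author only" credits only the first author's institutions, and "fractional" splits one credit per paper equally among authors and then among each author's affiliations. The project may define these differently.

```python
from collections import Counter

# Hypothetical data: each paper is a list of authors (in author order),
# and each author is a list of already-normalized institution names.
papers = [
    [["MIT"], ["MIT"], ["Stanford University"]],
    [["Stanford University"], ["MIT", "Google DeepMind"]],
]


def per_paper_counts(papers):
    """Credit each institution once for every paper it appears on."""
    counts = Counter()
    for authors in papers:
        counts.update({inst for affs in authors for inst in affs})
    return counts


def first_author_counts(papers):
    """Credit only the institutions of each paper's first author."""
    counts = Counter()
    for authors in papers:
        if authors:
            counts.update(set(authors[0]))
    return counts


def fractional_counts(papers):
    """Split one credit per paper equally among authors, then among
    each author's affiliations (an assumed definition)."""
    counts = Counter()
    for authors in papers:
        if not authors:
            continue
        author_share = 1.0 / len(authors)
        for affs in authors:
            for inst in affs:
                counts[inst] += author_share / len(affs)
    return counts


if __name__ == "__main__":
    print("per paper:   ", dict(per_paper_counts(papers)))
    print("first author:", dict(first_author_counts(papers)))
    print("fractional:  ", {k: round(v, 3) for k, v in fractional_counts(papers).items()})
```

Under these assumed definitions, fractional totals sum to the number of papers, which makes that method less sensitive to large author lists than per-paper counting.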

Why It Matters

The dataset provides a more precise and reliable view of institutional contributions to AI research at ICLR 2026. It lets researchers, institutions, and policymakers better understand research trends, collaboration patterns, and the relative influence of industry and academia. It also improves transparency and supports more accurate bibliometric analyses, which can inform funding, partnerships, and strategic decisions.

Background

Previous analyses relied heavily on author profiles from platforms like OpenReview, which are prone to inaccuracies due to profile drift. The new pipeline addresses this by extracting affiliations directly from PDF documents, which are the official source. This approach aligns with ongoing efforts in the research community to improve data quality and transparency in bibliometric studies. The pipeline builds on prior work in PDF parsing and normalization, applying these techniques specifically to ICLR 2026, one of the major AI conferences.

“This pipeline provides a more accurate, PDF-derived institutional affiliation dataset, avoiding the common pitfalls of author profile drift and enabling detailed analysis of research trends.”

— Dmytro Lopushanskyy, project lead

“The new affiliation dataset offers valuable insights into the global distribution of AI research and the evolving landscape of institutional contributions.”

— ICLR 2026 organizing committee

What Remains Unclear

It remains unclear how the dataset will be adopted by the broader research community or integrated into existing bibliometric tools. Additionally, the accuracy of PDF parsing for non-standard or poorly formatted papers may vary, and the long-term stability of the normalization rules has yet to be tested across future conferences.

What’s Next

Next steps include expanding the dataset to cover subsequent conferences, refining parsing algorithms for even higher accuracy, and encouraging community use to validate and improve the dataset. Researchers may also explore applying similar pipelines to other conferences or journals to build a comprehensive view of AI research trends.

Key Questions

How does this dataset improve over previous author profile-based analyses?

The dataset extracts affiliations directly from PDF title blocks, reducing errors caused by profile drift and providing a more accurate representation of institutional contributions.

Can this pipeline be used for other conferences or journals?

Yes, the pipeline is designed to be adaptable, and with some modifications, it can process PDFs from other conferences or journals to generate similar affiliation datasets.

What are the main limitations of this approach?

The primary limitations include potential parsing errors with non-standard papers and the need for ongoing normalization rule updates to maintain accuracy across diverse formats.

How can researchers access and use this dataset?

The dataset is publicly available via the project’s GitHub repository and can be integrated into bibliometric analyses or visualizations to better understand research trends.

Author

Best CAD Papers Team
