NeurIPS 2023 and the State of AI Research

Patterns and Trends in Accepted NeurIPS Papers

The biggest conference in machine learning is less than a week away. The thirty-seventh conference on Neural Information Processing Systems, or NeurIPS as it is colloquially called, will take place from December 10th-16th, 2023 in New Orleans, LA. Thousands of machine learning researchers from around the world will converge in the Big Easy to eat beignets and discuss everything from Bayesian optimization to adversarial attacks and in-context learning.

With 3,584 accepted papers, 14 tutorials, and 58 workshops, it’s nigh impossible to absorb all that next week has in store. Lucky for you, we’ve scraped, scoured, and synthesized the data to bring you the most important insights from NeurIPS 2023!

In this post, we’ll share our essential findings from an in-depth analysis of NeurIPS 2023 papers and historical NeurIPS data. Follow-up posts will highlight and summarize select papers you won’t want to miss.

The code for extracting these insights, along with the source data and a complete listing of posters by poster session, lives in our NeurIPS 2023 Repo on GitHub — your one-stop shop for everything NeurIPS 2023!

Our insights are divided into three categories: authors, titles, and abstracts. 

The Data

Historical Data

For historical titles and authors of NeurIPS papers, data was scraped from https://proceedings.neurips.cc/paper/. There is a page for each year up until 2023 — for instance the data for 2016 can be found at https://proceedings.neurips.cc/paper/2016. Author lists were split at commas.
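
A minimal sketch of such a scrape, using only Python's standard library. The markup pattern assumed here — an `<a>` tag holding the title followed by an `<i>` tag holding the comma-separated author list — is a guess at the proceedings pages' structure and should be verified against the live HTML:

```python
import re

def parse_proceedings_page(html: str) -> list[dict]:
    """Extract (title, authors) pairs from a NeurIPS proceedings index page.

    Assumes each paper appears as an <a> title followed by an <i> author
    list -- verify this against the actual page markup before relying on it.
    """
    papers = []
    for title, authors in re.findall(
        r"<a[^>]*>([^<]+)</a>\s*<i[^>]*>([^<]+)</i>", html
    ):
        papers.append({
            "title": title.strip(),
            # Author lists are split at commas, as described above
            "authors": [a.strip() for a in authors.split(",")],
        })
    return papers

# In practice, the HTML would come from e.g.
#   requests.get("https://proceedings.neurips.cc/paper/2016").text
sample = (
    '<li><a href="/paper/2016/hash/x">Some Paper Title</a> '
    '<i>Ada Lovelace, Alan Turing</i></li>'
)
print(parse_proceedings_page(sample))
```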

NeurIPS 2023 Data

For 2023 data, the process was more challenging, both because the page https://proceedings.neurips.cc/paper/2023 does not exist, and because I also wanted abstracts. Additionally, I wanted to know which papers were orals or spotlights. 

NeurIPS makes the data collection process rather frustrating. The official interface for browsing papers only shows 400 projects at a time, and does so in a non-paginated fashion. This means that you can’t scroll past 400, or click to the “next” page of results. What’s more, while the paper abstracts are indeed visible when you select the “details” tab, there is no way to programmatically specify this via URL. In other words, there isn’t a syntax like `https://neurips.cc/virtual/2023/papers.html?mode=detail` to indicate that you want to get the abstracts. 

Fortunately, if it’s visible in the browser, it’s at least in theory scrapable. And where there’s a will, there’s a way! So we did the dirty work of scraping, cleaning, and validating this data so you don’t have to. You can find the raw 2023 data here.

Abstracts

Keyword Frequencies

The 2023 abstracts contain a treasure trove of rich data about the current state of machine learning. To glean insights from these abstracts, we looked at the relative frequencies of certain groups of words. For instance, words directly related to classification (e.g. “classification” and “classify”) show up in 389 abstracts — roughly 10%, whereas words related to localization (“localization” and “localize”) only appear in 55 abstracts.
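
A minimal sketch of this kind of keyword-group counting, with toy abstracts standing in for the 3,584 real ones:

```python
def count_matching_abstracts(abstracts, keywords):
    """Count abstracts containing at least one of the given keywords.

    Uses case-insensitive substring matching, so "localize" also matches
    "localizes" -- mirroring the grouping of word variants described above.
    """
    keywords = [k.lower() for k in keywords]
    return sum(
        any(k in abstract.lower() for k in keywords)
        for abstract in abstracts
    )

# Toy abstracts standing in for the real data
abstracts = [
    "We propose a new method for image classification.",
    "Our approach localizes objects without supervision.",
    "We classify graphs using spectral features.",
]
print(count_matching_abstracts(abstracts, ["classification", "classify"]))  # 2
print(count_matching_abstracts(abstracts, ["localization", "localize"]))    # 1
```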

Some of the results speak to the staying power of the classics: “Bayes” (174) and “Gaussian” (195) are still present in around 5% of all abstracts. Other results reaffirm long-standing features of the machine learning research landscape: models (2361) still have the edge over data (2133) in terms of the number of abstracts mentioning them, despite the ongoing shift toward data-centric AI.

Another set of terms highlights recent trends in the space:

  • “generative”: 508
  • “transformer”: 271
  • “agent”: 280
  • “zero-shot” (134) is more popular than “few-shot” (77)

However, it’s worth noting that in spite of all the hype around do-it-all foundation models, “foundation” (111) is far outpaced by “efficient” (963)!

Most Common Modalities

The popular sentiment of late has been that the future is multimodal. According to the data, it’s a bit of a mixed bag. Variations on the word “multimodal” (34) occur sparingly, and words associated with multimodal tasks, such as “text-to-image” (73), “captioning” (27), and “visual question answering” (24), are not much more common.

A zero-shot modality classification using CLIP would seem to indicate that “multimodal” is more common than any single modality, but digging into specific abstracts, a lot of these classifications are called into question. For details — and to investigate yourself — check out the analysis Jupyter notebook.

Abstracts at the Edge

To top off our analysis of abstracts, we looked at the lengths of the abstracts, measured in words. The distribution is roughly normal.

Pushing the limits on abstract length, the shortest abstract weighs in at 29 words. Interestingly, the culprit is a Spotlight poster, Improved Convergence in High Probability of Clipped Gradient Methods with Heavy Tailed Noise. On the opposite extreme, [Re] On the Reproducibility of FairCal: Fairness Calibration for Face Verification has an abstract 12 times as long at 373 words.

Caption: Histogram of abstract lengths for all 3584 NeurIPS 2023 papers, which roughly fall into a Gaussian distribution.
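
The word counts behind this histogram come down to simple whitespace tokenization; a sketch on toy data (the titles and abstracts below are just stand-ins):

```python
# Toy stand-ins for the real title -> abstract mapping
abstracts = {
    "Clipped Gradient Methods": "Few words here.",
    "FairCal Reproducibility": (
        "This abstract goes on at considerably greater length than the first."
    ),
}

# Word counts via whitespace tokenization, as in the analysis above
lengths = {title: len(text.split()) for title, text in abstracts.items()}

shortest = min(lengths, key=lengths.get)
longest = max(lengths, key=lengths.get)
print(shortest, lengths[shortest])  # Clipped Gradient Methods 3
print(longest, lengths[longest])    # FairCal Reproducibility 11
```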

Authors

Number of Authors per Paper Increases

When looking at the author lists for NeurIPS 2023 papers, one thing immediately jumps off the page: the number of authors per paper. In 2023, the average paper had 4.98 authors, as compared to 4.66 authors in 2022. Extending the analysis over the past 10 years, the trendline is clear. The number of authors per paper has monotonically increased, and these increases have compounded into a >50% increase in the number of authors per paper over the decade.
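
Computing the per-year average is straightforward once each paper is represented as an author list; a sketch with made-up data:

```python
from statistics import mean

def mean_authors_per_paper(author_lists):
    """Average number of authors per paper for one conference year."""
    return mean(len(authors) for authors in author_lists)

# Toy author lists for two hypothetical years
papers_2022 = [["A", "B", "C", "D"], ["E", "F", "G", "H", "I"]]
papers_2023 = [["A", "B", "C", "D", "E"], ["F", "G", "H", "I", "J", "K"]]
print(mean_authors_per_paper(papers_2022))  # 4.5
print(mean_authors_per_paper(papers_2023))  # 5.5
```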

You might suspect that these increases are due to the onset of a few large collaborations with massive author lists skewing the data. Where I come from, in physics, papers can have thousands of authors. It turns out that this is not the case.
The NeurIPS 2023 paper with the most authors was ClimSim: A large multi-scale dataset for hybrid physics-ML climate emulation — a physics-ML crossover no less! More importantly, the entire distribution of author counts shifts right from 2022 to 2023.

Number of Unique Authors Explodes

As we just saw, the number of authors per paper has increased substantially over the past decade. At the same time, the number of papers accepted to NeurIPS has ballooned even more dramatically, from 411 in 2014 to 3584 in 2023. When you superimpose these two effects, the number of unique authors jumps from 1064 in 2014 to 13012 in 2023. Are there really 13x as many ML researchers in 2023 as in 2014? That is the question!
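
Counting unique authors amounts to a set union over the year’s author lists; a sketch (note that inconsistent name spellings would slightly inflate the real count):

```python
def unique_author_count(author_lists):
    """Number of distinct author names across all papers in a year.

    Treats each distinct name string as one person, so spelling
    variants of the same name are counted separately.
    """
    return len({author for authors in author_lists for author in authors})

# Toy data: Alan Turing appears on two papers but is counted once
papers = [["Ada Lovelace", "Alan Turing"], ["Alan Turing", "Grace Hopper"]]
print(unique_author_count(papers))  # 3
```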

Titles 

Titles Grow Longer

As subfields become more specialized, and as the low-hanging fruit and far-reaching generic titles are gradually gobbled up, it’s no surprise to see titles growing longer. A paper introducing an offshoot of prior work, for example, will often reference the original in its title.

After a rough equilibrium from 2014 to 2017, the average number of words per title has consistently increased, inching its way from 7.39 in 2017 to 8.72 in 2023.

Caption: Average number of words in NeurIPS paper titles over the last decade.

Caption: Normalized distributions of title length in words for NeurIPS 2022 and 2023 accepted papers.

Everyone Wants an ACRONYM

Another trend that has been growing in popularity is the introduction (read: creation) of acronyms in machine learning paper titles. With activity in the space growing exponentially, authors are doing everything they can to stand out. In the past, acronyms were primarily coined after the fact, once a model or method was deemed seminal to the field. Nowadays, people love to coin the acronyms themselves!
For NeurIPS papers, the fad isn’t as omnipresent as we saw for CVPR 2023 papers. In 2023, 22% of papers introduced their own acronym, up from 18% in 2022. To see firsthand how rare this practice used to be, I encourage you to look at the titles of papers from NeurIPS 2014!
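
One plausible heuristic for detecting a coined acronym — not necessarily the exact rule behind the percentages above — is to flag titles whose single-word, pre-colon segment contains at least two capital letters:

```python
import re

def introduces_acronym(title: str) -> bool:
    """Heuristic: the title leads with a coined name before a colon,
    e.g. "CLIP:" or "ClimSim:".

    This is one reasonable rule of thumb, not necessarily the exact
    definition used to compute the 22% figure above.
    """
    head, sep, _ = title.partition(":")
    head = head.strip()
    # Require a colon and a single-word name before it
    if not sep or " " in head:
        return False
    # At least two capital letters marks it as a coined acronym
    return len(re.findall(r"[A-Z]", head)) >= 2

print(introduces_acronym("ClimSim: A large multi-scale dataset"))            # True
print(introduces_acronym("Improved Convergence of Clipped Gradient Methods"))  # False
```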

Conclusion

These analyses just scratched the surface of the insights nestled within the NeurIPS 2023 and historical metadata. The primary contribution associated with this post is not the insights themselves, but the cleaning, curation, and open distribution of the underlying data. The data, along with the code to perform these analyses and so much more, is available at the Awesome NeurIPS 2023 GitHub repository. I strongly encourage you to download or fork the repo and use it as a starting point for your own exploration of the data!

If you enjoyed this post and want to geek out on data-centric AI and the future of the field, reach out to me on LinkedIn and come by our booth at the conference — Voxel51 is a platinum sponsor!