AI for the Working Mathematician
Summary. We suggest ways that AI can be useful to the working mathematician. Routine tasks are emphasized, but examples also demonstrate recently developed AIs’ increasing usefulness for actual research mathematics. We finish with a brief overview of the AI product landscape, and a list of tips for effective use.
What can AI do for me? AI1 speeds up many mundane tasks, especially those that are laborious, but whose correctness is easy to verify:
LaTeX help (e.g. composing TikZ pictures, debugging a file that won’t compile),
coding2,
hunting typos, use of notation before definition, and similar mistakes in papers that one is writing or refereeing,
transcription (e.g. producing TeX from a picture of hand-written text).
AI is also helpful for finding and understanding information:
recalling facts, arguments, and examples,
“fuzzy” questions (“Is there a name for blah?”, “Is Thing A related to Thing B?”, “is there an analogue for X in setting Y?”),
quick overviews of unfamiliar topics, and
acting as a paper-reading partner (upload a paper you’re reading, and ask the AI questions about anything you don’t understand or haven’t heard of as you read).
Like humans, AIs often say things that are not perfectly correct, but point in the right general direction. Using them is straightforward: open a browser tab to your AI of choice (I most often use Claude), and talk to it much as you would a person. For some tasks, it is also helpful to upload files, such as PDFs of relevant papers, or to provide a brief explanation of your higher-level goal. This contextual information allows the AI to make better guesses about what you want it to do.
AIs have historically been bad at tasks requiring careful, exact reasoning or multiple layers of abstraction. New “chain-of-thought” or “thinking” models are beginning to overcome this weakness, making AIs more useful for directly discussing novel math, e.g. for generating small examples/counterexamples.
Show me the goods! We consider some concrete examples and anecdotes, remarking on salient features.
Example. TikZ code for the Hasse diagram of a partition poset can be generated by brief
interaction with ChatGPT. This illustrates a routine interaction with an AI on a task well within
its abilities.
We first ask:
Please output tikz code for the poset of partitions contained in (3,3), ordered by containment.
After a couple of minutes of thought, ChatGPT outputs TikZ code, which we compile. The resulting picture is correct, but has some ugly visual features, so we ask ChatGPT to fix them.
This looks great. Can you space out the nodes a little more, remove the outlines on the nodes, and draw a small picture of the young diagram for each partition instead of using labels like (3,3), (3,2), etc?
The updated code compiles to a new picture. It is uncentered, but this is again a quick fix!
Can you now make it so that each row with 2 young diagrams is centered, and add in an empty set sign to represent the empty young diagram at the bottom?
We obtain the final image.
The model used in this example was o3-mini-high.
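For readers who have not written TikZ before, here is a minimal sketch of the kind of code the first prompt elicits. It is hand-written for this note, not ChatGPT’s actual output: one node per partition contained in (3,3), placed by the number of boxes, with the cover relations of containment drawn as edges.

\documentclass{standalone}
\usepackage{tikz}
\begin{document}
\begin{tikzpicture}[every node/.style={draw, rounded corners, inner sep=2pt}]
  % one node per partition contained in (3,3), placed by number of boxes
  \node (p0)  at ( 0,0) {$\varnothing$};
  \node (p1)  at ( 0,1) {$(1)$};
  \node (p2)  at (-1,2) {$(2)$};
  \node (p11) at ( 1,2) {$(1,1)$};
  \node (p3)  at (-1,3) {$(3)$};
  \node (p21) at ( 1,3) {$(2,1)$};
  \node (p31) at (-1,4) {$(3,1)$};
  \node (p22) at ( 1,4) {$(2,2)$};
  \node (p32) at ( 0,5) {$(3,2)$};
  \node (p33) at ( 0,6) {$(3,3)$};
  % cover relations: add a single box
  \draw (p0)--(p1);
  \draw (p1)--(p2) (p1)--(p11);
  \draw (p2)--(p3) (p2)--(p21) (p11)--(p21);
  \draw (p3)--(p31) (p21)--(p31) (p21)--(p22);
  \draw (p31)--(p32) (p22)--(p32);
  \draw (p32)--(p33);
\end{tikzpicture}
\end{document}

The follow-up prompts then amount to restyling this skeleton: spacing out the coordinates, dropping the node outlines, and swapping each text label for a small drawing of the corresponding Young diagram.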
Example. In a recent project, the class of matroids with no U2,4, K4, or K2,3 minor appeared.
We wondered if this class of matroids had a name. By talking to ChatGPT, we learned that it is
the class of matroids of “outerplanar graphs”. In this conversation, the AI makes some
wrong assertions (e.g. that K2,3 is not series-parallel). Correcting the model and feeding
it lots of miscellaneous facts about the class eventually allows the AI to identify it.
The model used in our original conversation (linked above) is ChatGPT-4o, a weak, non-thinking model that is the default for ChatGPT’s free tier.
Out of curiosity, we later ran the same query on a thinking model (Claude 3.7 Sonnet); a transcript of the conversation is below. The thinking model makes the same mistake, first guessing incorrectly that the class we’re dealing with is series-parallel matroids. After being corrected, the thinking model requires much less information than 4o before it suggests looking at outerplanar graphs. In fact, it is able to notice on its own that outerplanar graphs are precisely what we are looking for!
The model only observes this after being asked to recall the definition of outerplanar graphs, which it had previously suggested in a tone of relatively low confidence. Even thinking AIs do not always handle multi-layered abstractions well, so it can be helpful to manually prompt them to unroll the layers.
Me: Is there a name for the class of matroids with excluded minors U2,4, K4, and K2,3?
AI: I don’t believe there is a standard name in matroid theory for the specific class defined by exactly these three excluded minors (U2,4, K4, and K2,3).
The exclusion of K4 and K2,3 as minors is related to series-parallel networks, but the additional exclusion of U2,4 makes this a more specialized class that doesn’t correspond to one of the commonly named matroid classes in the literature.
Me: Does it help to know that the class is closed under parallel connection?
AI: Yes, that’s helpful information. The class you’re describing is most likely the series-parallel matroids.
Series-parallel matroids are precisely those that can be constructed from single-element matroids using series and parallel connections. They can be characterized by their excluded minors:
Binary series-parallel matroids exclude K4 and K2,3 as minors
Since your class also excludes U2,4, it’s specifically referring to binary series-parallel matroids
This class is indeed closed under parallel connection by definition, which aligns with the additional information you provided.
Series-parallel matroids also correspond to the graphic matroids of series-parallel graphs, and they have applications in network reliability and electrical circuit theory.
Me: K2,3 is a series-parallel matroid though
AI: You’re absolutely correct - I made an error. K2,3 is indeed a series-parallel graph/matroid, not an excluded minor for that class.
The class of matroids excluding U2,4, K4, and K2,3 as minors is therefore:
A subset of binary matroids (due to excluding U2,4)
A subset of series-parallel matroids (due to excluding K4)
Further restricted by also excluding K2,3
This appears to be a more specialized class than just series-parallel matroids. It could be related to outerplanar matroids or some other restricted class, but I don’t immediately recognize a standard name for matroids with exactly these three excluded minors.
The closure under parallel connection is still an important property but doesn’t immediately reveal the standard name for this specific class, if one exists in the literature.
Me: What are outerplanar matroids?
AI: Outerplanar matroids are the graphic matroids of outerplanar graphs.
An outerplanar graph is a graph that can be drawn in the plane with all vertices on the exterior face (the outer face) and no edges crossing. The key property of outerplanar graphs is that they are precisely the graphs that exclude K4 and K2,3 as minors.
Therefore, outerplanar matroids:
Exclude K4 as a minor (since outerplanar graphs do)
Exclude K2,3 as a minor (since outerplanar graphs do)
Exclude U2,4 as a minor (because all graphic matroids are binary, and binary matroids exclude U2,4)
This matches exactly the excluded minor characterization you mentioned! So the class of matroids with excluded minors U2,4, K4, and K2,3 is precisely the class of outerplanar matroids.
This class is indeed closed under parallel connection, as adding parallel elements to an outerplanar graph preserves its outerplanarity.
The next few examples show new “chain-of-thought” or “thinking” models performing tasks that would have been beyond the reach of AIs from less than a year ago.
Example. After reading our paper, ChatGPT produced a nontrivial example of the
“Bergman fan of a polymatroid”, a complicated object that we defined in the paper. This is a
complex task requiring compilation of information from multiple locations in the paper, and
careful calculation using that information. This task is near the limit of current AI capabilities;
performance here may not be so reliable.
The early part of the chat consists of failed attempts to get ChatGPT to access and read our paper on the web. This did not work (through a fault of the interface rather than the model), so we finally uploaded the PDF, along with the prompt
Read this PDF. Come up with an example for definition 1.7 of the Bergman fan.
After 25 seconds, the model responded with a worked-out example of a polymatroid and an explanation of how to calculate its Bergman fan. This was correct, but we wanted a more explicit description of the Bergman fan, so we followed up with
Enumerate the cones of the Bergman fan explicitly in this example.
The AI did this correctly after 98 seconds of thought.
The generated example was the polymatroid P on E = {1,2} with rank function ρ(∅) = 0, ρ(1) = 1, ρ(2) = 2, and ρ(12) = 3. To calculate the Bergman fan of P, the AI needed to first understand another definition from the paper, the “multisymmetric lift” of P. The AI described the multisymmetric lift, then listed the ten cones of the Bergman fan of P.
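As a quick sanity check (ours, not part of the transcript): this really is a polymatroid rank function, since it is normalized (ρ(∅) = 0), monotone (ρ(1) = 1 ≤ 3 and ρ(2) = 2 ≤ 3 = ρ(12)), and submodular, as ρ(1) + ρ(2) = 1 + 2 = 3 ≥ 3 + 0 = ρ(12) + ρ(∅).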
The model used in this example was OpenAI’s o1.
Example. A small error was recently discovered in a different paper of mine3,
having crept past three authors, plus referees and readers. I asked Claude whether the offending
statement was true. Claude first said (incorrectly) that the statement was likely true,
giving a heuristic reason similar to our original intuition. With a little prodding to
think harder, Claude correctly concluded that the statement was false and provided a
small counterexample. It finished by correctly suggesting that the condition of the
false statement’s if-and-only-if might be sufficient but not necessary. Prominent in this
example is the usefulness of asking an AI to think more carefully about a past response.
A transcript of our short interaction is below.
Me: A math paper claims that: If M is a matroid on ground set E and A1,…,Am is a collection of subsets of E, then there is an independent transversal I ⊂ E∖e of A1,…,Am for any element e ∈ E if and only if for all nonempty J ⊂ [m], rk(∪j∈JAj) ≥|J| + 1. Do you think this statement is true?
AI: After analyzing this statement, I believe it is true.
The condition rk(∪j∈JAj) ≥|J| + 1 for all nonempty J ⊆ [m] ensures there’s enough “richness” in the collection of sets to find an independent transversal avoiding any single element.
We can understand this condition intuitively: for any subcollection of sets indexed by J, we need their union to have rank at least |J| + 1. This extra “+1” in the rank requirement gives us the flexibility to avoid any single element e while still finding an independent set with exactly one element from each Ai.
This is a natural extension of Hall’s marriage theorem to matroids with the additional constraint of avoiding a specific element.
Me: Are you sure?
AI: I need to reconsider my answer. After further analysis, I don’t think the statement is necessarily true.
Let me provide a potential counterexample:
Consider a uniform matroid U2,3 on ground set E = {1,2,3}, where any 2 elements form an independent set.
Let A1 = {1,2} and A2 = {2,3}
For J = {1,2}, we have:
rk(A1 ∪ A2) = rk({1,2,3}) = 2
|J| + 1 = 3
So rk(∪j∈JAj) < |J| + 1, failing the condition.
Yet we can find independent transversals avoiding any element:
To avoid e = 1: Choose 2 from A1 and 3 from A2
To avoid e = 2: Choose 1 from A1 and 3 from A2
To avoid e = 3: Choose 1 from A1 and 2 from A2
This suggests the condition is sufficient but not necessary.
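To spell out what this shows (our gloss, not Claude’s): each of the three sets above is a 2-element subset of E = {1,2,3}, hence independent in U2,3, and each contains one element of A1 and one of A2, so an independent transversal avoiding e exists for every e ∈ E even though the rank condition fails for J = {1,2}. The “only if” direction of the claimed equivalence is therefore false.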
This example was generated using API calls to Claude 3.7 Sonnet. While our interaction happened in a mix of Unicode and ASCII math, I asked Claude to transcribe it in LaTeX, producing what you see above.
Example. ChatGPT was asked to “calculate the characteristic polynomial of the
Hesse configuration”. The Hesse configuration comprises the twelve lines in the
affine plane over the field with three elements. The AI inferred that the characteristic
polynomial of the matroid of this configuration was desired, and correctly calculated it.
We do not have a link or transcript of this conversation; the model used was o1-pro.
How do I try it? The leading AI providers are Anthropic (who produce Claude), OpenAI (ChatGPT), and Google (Gemini).
To try out an AI produced by one of these companies, you will need to make an account (an existing Google account is sufficient for Gemini). Each company has several different AI models to choose from.
The least powerful models can be used free of charge. These models handle simple tasks with aplomb, but may fall down on questions that require extended or careful reasoning, or synthesis of lots of information. They also may come with daily usage caps.
With a subscription (typically about 20 USD/month), users can access “thinking” models, which write out intermediate chains of thought before replying, much as you might use scratch paper. Thinking models are slower, but more robust. They are able to handle more complex and exacting tasks than non-thinking models, are much better at logical reasoning, and can to an extent notice and correct their own mistakes. Leading examples in this category are Claude 3.7 Sonnet and ChatGPT o3-mini-high4.
OpenAI also offers a “Pro” subscription (200 USD/month), whose main draw is access to o1-pro, a thinking model that is able to think for an especially long time.
Tips and pointers.
Models have different “personalities” and differ in how inclined they are to think for a long time. OpenAI’s o1 and o1-pro are strongly inclined to think, while o3-mini-high and Claude 3.7 Sonnet are only moderately so. If you want a model to think more, it helps to say so.
Don’t take AI responses as final: asking an AI to think more carefully about or suggest improvements to a response it has already given often yields better-quality thoughts.
Keep in mind what information is in an AI’s “context”, i.e. the set of information that the AI will take into account when responding. Some context is provided explicitly (e.g. uploaded files), but other parts are provided implicitly (e.g. chat history with the model). Depending on the provider, the context for your discussions with an AI may include all past chats you’ve ever had.
AI complements rather than replaces search engines, with better performance on fuzzier, more open-ended questions. Context is a major difference between the two, intimating to the AI what you’re looking for in a way that isn’t possible with a search engine.
Some AIs have a “knowledge cutoff”, a date past which they don’t know about current events because their training data all came from before that date. This limitation is sometimes mitigated by giving AIs the ability to browse the web.
Despite their typeset, authoritative appearance, you should think of AI responses as closer to what someone might say after thinking about your words for just a minute or so. Simple mistakes are possible, references are sometimes imagined, and the response may not reflect deep and careful thought.
AIs are good at inferring what you want from small examples. Providing a small example and asking a model to “do it like this” is often easier than writing out all the details of what you want.
Try not to overwhelm the AI. Where possible, limit the number of layers of abstraction it needs to unwind, as you would for a person. When layers can’t be avoided, try helping the AI by manually prompting it to unroll them (as in the excluded minors example above).
Acknowledgements. This note was improved by conversations, comments, and examples contributed by Anna, Artane, Asvin, Colin, Daoji, June, Liban, and Trajan. Thank you all.
Exhortations. Please feel free to let me know if I’ve made a mistake or irksome omission. I am also happy to hear of new applications (even small ones: 10 minutes saved six times is a whole hour), anecdotes, or random musings on this topic.
Finally, if you hold appropriate office (e.g. OpenAI/Anthropic/Google engineer), and want to both accelerate algebraic geometry research and earn eternal gratitude from the community, please make your models write better Macaulay2 :)
Last update. March 21, 2025
1For the purposes of this essay, “AI” will refer to LLM-based chatbots such as ChatGPT, Claude, and Gemini, as this is what the term is now identified with in the popular discourse.
2Sadly, no model that I know of writes good Macaulay2, perhaps due to lack of training data. Lots of mathematicians would be very happy if this changed!
3Matt Larson noticed that Proposition 5.2.3 of this paper is incorrect as stated. A fix has been posted.
4If the “Last update” date of this page is more than 3-ish months ago, it is likely that these are not the leading models anymore.