The learning/deduction feedback loop runs both across problems and inside individual problems
The more problems/branches you solve/close, the more solutions you can learn from
The more solutions you can learn from, the more problems you solve
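The across-problems half of this loop can be sketched in a few lines. This is a toy illustration only, not MaLeCoP's actual interface: `toy_prove` and the premise-budget mechanism are invented stand-ins, and the "learning" is just a usefulness counter over premises that appeared in earlier proofs.

```python
from collections import Counter

def toy_prove(goal, ranked_premises, budget=2):
    # Stand-in prover: succeeds iff every premise the goal needs is
    # among the first `budget` ranked premises (purely illustrative).
    need = set(goal)
    if need <= set(ranked_premises[:budget]):
        return sorted(need)          # the "proof": premises actually used
    return None

def feedback_loop(problems, prove, max_passes=3):
    """Across-problems loop: each pass re-ranks premises by how often
    they occurred in previously found proofs (illustrative names)."""
    usefulness = Counter()           # learned premise statistics
    solved = {}
    for _ in range(max_passes):
        progress = False
        for name, (goal, premises) in problems.items():
            if name in solved:
                continue
            # deduction guided by learning: try learned-useful premises first
            ranked = sorted(premises, key=lambda p: -usefulness[p])
            proof = prove(goal, ranked)
            if proof is not None:
                solved[name] = proof
                usefulness.update(proof)   # learn from the new solution
                progress = True
        if not progress:
            break
    return solved
```

A problem whose needed premise is initially ranked outside the budget becomes solvable only after an earlier proof has boosted that premise's score — the more you solve, the more you can solve.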
Not just model avoidance, but also ``dangerous pattern'' avoidance
still quite a prototype (no CASC)
already about a 20-fold shortening of proof search on the MPTP Challenge compared to leanCoP (see the paper)
MaLeCoP Architecture
MaLeCoP
Some lessons learned
Don't push "expensive AI" where "normal AI" works better; do the obvious first:
statistical ML vs. symbolic ML (ILP, etc.): the two need to be carefully merged: statistical substitution/subsumption trees, ATPs testing generalizations, etc.
high-level combination of standard ATPs with learning vs. low-level new "cool" (often non-performing) systems (it took a lot of work to make MaLeCoP efficient)
make it useful (Sledgehammer, MizAR, HOL(y)Hammer) and have rigorous benchmarks (CASC LTB, etc.)
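A flavor of the "simple ML at the right place" behind such hammer-style systems is premise selection by a naive-Bayes-style relevance ranker. The sketch below is a minimal illustration under invented names — it is not the actual code of Sledgehammer, MizAR, or HOL(y)Hammer, which use richer features and smoothing.

```python
import math
from collections import defaultdict

def train(proofs):
    """proofs: list of (goal_features, used_premises).
    Count, for each premise, how often it was used and with which
    goal features it co-occurred."""
    counts = defaultdict(lambda: defaultdict(int))
    uses = defaultdict(int)
    for feats, premises in proofs:
        for p in premises:
            uses[p] += 1
            for f in feats:
                counts[p][f] += 1
    return counts, uses

def rank_premises(goal_feats, counts, uses):
    """Naive-Bayes-style log-relevance of each known premise to the
    goal's symbol features; higher score = more relevant."""
    scores = {}
    for p, n in uses.items():
        s = math.log(n)              # prior: frequently used premises
        for f in goal_feats:
            k = counts[p].get(f, 0)
            # smoothed log-probability of feature f given premise p used
            s += math.log((k + 0.1) / (n + 0.2))
        scores[p] = s
    return sorted(scores, key=scores.get, reverse=True)
```

Premises that were used in proofs sharing the goal's symbols float to the top, and only those are handed to the ATP.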
Some lessons learned
one's own sense of "AI coolness" may be totally wrong on the large scale
large-scale evaluations will show what is good and what to do next (analyze data often!)
simple ML methods applied at the right place can be very powerful (recently e.g. BliStr)
the formal domain is very fragile and takes a lot of work, but it is also very rewarding:
new ATP proofs cannot be trusted or learned from if the translation to the ATPs is unsound (a funny story with Isabelle)
once the translation is correct, new AI/ATP proofs really emerge
Challenges in large formal KBs
selection of knowledge (for ATPs and humans)
useful translations between logics, useful ATP calculi and decision procedures
training internal guidance of ATPs, SMTs, automation inside ITPs
developing the corpora: lemmas, concepts, conjectures, (counter)examples
transfer ideas from SAT to MaLeCoP and E
Alan Frisch claims that, using dynamic features, SAT/UNSAT status can be guessed with 75% success on random SAT problems
dynamic features: random hill-climbing to make as many clauses as possible true, etc.
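One such dynamic feature can be sketched as follows — this is only an illustration of the idea (greedy flipping to maximize satisfied clauses, then reporting the best fraction reached); the actual feature set in Frisch's work is certainly richer.

```python
import random

def frac_sat_after_hill_climb(clauses, n_vars, steps=200, seed=0):
    """Dynamic SAT feature (illustrative): run random hill-climbing
    that flips variables to maximize the number of satisfied clauses,
    and report the best fraction reached.  Clauses are lists of
    nonzero DIMACS-style ints (positive/negative literals)."""
    rng = random.Random(seed)
    assign = [rng.choice([False, True]) for _ in range(n_vars + 1)]

    def n_sat():
        return sum(any((lit > 0) == assign[abs(lit)] for lit in c)
                   for c in clauses)

    best = n_sat()
    for _ in range(steps):
        v = rng.randrange(1, n_vars + 1)
        assign[v] = not assign[v]        # tentative flip
        now = n_sat()
        if now >= best:
            best = now                   # keep improving/equal flips
        else:
            assign[v] = not assign[v]    # undo worsening flip
    return best / len(clauses)
```

Intuitively, instances where the climb stalls far below 1.0 look UNSAT-like, while instances where it quickly saturates look SAT-like; the fraction (and how fast it rises) becomes an input feature for the classifier.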
Challenges in large formal KBs
refactoring of long ATP proofs for human consumption (a 50k-long proof produced by David Stanovsky with Waldmeister)
using strong AI/ATP to help disambiguation/understanding of natural-language math
emulating the layer on which mathematicians think (related to natural language)
what will it take to prove the Brouwer fixed-point theorem or the Jordan curve theorem fully automatically?
Stephan Schulz, Geoff Sutcliffe, Larry Paulson, Tobias Nipkow, Jasmin Blanchette, John Harrison, Tom Hales, Andrzej Trybulec, Piotr Rudnicki, Ingo Dahn, Bill McCune, Adam Pease, Andrej Voronkov, Kostya Korovin, ...