Stuck on a simple proof about regular expressions
Stuck on a simple proof about regular expressions
I'm trying to formalize some properties on regular expressions (REs) using Coq. But, I've got some troubles to prove a rather simple property:
For all strings s, if s is in the language of (epsilon)* RE, then s =
"", where epsilon and * denotes the empty string RE and Kleene star
operation.
This seems to be an obvious application of induction / inversion tactics, but I couldn't make it work.
The minimal working code with the problematic lemma is in the following gist.
Any tip on how should I proceed will be appreciated.
EDIT:
One of my tries was something like:
Lemma star_lemma : forall s, s <<- (#1 ^*) -> s = "".
Proof.
intros s H.
inverts* H.
inverts* H2.
inverts* H1.
inverts* H1.
inverts* H2.
simpl in *.
-- stuck here
that leave me with the following goal:
s' : string
H4 : s' <<- (#1 ^*)
============================
s' = ""
At least to me, it appears that using induction would finish the proof, since I could use H4 in induction hypothesis to finish the proof, but when I start the proof using
induction H
instead of
inverts* H
I got some (at least for me) senseless goals. In Idris / Agda, such proof just follows by pattern matching and recursion over the structure of s <<- (#1 ^*). My point is how to do such recursion in Coq.
I started my try with
induction H
and could proof each subgoal with subst; reflexivity
except for one, the case inChr
. I get stuck in the same situation when I try induction s
. Is it possible that the representation is not as intended?– ichistmeinname
Nov 25 '16 at 9:17
induction H
subst; reflexivity
inChr
induction s
I'm thinking your theorem is a bit too specialized. Coq doesn't handle well induction on term that are not "all variables". Here your ` s <<- (#1 ^*)` should be replaced with ` s <<- r -> r = (#1 ^*)` so that recursion on
r
is a variable, not an instantiated term. You'll have lots of dummy cases but the induction hypothesis should be easier to use. You might also have a look at dependent induction
.– Vinz
Nov 25 '16 at 12:01
r
dependent induction
@Vinz The problem is that in the induction hypothesis
r
becomes #1 :+: ((#1 @ (#1 ^*))
so you have as a precondition the spurious #1 :+: ((#1 @ (#1 ^*)) = (#1 ^*)
which means that it cannot be used anymore.– gallais
Nov 25 '16 at 13:18
r
#1 :+: ((#1 @ (#1 ^*))
#1 :+: ((#1 @ (#1 ^*)) = (#1 ^*)
Also
dependent induction
will introduce a dependency on the JMeq
axiom.– Anton Trunov
Nov 25 '16 at 16:36
dependent induction
JMeq
3 Answers
3
Here is one possible solution of the original problem:
Lemma star_lemma : forall s,
s <<- (#1 ^*) -> s = "".
Proof.
refine (fix star_lemma s prf {struct prf} : s = "" := _).
inversion_clear prf; subst.
inversion_clear H; subst.
- now inversion H0.
- inversion_clear H0; subst. inversion_clear H; subst.
rewrite (star_lemma s' H1).
reflexivity.
Qed.
The main idea is to introduce a term in the context which will resemble the recursive call in a typical Idris proof. The approaches with remember
and dependent induction
don't work well (without modifications of in_regex
) because they introduce impossible to satisfy equations as induction hypotheses' premises.
remember
dependent induction
in_regex
Note: it can take a while to check this lemma (around 40 seconds on my machine under Coq 8.5pl3). I think it's due to the fact that the inversion
tactic tends to generate big proof terms.
inversion
(1) It's a pleasure to prove the lemma in Idris (comparing to Coq). (2) I'm wondering if there is a faster method (without modifying the data structures).
– Anton Trunov
Nov 25 '16 at 16:35
Nice! I really don't see how to prove it without using an hand-crafted fixpoint. Following your example, I factored the proof through a
star_unfold
lemma which says that if s <<- (e ^*)
then exists n, s <<- ntimes n e
. In the case of a full-blown library, this should isolate the expensive check to star_unfold
alone given that later proofs can simply use induction on a natural number.– gallais
Nov 25 '16 at 19:33
star_unfold
s <<- (e ^*)
exists n, s <<- ntimes n e
star_unfold
@gallais Very good point! Please consider making your comment into an answer -- I hear comments are not very reliable on SO. You could've also used
Nat.iter
instead of ntimes
to make the code shorter: exists n, s <<- Nat.iter n (Cat e) #1.
(But maybe it hurts readability a bit).– Anton Trunov
Nov 25 '16 at 20:15
Nat.iter
ntimes
exists n, s <<- Nat.iter n (Cat e) #1.
Thanks @gallais and @Anton Trunov for your great answers. I'd never thought on using refine to prove such lemma! Also, the solution based on
ntimes
is really nice!– Rodrigo Ribeiro
Nov 26 '16 at 0:02
ntimes
This problem has obsessed me for a week, and I have finally found a solution that I find elegant.
I had already read that when an induction principle does not fit your needs, you can write and prove another one, more adapted to your problem. That is what I have done in this case. What we would want is the one obtained when using the more natural definition given in this answer. By doing this, we can keep the same definition (if changing it implies too many changes, for example) and reason about it more easily.
Here is the proof of the induction principle (I use a section to specify precisely the implicit arguments, since otherwise I observe strange behaviours with them, but the section mechanism is not necessary at all here).
Section induction_principle.
Context (P : string -> regex -> Prop)
(H_InEps : P "" #1)
(H_InChr : forall c, P (String c "") ($ c))
(H_InCat : forall {e e' s s' s1}, s <<- e -> P s e -> s' <<- e' ->
P s' e' -> s1 = s ++ s' -> P s1 (e @ e'))
(H_InLeft : forall {s e e'}, s <<- e -> P s e -> P s (e :+: e'))
(H_InRight : forall {s' e e'}, s' <<- e' -> P s' e' -> P s' (e :+: e'))
(H_InStar_Eps : forall e, P "" (e ^*))
(H_InStar_Cat : forall {s1 s2 e}, s1 <<- e -> s2 <<- (e ^*) ->
P s1 e -> P s2 (e ^*) -> P (s1++s2) (e ^*)).
Arguments H_InCat {_ _ _ _ _} _ _ _ _ _.
Arguments H_InLeft {_ _ _} _ _.
Arguments H_InRight {_ _ _} _ _.
Arguments H_InStar_Cat {_ _ _} _ _ _ _.
Definition in_regex_ind2 : forall (s : string) (r : regex), s <<- r -> P s r.
Proof.
refine (fix in_regex_ind2 {s r} prf {struct prf} : P s r :=
match prf with
| InEps => H_InEps
| InChr c => H_InChr c
| InCat prf1 prf2 eq1 =>
H_InCat prf1 (in_regex_ind2 prf1) prf2 (in_regex_ind2 prf2) eq1
| InLeft _ prf => H_InLeft prf (in_regex_ind2 prf)
| InRight _ prf => H_InRight prf (in_regex_ind2 prf)
| InStar prf => _
end).
inversion prf; subst.
- inversion H1. apply H_InStar_Eps.
- inversion H1; subst.
apply H_InStar_Cat; try assumption; apply in_regex_ind2; assumption.
Qed.
End induction_principle.
And it turned out that the Qed
of this proof was not instantaneous (probably due to inversion
producing large terms as in this answer), but took less than 1s (maybe because the lemma is more abstract).
Qed
inversion
The star_lemma
becomes nearly trivial to prove (as soon as we know the remember
trick), as with the natural definition.
star_lemma
remember
Lemma star_lemma : forall s, s <<- (#1 ^*) -> s = "".
Proof.
intros s H. remember (#1 ^*) as r.
induction H using in_regex_ind2; try discriminate.
- reflexivity.
- inversion Heqr; subst.
inversion H. rewrite IHin_regex2 by reflexivity. reflexivity.
Qed.
Thanks for your time, @eponier! Your idea of creating a new custom induction principle is really nice and, I believe, general, since it could be adapted to other problems.
– Rodrigo Ribeiro
Dec 7 '16 at 15:46
That's brilliant! By the way, when I "measure" performance with
time coqc regex.v
I get the average of 0,85s user
when testing your code as it is and 0,70s user
when replacing inversion
with inversion_clear
in in_regex_ind2
(the 1st and the 3d occurrences matter, the 2nd one predictably doesn't).– Anton Trunov
Dec 9 '16 at 10:52
time coqc regex.v
0,85s user
0,70s user
inversion
inversion_clear
in_regex_ind2
@Zimmi48 I was not totally sure whether it was a question or a request. I updated the second proof accordingly.
– eponier
Dec 9 '16 at 13:16
@AntonTrunov That is interesting. I thought
inversion_clear
was more or less a syntactic sugar, but it seems to make a difference here. With Time Qed.
, I get 0.220s with inversion
and 0.120s with inversion_clear
.– eponier
Dec 9 '16 at 13:20
inversion_clear
Time Qed.
inversion
inversion_clear
Indeed, both proof terms are equal (with
inversion_clear
and inversion; clear
).– eponier
Dec 9 '16 at 13:46
inversion_clear
inversion; clear
I modified a bit the definition of your in_regex
predicate:
in_regex
Inductive in_regex : string -> regex -> Prop :=
| InEps
: "" <<- #1
| InChr
: forall c
, (String c EmptyString) <<- ($ c)
| InCat
: forall e e' s s' s1
, s <<- e
-> s' <<- e'
-> s1 = s ++ s'
-> s1 <<- (e @ e')
| InLeft
: forall s e e'
, s <<- e
-> s <<- (e :+: e')
| InRight
: forall s' e e'
, s' <<- e'
-> s' <<- (e :+: e')
| InStarLeft
: forall e
, "" <<- (e ^*)
| InStarRight
: forall s s' e
, s <<- e
-> s' <<- (e ^*)
-> (s ++ s') <<- (e ^*)
where "s '<<-' e" := (in_regex s e).
and could prove your lemma:
Lemma star_lemma : forall s, s <<- (#1 ^*) -> s = "".
Proof.
intros s H.
remember (#1 ^*) as r.
induction H; inversion Heqr; clear Heqr; trivial.
subst e.
rewrite IHin_regex2; trivial.
inversion H; trivial.
Qed.
Some explanations are necessary.
I did an induction on H
. The reasoning is: if I have a proof of s <<- (#1 ^*)
then this proof must have the following form...
H
s <<- (#1 ^*)
The tactic remember
create a new hypothesis Heqr
which, combined with inversion
will help get rid of cases which cannot possibly give this proof (in fact all the cases minus the ones where ^*
is in the conclusion).
remember
Heqr
inversion
^*
Unfortunately, this path of reasoning does not work with the definition you had for the in_regex
predicate because it will create an unsatisfiable condition to the induction hypothesis. That's why I modified your inductive predicate as well.
in_regex
The modified inductive tries to give a more basic definition of being in (e ^*)
. Semantically, I think this is equivalent.
(e ^*)
I would be interested to read a proof on the original problem.
The proof of the original problem is here :).
– Anton Trunov
Nov 25 '16 at 14:31
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
Did you manage to prove simpler properties in your theory? When did it fail? DId you do any research?
– Dmitri Chubarov
Nov 25 '16 at 1:20