Stuck on a simple proof about regular expressions


Stuck on a simple proof about regular expressions



I'm trying to formalize some properties on regular expressions (REs) using Coq. But, I've got some troubles to prove a rather simple property:



For all strings s, if s is in the language of (epsilon)* RE, then s =
"", where epsilon and * denotes the empty string RE and Kleene star
operation.



This seems to be an obvious application of induction / inversion tactics, but I couldn't make it work.



The minimal working code with the problematic lemma is in the following gist.
Any tip on how should I proceed will be appreciated.



EDIT:



One of my tries was something like:


Lemma star_lemma : forall s, s <<- (#1 ^*) -> s = "".
Proof.
intros s H.
inverts* H.
inverts* H2.
inverts* H1.
inverts* H1.
inverts* H2.
simpl in *.
-- stuck here



that leave me with the following goal:


s' : string
H4 : s' <<- (#1 ^*)
============================
s' = ""



At least to me, it appears that using induction would finish the proof, since I could use H4 in induction hypothesis to finish the proof, but when I start the proof using


induction H



instead of


inverts* H



I got some (at least for me) senseless goals. In Idris / Agda, such proof just follows by pattern matching and recursion over the structure of s <<- (#1 ^*). My point is how to do such recursion in Coq.





Did you manage to prove simpler properties in your theory? When did it fail? DId you do any research?
– Dmitri Chubarov
Nov 25 '16 at 1:20





I started my try with induction H and could proof each subgoal with subst; reflexivity except for one, the case inChr. I get stuck in the same situation when I try induction s. Is it possible that the representation is not as intended?
– ichistmeinname
Nov 25 '16 at 9:17


induction H


subst; reflexivity


inChr


induction s





I'm thinking your theorem is a bit too specialized. Coq doesn't handle well induction on term that are not "all variables". Here your ` s <<- (#1 ^*)` should be replaced with ` s <<- r -> r = (#1 ^*)` so that recursion on r is a variable, not an instantiated term. You'll have lots of dummy cases but the induction hypothesis should be easier to use. You might also have a look at dependent induction.
– Vinz
Nov 25 '16 at 12:01



r


dependent induction





@Vinz The problem is that in the induction hypothesis r becomes #1 :+: ((#1 @ (#1 ^*)) so you have as a precondition the spurious #1 :+: ((#1 @ (#1 ^*)) = (#1 ^*) which means that it cannot be used anymore.
– gallais
Nov 25 '16 at 13:18



r


#1 :+: ((#1 @ (#1 ^*))


#1 :+: ((#1 @ (#1 ^*)) = (#1 ^*)





Also dependent induction will introduce a dependency on the JMeq axiom.
– Anton Trunov
Nov 25 '16 at 16:36


dependent induction


JMeq




3 Answers
3



Here is one possible solution of the original problem:


Lemma star_lemma : forall s,
s <<- (#1 ^*) -> s = "".
Proof.
refine (fix star_lemma s prf {struct prf} : s = "" := _).
inversion_clear prf; subst.
inversion_clear H; subst.
- now inversion H0.
- inversion_clear H0; subst. inversion_clear H; subst.
rewrite (star_lemma s' H1).
reflexivity.
Qed.



The main idea is to introduce a term in the context which will resemble the recursive call in a typical Idris proof. The approaches with remember and dependent induction don't work well (without modifications of in_regex) because they introduce impossible to satisfy equations as induction hypotheses' premises.


remember


dependent induction


in_regex



Note: it can take a while to check this lemma (around 40 seconds on my machine under Coq 8.5pl3). I think it's due to the fact that the inversion tactic tends to generate big proof terms.


inversion





(1) It's a pleasure to prove the lemma in Idris (comparing to Coq). (2) I'm wondering if there is a faster method (without modifying the data structures).
– Anton Trunov
Nov 25 '16 at 16:35





Nice! I really don't see how to prove it without using an hand-crafted fixpoint. Following your example, I factored the proof through a star_unfold lemma which says that if s <<- (e ^*) then exists n, s <<- ntimes n e. In the case of a full-blown library, this should isolate the expensive check to star_unfold alone given that later proofs can simply use induction on a natural number.
– gallais
Nov 25 '16 at 19:33


star_unfold


s <<- (e ^*)


exists n, s <<- ntimes n e


star_unfold





@gallais Very good point! Please consider making your comment into an answer -- I hear comments are not very reliable on SO. You could've also used Nat.iter instead of ntimes to make the code shorter: exists n, s <<- Nat.iter n (Cat e) #1. (But maybe it hurts readability a bit).
– Anton Trunov
Nov 25 '16 at 20:15


Nat.iter


ntimes


exists n, s <<- Nat.iter n (Cat e) #1.





Thanks @gallais and @Anton Trunov for your great answers. I'd never thought on using refine to prove such lemma! Also, the solution based on ntimes is really nice!
– Rodrigo Ribeiro
Nov 26 '16 at 0:02


ntimes



This problem has obsessed me for a week, and I have finally found a solution that I find elegant.



I had already read that when an induction principle does not fit your needs, you can write and prove another one, more adapted to your problem. That is what I have done in this case. What we would want is the one obtained when using the more natural definition given in this answer. By doing this, we can keep the same definition (if changing it implies too many changes, for example) and reason about it more easily.



Here is the proof of the induction principle (I use a section to specify precisely the implicit arguments, since otherwise I observe strange behaviours with them, but the section mechanism is not necessary at all here).


Section induction_principle.

Context (P : string -> regex -> Prop)
(H_InEps : P "" #1)
(H_InChr : forall c, P (String c "") ($ c))
(H_InCat : forall {e e' s s' s1}, s <<- e -> P s e -> s' <<- e' ->
P s' e' -> s1 = s ++ s' -> P s1 (e @ e'))
(H_InLeft : forall {s e e'}, s <<- e -> P s e -> P s (e :+: e'))
(H_InRight : forall {s' e e'}, s' <<- e' -> P s' e' -> P s' (e :+: e'))
(H_InStar_Eps : forall e, P "" (e ^*))
(H_InStar_Cat : forall {s1 s2 e}, s1 <<- e -> s2 <<- (e ^*) ->
P s1 e -> P s2 (e ^*) -> P (s1++s2) (e ^*)).

Arguments H_InCat {_ _ _ _ _} _ _ _ _ _.
Arguments H_InLeft {_ _ _} _ _.
Arguments H_InRight {_ _ _} _ _.
Arguments H_InStar_Cat {_ _ _} _ _ _ _.

Definition in_regex_ind2 : forall (s : string) (r : regex), s <<- r -> P s r.
Proof.
refine (fix in_regex_ind2 {s r} prf {struct prf} : P s r :=
match prf with
| InEps => H_InEps
| InChr c => H_InChr c
| InCat prf1 prf2 eq1 =>
H_InCat prf1 (in_regex_ind2 prf1) prf2 (in_regex_ind2 prf2) eq1
| InLeft _ prf => H_InLeft prf (in_regex_ind2 prf)
| InRight _ prf => H_InRight prf (in_regex_ind2 prf)
| InStar prf => _
end).
inversion prf; subst.
- inversion H1. apply H_InStar_Eps.
- inversion H1; subst.
apply H_InStar_Cat; try assumption; apply in_regex_ind2; assumption.
Qed.

End induction_principle.



And it turned out that the Qed of this proof was not instantaneous (probably due to inversion producing large terms as in this answer), but took less than 1s (maybe because the lemma is more abstract).


Qed


inversion



The star_lemma becomes nearly trivial to prove (as soon as we know the remember trick), as with the natural definition.


star_lemma


remember


Lemma star_lemma : forall s, s <<- (#1 ^*) -> s = "".
Proof.
intros s H. remember (#1 ^*) as r.
induction H using in_regex_ind2; try discriminate.
- reflexivity.
- inversion Heqr; subst.
inversion H. rewrite IHin_regex2 by reflexivity. reflexivity.
Qed.





Thanks for your time, @eponier! Your idea of creating a new custom induction principle is really nice and, I believe, general, since it could be adapted to other problems.
– Rodrigo Ribeiro
Dec 7 '16 at 15:46





That's brilliant! By the way, when I "measure" performance with time coqc regex.v I get the average of 0,85s user when testing your code as it is and 0,70s user when replacing inversion with inversion_clear in in_regex_ind2 (the 1st and the 3d occurrences matter, the 2nd one predictably doesn't).
– Anton Trunov
Dec 9 '16 at 10:52


time coqc regex.v


0,85s user


0,70s user


inversion


inversion_clear


in_regex_ind2





@Zimmi48 I was not totally sure whether it was a question or a request. I updated the second proof accordingly.
– eponier
Dec 9 '16 at 13:16





@AntonTrunov That is interesting. I thought inversion_clear was more or less a syntactic sugar, but it seems to make a difference here. With Time Qed., I get 0.220s with inversion and 0.120s with inversion_clear.
– eponier
Dec 9 '16 at 13:20


inversion_clear


Time Qed.


inversion


inversion_clear





Indeed, both proof terms are equal (with inversion_clear and inversion; clear).
– eponier
Dec 9 '16 at 13:46


inversion_clear


inversion; clear



I modified a bit the definition of your in_regex predicate:


in_regex


Inductive in_regex : string -> regex -> Prop :=
| InEps
: "" <<- #1
| InChr
: forall c
, (String c EmptyString) <<- ($ c)
| InCat
: forall e e' s s' s1
, s <<- e
-> s' <<- e'
-> s1 = s ++ s'
-> s1 <<- (e @ e')
| InLeft
: forall s e e'
, s <<- e
-> s <<- (e :+: e')
| InRight
: forall s' e e'
, s' <<- e'
-> s' <<- (e :+: e')
| InStarLeft
: forall e
, "" <<- (e ^*)
| InStarRight
: forall s s' e
, s <<- e
-> s' <<- (e ^*)
-> (s ++ s') <<- (e ^*)
where "s '<<-' e" := (in_regex s e).



and could prove your lemma:


Lemma star_lemma : forall s, s <<- (#1 ^*) -> s = "".
Proof.
intros s H.
remember (#1 ^*) as r.
induction H; inversion Heqr; clear Heqr; trivial.
subst e.
rewrite IHin_regex2; trivial.
inversion H; trivial.
Qed.



Some explanations are necessary.



I did an induction on H. The reasoning is: if I have a proof of s <<- (#1 ^*) then this proof must have the following form...


H


s <<- (#1 ^*)



The tactic remember create a new hypothesis Heqr which, combined with inversion will help get rid of cases which cannot possibly give this proof (in fact all the cases minus the ones where ^* is in the conclusion).


remember


Heqr


inversion


^*



Unfortunately, this path of reasoning does not work with the definition you had for the in_regex predicate because it will create an unsatisfiable condition to the induction hypothesis. That's why I modified your inductive predicate as well.


in_regex



The modified inductive tries to give a more basic definition of being in (e ^*). Semantically, I think this is equivalent.


(e ^*)



I would be interested to read a proof on the original problem.





The proof of the original problem is here :).
– Anton Trunov
Nov 25 '16 at 14:31






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Comments

Popular posts from this blog

paramiko-expect timeout is happening after executing the command

Export result set on Dbeaver to CSV

Opening a url is failing in Swift