Message ID | 20231116014350.653792-13-jsnow@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | qapi: statically type schema.py | expand |
John Snow <jsnow@redhat.com> writes: > Differentiate between "actively in the process of checking" and > "checking has completed". This allows us to clean up the types of some > internal fields such as QAPISchemaObjectType's members field which > currently uses "None" as a canary for determining if check has > completed. Certain members become valid only after .check(). Two ways to code that: 1. Assign to such members only in .check(). If you try to use them before .check(), AttributeError. Nice. Drawback: pylint is unhappy, W0201 attribute-defined-outside-init. 2. Assign None in .__init__(), and the real value in .check(). If you try to use them before .check(), you get None, which hopefully leads to an error. Meh, but pylint is happy. I picked 2. because pylint's warning made me go "when in Rome..." With type hints, we can declare in .__init__(), and assign in .check(). Gives me the AttributeError I like, and pylint remains happy. What do you think? Splitting .checked feels like a separate change, though. I can't quite see why we need it. Help me out: what problem does it solve? > This simplifies the typing from a cumbersome Optional[List[T]] to merely > a List[T], which is more pythonic: it is safe to iterate over an empty > list with "for x in []" whereas with an Optional[List[T]] you have to > rely on the more cumbersome "if L: for x in L: ..." Yes, but when L is None, it's *invalid*, and for i in L *should* fail when L is invalid. I think the actual problem is something else. By adding the type hint Optional[List[T]], the valid uses of L become type errors. We really want L to be a List[T]. Doesn't mean we have to (or want to) make uses of invalid L "work". > RFC: are we guaranteed to have members here? can we just use "if > members" without adding the new field? I'm afraid I don't understand these questions. > Signed-off-by: John Snow <jsnow@redhat.com> > --- > scripts/qapi/schema.py | 24 +++++++++++++++--------- > 1 file changed, 15 insertions(+), 9 deletions(-) > > diff --git a/scripts/qapi/schema.py b/scripts/qapi/schema.py > index 164d86c4064..200bc0730d6 100644 > --- a/scripts/qapi/schema.py > +++ b/scripts/qapi/schema.py > @@ -18,7 +18,7 @@ > from collections import OrderedDict > import os > import re > -from typing import List, Optional > +from typing import List, Optional, cast > > from .common import ( > POINTER_SUFFIX, > @@ -447,22 +447,24 @@ def __init__(self, name, info, doc, ifcond, features, > self.base = None > self.local_members = local_members > self.variants = variants > - self.members = None > + self.members: List[QAPISchemaObjectTypeMember] = [] Can we do self.members: List[QAPISchemaObjectTypeMember] ? > + self._checking = False > > def check(self, schema): > # This calls another type T's .check() exactly when the C > # struct emitted by gen_object() contains that T's C struct > # (pointers don't count). > - if self.members is not None: > - # A previous .check() completed: nothing to do > - return > - if self._checked: > + if self._checking: > # Recursed: C struct contains itself > raise QAPISemError(self.info, > "object %s contains itself" % self.name) > + if self._checked: > + # A previous .check() completed: nothing to do > + return > > + self._checking = True > super().check(schema) > - assert self._checked and self.members is None > + assert self._checked and not self.members If we assign only in .check(), we can't read self.members to find out whether it's valid. We could perhaps mess with .__dict__() instead. Not sure it's worthwhile. Dumb down the assertion? > > seen = OrderedDict() > if self._base_name: > @@ -479,13 +481,17 @@ def check(self, schema): > for m in self.local_members: > m.check(schema) > m.check_clash(self.info, seen) > - members = seen.values() > + > + # check_clash is abstract, but local_members is asserted to be > + # List[QAPISchemaObjectTypeMember]. Cast to the narrower type. > + members = cast(List[QAPISchemaObjectTypeMember], list(seen.values())) > > if self.variants: > self.variants.check(schema, seen) > self.variants.check_clash(self.info, seen) > > - self.members = members # mark completed > + self.members = members > + self._checking = False # mark completed > > # Check that the members of this type do not cause duplicate JSON members, > # and update seen to track the members seen so far. Report any errors
On Wed, Nov 22, 2023, 9:02 AM Markus Armbruster <armbru@redhat.com> wrote: > John Snow <jsnow@redhat.com> writes: > > > Differentiate between "actively in the process of checking" and > > "checking has completed". This allows us to clean up the types of some > > internal fields such as QAPISchemaObjectType's members field which > > currently uses "None" as a canary for determining if check has > > completed. > > Certain members become valid only after .check(). Two ways to code > that: > > 1. Assign to such members only in .check(). If you try to use them > before .check(), AttributeError. Nice. Drawback: pylint is unhappy, > W0201 attribute-defined-outside-init. > Can be overcome by declaring the field in __init__, which satisfies both the linter and my developer usability sense (Classes should be easy to have their properties enumerated by looking in one well known place.) > 2. Assign None in .__init__(), and the real value in .check(). If you > try to use them before .check(), you get None, which hopefully leads to > an error. Meh, but pylint is happy. > > I picked 2. because pylint's warning made me go "when in Rome..." > Yep, this is perfectly cromulent dynamically typed Python. It's not the Roman's fault I'm packing us up to go to another empire. > With type hints, we can declare in .__init__(), and assign in .check(). > Gives me the AttributeError I like, and pylint remains happy. What do > you think? > Sounds good to me in general, I already changed this for 2/3 of my other uses of @property. (I'm only reluctant because I don't really like that it's a "lie", but in this case, without potentially significant rewrites, it's a reasonable type band-aid. Once we're type checked, we can refactor more confidently if we so desire, to make certain patterns less prominent or landmine-y.) > Splitting .checked feels like a separate change, though. I can't quite > see why we need it. Help me out: what problem does it solve? > Mechanically, I wanted to eliminate the Optional type from the members field, but you have conditionals in the code that use the presence or absence of this field as a query to determine if we had run check or not yet. So I did the stupidest possible thing and added a "checked" variable to explicitly represent it. > > This simplifies the typing from a cumbersome Optional[List[T]] to merely > > a List[T], which is more pythonic: it is safe to iterate over an empty > > list with "for x in []" whereas with an Optional[List[T]] you have to > > rely on the more cumbersome "if L: for x in L: ..." > > Yes, but when L is None, it's *invalid*, and for i in L *should* fail > when L is invalid. > Sure, but it's so invalid that it causes static typing errors. You can't write "for x in None" in a way that makes mypy happy, None is not iterable. If you want to preserve the behavior of "iterating an empty collection is an Assertion", you need a custom iterator that throws an exception when the collection is empty. I can do that, if you'd like, but I think it's actually fine to just allow the collection to be empty and to just catch the error in check() with either an assertion (oops, something went wrong and the list is empty, this should never happen) or a QAPISemError (oops, you didn't specify any members, which is illegal.) I'd prefer to catch this in check and just allow the iterator to permit empty iterators at the callsite, knowing it'll never happen. > I think the actual problem is something else. By adding the type hint > Optional[List[T]], the valid uses of L become type errors. We really > want L to be a List[T]. Doesn't mean we have to (or want to) make uses > of invalid L "work". > I didn't think I did allow for invalid uses to work - if the list should semantically never be empty, I think it's fine to enforce that in schema.py during construction of the schema object and to assume all uses of "for x in L: ..." are inherently valid. > > RFC: are we guaranteed to have members here? can we just use "if > > members" without adding the new field? > > I'm afraid I don't understand these questions. > I think you answered this one for me; I was asking if it was ever valid in any circumstance to have an empty members list, but I think you've laid out in your response that it isn't. And I think with that knowledge I can simplify this patch, but don't quite recall. (On my mobile again, please excuse my apparent laziness.) > > Signed-off-by: John Snow <jsnow@redhat.com> > > --- > > scripts/qapi/schema.py | 24 +++++++++++++++--------- > > 1 file changed, 15 insertions(+), 9 deletions(-) > > > > diff --git a/scripts/qapi/schema.py b/scripts/qapi/schema.py > > index 164d86c4064..200bc0730d6 100644 > > --- a/scripts/qapi/schema.py > > +++ b/scripts/qapi/schema.py > > @@ -18,7 +18,7 @@ > > from collections import OrderedDict > > import os > > import re > > -from typing import List, Optional > > +from typing import List, Optional, cast > > > > from .common import ( > > POINTER_SUFFIX, > > @@ -447,22 +447,24 @@ def __init__(self, name, info, doc, ifcond, > features, > > self.base = None > > self.local_members = local_members > > self.variants = variants > > - self.members = None > > + self.members: List[QAPISchemaObjectTypeMember] = [] > > Can we do > > self.members: List[QAPISchemaObjectTypeMember] > > ? > Possibly, but also possibly I can just initialize it to an empty list. > > + self._checking = False > > > > def check(self, schema): > > # This calls another type T's .check() exactly when the C > > # struct emitted by gen_object() contains that T's C struct > > # (pointers don't count). > > - if self.members is not None: > > - # A previous .check() completed: nothing to do > > - return > > - if self._checked: > > + if self._checking: > > # Recursed: C struct contains itself > > raise QAPISemError(self.info, > > "object %s contains itself" % self.name) > > + if self._checked: > > + # A previous .check() completed: nothing to do > > + return > > > > + self._checking = True > > super().check(schema) > > - assert self._checked and self.members is None > > + assert self._checked and not self.members > > If we assign only in .check(), we can't read self.members to find out > whether it's valid. We could perhaps mess with .__dict__() instead. > Not sure it's worthwhile. Dumb down the assertion? > If I initialize to an empty list (and don't mess with the checked member) maybe assert self._checked and not self.members would be perfectly acceptable. > > > > seen = OrderedDict() > > if self._base_name: > > @@ -479,13 +481,17 @@ def check(self, schema): > > for m in self.local_members: > > m.check(schema) > > m.check_clash(self.info, seen) > > - members = seen.values() > > + > > + # check_clash is abstract, but local_members is asserted to be > > + # List[QAPISchemaObjectTypeMember]. Cast to the narrower type. > > + members = cast(List[QAPISchemaObjectTypeMember], > list(seen.values())) > > > > if self.variants: > > self.variants.check(schema, seen) > > self.variants.check_clash(self.info, seen) > > > > - self.members = members # mark completed > > + self.members = members > > + self._checking = False # mark completed > > > > # Check that the members of this type do not cause duplicate JSON > members, > > # and update seen to track the members seen so far. Report any > errors > >
John Snow <jsnow@redhat.com> writes: > On Wed, Nov 22, 2023, 9:02 AM Markus Armbruster <armbru@redhat.com> wrote: > >> John Snow <jsnow@redhat.com> writes: >> >> > Differentiate between "actively in the process of checking" and >> > "checking has completed". This allows us to clean up the types of some >> > internal fields such as QAPISchemaObjectType's members field which >> > currently uses "None" as a canary for determining if check has >> > completed. >> >> Certain members become valid only after .check(). Two ways to code >> that: >> >> 1. Assign to such members only in .check(). If you try to use them >> before .check(), AttributeError. Nice. Drawback: pylint is unhappy, >> W0201 attribute-defined-outside-init. >> > > Can be overcome by declaring the field in __init__, which satisfies both > the linter and my developer usability sense (Classes should be easy to have > their properties enumerated by looking in one well known place.) > > >> 2. Assign None in .__init__(), and the real value in .check(). If you >> try to use them before .check(), you get None, which hopefully leads to >> an error. Meh, but pylint is happy. >> >> I picked 2. because pylint's warning made me go "when in Rome..." >> > > Yep, this is perfectly cromulent dynamically typed Python. It's not the > Roman's fault I'm packing us up to go to another empire. > > >> With type hints, we can declare in .__init__(), and assign in .check(). >> Gives me the AttributeError I like, and pylint remains happy. What do >> you think? >> > > Sounds good to me in general, I already changed this for 2/3 of my other > uses of @property. > > (I'm only reluctant because I don't really like that it's a "lie", but in > this case, without potentially significant rewrites, it's a reasonable type > band-aid. Once we're type checked, we can refactor more confidently if we > so desire, to make certain patterns less prominent or landmine-y.) The general problem is "attribute value is valid only after a state transition" (here: .member is valid only after .check()). We want to catch uses of the attribute before it becomes valid. We want to keep pylint and mypy happy. Solutions: 1. Initialize in .__init__() to some invalid value. Set it to the valid value in .check(). 1.a. Pick the "natural" invalid value: None How to catch: assert attribute value is valid (here: .members is not None). Easy to forget. Better: when the use will safely choke on the invalid value (here: elide for uses like for m in .members), catch is automatic. Pylint: fine. Mypy: adding None to the set of values changes the type from T to Optional[T]. Because of this, mypy commonly can't prove valid uses are valid. Keeping it happy requires cluttering the code with assertions and such. Meh. Note: catching invalid uses is a run time check. Mypy won't. 1.b. Pick an invalid value of type T (here: []) How to catch: same as 1.a., except automatic catch is rare. Meh. Pylint: fine. Mypy: fine. 2. Declare in .__init__() without initializing. Initialize to valid value in .check() How to catch: always automatic. Good, me want! Pylint: fine. Mypy: fine. Note: catching invalid uses is a run time check. Mypy won't. 3. Express the state transition in the type system To catch invalid uses statically with mypy, we need to use different types before and after the state transition. Feels possible. Also feels ludicrously overengineered. May I have 2., please? >> Splitting .checked feels like a separate change, though. I can't quite >> see why we need it. Help me out: what problem does it solve? >> > > Mechanically, I wanted to eliminate the Optional type from the members > field, but you have conditionals in the code that use the presence or > absence of this field as a query to determine if we had run check or not > yet. > > So I did the stupidest possible thing and added a "checked" variable to > explicitly represent it. If 2. complicates the existing "have we .check()ed?" code too much, then adding such a variable may indeed be useful. >> > This simplifies the typing from a cumbersome Optional[List[T]] to merely >> > a List[T], which is more pythonic: it is safe to iterate over an empty >> > list with "for x in []" whereas with an Optional[List[T]] you have to >> > rely on the more cumbersome "if L: for x in L: ..." >> >> Yes, but when L is None, it's *invalid*, and for i in L *should* fail >> when L is invalid. >> > > Sure, but it's so invalid that it causes static typing errors. > > You can't write "for x in None" in a way that makes mypy happy, None is not > iterable. A variable that is declared, but not initialized (2. above) also not iterable, and it does make mypy happy, doesn't it? > If you want to preserve the behavior of "iterating an empty collection is > an Assertion", you need a custom iterator that throws an exception when the > collection is empty. I can do that, if you'd like, but I think it's > actually fine to just allow the collection to be empty and to just catch > the error in check() with either an assertion (oops, something went wrong > and the list is empty, this should never happen) or a QAPISemError (oops, > you didn't specify any members, which is illegal.) > > I'd prefer to catch this in check and just allow the iterator to permit > empty iterators at the callsite, knowing it'll never happen. > > >> I think the actual problem is something else. By adding the type hint >> Optional[List[T]], the valid uses of L become type errors. We really >> want L to be a List[T]. Doesn't mean we have to (or want to) make uses >> of invalid L "work". >> > > I didn't think I did allow for invalid uses to work - if the list should > semantically never be empty, I think it's fine to enforce that in schema.py > during construction of the schema object and to assume all uses of "for x > in L: ..." are inherently valid. > > >> > RFC: are we guaranteed to have members here? can we just use "if >> > members" without adding the new field? >> >> I'm afraid I don't understand these questions. >> > > I think you answered this one for me; I was asking if it was ever valid in > any circumstance to have an empty members list, but I think you've laid out > in your response that it isn't. > > And I think with that knowledge I can simplify this patch, but don't quite > recall. (On my mobile again, please excuse my apparent laziness.) Feel excused! [...]
diff --git a/scripts/qapi/schema.py b/scripts/qapi/schema.py index 164d86c4064..200bc0730d6 100644 --- a/scripts/qapi/schema.py +++ b/scripts/qapi/schema.py @@ -18,7 +18,7 @@ from collections import OrderedDict import os import re -from typing import List, Optional +from typing import List, Optional, cast from .common import ( POINTER_SUFFIX, @@ -447,22 +447,24 @@ def __init__(self, name, info, doc, ifcond, features, self.base = None self.local_members = local_members self.variants = variants - self.members = None + self.members: List[QAPISchemaObjectTypeMember] = [] + self._checking = False def check(self, schema): # This calls another type T's .check() exactly when the C # struct emitted by gen_object() contains that T's C struct # (pointers don't count). - if self.members is not None: - # A previous .check() completed: nothing to do - return - if self._checked: + if self._checking: # Recursed: C struct contains itself raise QAPISemError(self.info, "object %s contains itself" % self.name) + if self._checked: + # A previous .check() completed: nothing to do + return + self._checking = True super().check(schema) - assert self._checked and self.members is None + assert self._checked and not self.members seen = OrderedDict() if self._base_name: @@ -479,13 +481,17 @@ def check(self, schema): for m in self.local_members: m.check(schema) m.check_clash(self.info, seen) - members = seen.values() + + # check_clash is abstract, but local_members is asserted to be + # List[QAPISchemaObjectTypeMember]. Cast to the narrower type. + members = cast(List[QAPISchemaObjectTypeMember], list(seen.values())) if self.variants: self.variants.check(schema, seen) self.variants.check_clash(self.info, seen) - self.members = members # mark completed + self.members = members + self._checking = False # mark completed # Check that the members of this type do not cause duplicate JSON members, # and update seen to track the members seen so far. Report any errors
Differentiate between "actively in the process of checking" and "checking has completed". This allows us to clean up the types of some internal fields such as QAPISchemaObjectType's members field which currently uses "None" as a canary for determining if check has completed. This simplifies the typing from a cumbersome Optional[List[T]] to merely a List[T], which is more pythonic: it is safe to iterate over an empty list with "for x in []" whereas with an Optional[List[T]] you have to rely on the more cumbersome "if L: for x in L: ..." RFC: are we guaranteed to have members here? can we just use "if members" without adding the new field? Signed-off-by: John Snow <jsnow@redhat.com> --- scripts/qapi/schema.py | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-)