First, my students and I looked for robust ways of extracting linguistic features from moderately ungrammatical texts produced by English language learners, aiming to give learners formative feedback that was accurate, interpretable, and actionable. We found that statistical parsers such as Stanford CoreNLP (i.e., software that automatically identifies the syntactic structure of a sentence, trained on a manually annotated corpus) could yield syntactic information usable for detecting certain types of grammatical errors via mal-rules (i.e., explicit, formal, machine-readable descriptions of ungrammatical structures) that combine constituency, dependency, and surface-text patterns. Expressing such mal-rules in currently available formalisms (e.g., TGrep), however, proved prohibitively complex.
I therefore developed a simple but scalable declarative formalism for writing such rules. The formalism proved easy to use for students in applied linguistics and was adopted by five PhD students for their dissertation research. It transcompiles into Prolog programs that operate on syntactic trees, part-of-speech tags, syntactic dependency information, and surface text. A prototype error detection system presented in paper [2] outperforms existing automated writing evaluation (AWE) tools in the detection of certain error types.
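To make the idea of a declarative mal-rule concrete, the following is a minimal illustrative sketch, not the actual formalism or its Prolog backend: a toy rule, stated as data rather than code, that flags third-person-singular agreement errors (e.g., "He go to school") from part-of-speech tags in the Penn Treebank convention. All names and the rule structure here are hypothetical.

```python
# Hypothetical declarative mal-rule: a 3rd-person singular pronoun
# subject immediately followed by a base-form verb (VB/VBP) where
# VBZ would be required. Illustrative only; not the paper's formalism.
AGREEMENT_RULE = {
    "name": "3sg-subject-base-verb",
    "subject_tags": {"PRP"},
    "subject_words": {"he", "she", "it"},
    "bad_verb_tags": {"VB", "VBP"},
}

def find_errors(tagged_sentence, rule=AGREEMENT_RULE):
    """Return (index, word) pairs where the rule fires on a
    sentence given as a list of (word, pos_tag) tokens."""
    hits = []
    for i in range(len(tagged_sentence) - 1):
        (w1, t1), (w2, t2) = tagged_sentence[i], tagged_sentence[i + 1]
        if (t1 in rule["subject_tags"]
                and w1.lower() in rule["subject_words"]
                and t2 in rule["bad_verb_tags"]):
            hits.append((i + 1, w2))
    return hits

sent = [("He", "PRP"), ("go", "VBP"), ("to", "TO"), ("school", "NN")]
print(find_errors(sent))  # → [(1, 'go')]
```

The point of the declarative style is that the rule itself is inert data, so it can be authored by linguists without programming and compiled into whatever matching engine (here a Python loop, in the actual system Prolog) the tool uses.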