Benchmarks in a Standard Form
Dear Organizers,
I would be very grateful if you could provide all benchmarks in a standard form based on which hybrid automata could be obtained easily, since it could not only save a lot of time for the tool authors but also ensure that we are using the same benchmarks.
I have just checked all of the benchmark papers in ARCH and found that they are provided in the files of various types. I am not sure whether all of them could be equivalently modeled by hybrid automata. If not, is it possible to slightly adapt those models to make them available to most of the tools?
Thanks for the help.
Xin
I completely agree. In the statuses document that Goran had sent out it said:
Where possible, the problem instance should originate in one of the benchmarks in the
ARCH repository (http://cps-vo.org/group/ARCH/benchmarks) and be specified in the SpaceEx file format, consisting of an XML file for the system and a CFG file for the specification (forbidden states). ( http://spaceex.imag.fr/documentation/user-documentation/spaceex-modeling-language-33 ).
Although nonlinear systems can't be run in SpaceEx currently, the model editor and format can still be used to create the models in a, hopefully, unambiguous way. Hyst can also be used to translate the models from this format. A printer for Flow*, for example, already exists. I can also help in the process of creating a Hyst printer for a specific tool.
In terms of model adaption, I think it's certainly allowed. The hope is that the benchmarks would originate from the ARCH set of benchmarks, but small deviations are probably not an issue. Just point out what's being changed in your benchmark proposal topic.
What do you think Xin?
I believe the intention/hope was (Goran or Matthias can correct me if I'm wrong) that each participant would provide a benchmark in the SpaceEX format based on one of the ARCH benchmarks. The category of the benchmark, whether it's nonliner or linear, or bounded depth, fixpoints or jump/time bound, depends on the category that their tool is participating in. Then, they would provide two safety specifications for the model ("problem instances"), one which is safe, and one which is unsafe.
@Stan, thank you for your comments, you're absolutely right.
For the competition it would be nice if every tool could run on at least one problem instance, and if every problem instance would have at least one tool running on it. Both criteria are satisfied if every tool-owner proposes a problem instance that he/she likes and that run on his/her tool.
I am mostly using Hyst to transform the provided Benchmarks to Flow* syntax, which my tool can parse. Regarding the configuration/settings, I think there are some problems concerning the translation:
I was wondering, how we should translate the number of iterations correctly. From what I know, the number of iterations in SpaceEx denotes the number of discrete post computations, i.e. the number of jumps taken. Other tools seem to specify a limit on the jump depth or a limit on the computed flowpipes. In any case, is it okay to adjust the jump depth to a value, such that the number of computed discrete post successors roughly is the same as the specified iteration bound?
In my opinion, "number of iterations" is a setting made up by verification tool authors and is not typically part of the verification problem being solved. For this reason, it's okay to manually adjust this setting based on what the particular tool is using.
Stan is absolutely right that the "number of iterations" setting in SpaceEx is made up and does not easily translate to search depth. Personally, I don't like "search depth" as a criterion since it depends on how things are modeled. Adding a self-loop may completely change the results. So the spec should be something that has meaning on the model level, like the number of transitions with a given label. In SpaceEx, this can then be verified by including a monitor automaton that counts the number of transitions.
Thanks Stan. I actually do not think modeling a nonlinear system in the SpaceEx format a problem, since there is Hyst and even manual translation is not hard. But I am wondering whether the organizers could provide some standard benchmarks for us or not. We should also consider the benchmarks of various categories, linear or nonlinear constraints, with or without disturbances, whether to check a fixed point. Also, it is necessary to provide several safety checking tasks of different difficulties for a system.
I will do so in my benchmark proposal.