Continuing my series of posts about checkpointing in virtual platforms (see previous posts on Simics, Cadence, and our FDL paper), I have finally found a decent description of how CoWare does things for SystemC. It is pretty much the same approach as that taken by Cadence, in that it stores a complete process state to disk and uses special callbacks to handle connections to open files and similar local resources on the host system. The approach is described in a paper called "A Checkpoint/Restore Framework for SystemC-Based Virtual Platforms", by Stefan Kraemer and Reiner Leupers of RWTH Aachen, and Dietmar Petras and Thomas Philipp of CoWare, published at the International Symposium on System-on-Chip in Tampere, Finland, in October 2009.
The approach taken for their checkpointing system is to save the entire state of the running simulation program (i.e., an entire host operating system process), and later recreate the process in the same state. This gets around the need to program all simulation models to explicitly support checkpointing, but it also severely limits the applicability of checkpointing. Of the checkpointing operations described in my previous post, it only supports bringing a checkpoint back up into the same model, on the same machine (or a machine running a completely identical software stack down to the exact versions of all libraries, drivers, etc., which is not very likely to happen). The paper admits this honestly, which I appreciate.
Process-based checkpointing of this kind is also used by Cadence, and just as with Cadence's solution, the problem that appears when implementing it in practice is how to handle all the OS resources opened by the process; saving the process image itself is not enough. The resources are typically files open for input and output, connections to debuggers, and other things that reach out of the virtual world of the simulation into the real world of the host machine. The solution is also the same as Cadence's: provide callbacks for the SystemC side of such connections that can save the state of the connection in some way and restart the connection when a checkpoint is opened. Doing this right does require some care about what to do in which order, and the paper explains this nicely.
For example, you need to close down all connections before taking a checkpoint, and then restore them after taking it. This is a fairly destructive operation, compared to the Simics-style checkpointing where all you do is interrogate the state of all objects and save that. I had not thought of that before, but it makes perfect sense.
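The paper does not spell out the exact callback API, so here is just a minimal sketch of the kind of interface I imagine, with invented names, for a model that holds a host OS resource such as an open trace file:

```cpp
// Hypothetical sketch of a checkpoint-callback interface for models that hold
// host OS resources (files, sockets, debugger connections). All names are
// invented for illustration; the paper does not give the actual API.
#include <cstdio>
#include <string>

class checkpoint_participant {
public:
    virtual ~checkpoint_participant() = default;
    // Called right before the process image is written: release host resources.
    virtual void pre_save() = 0;
    // Called after a save completes, and after a restored process resumes:
    // re-acquire the host resources in the same logical state.
    virtual void post_restore() = 0;
};

// Example participant: a trace writer that appends to a host file.
class trace_file_writer : public checkpoint_participant {
    std::string path_;
    std::FILE* fp_ = nullptr;
public:
    explicit trace_file_writer(const std::string& path) : path_(path) {
        fp_ = std::fopen(path_.c_str(), "ab");
    }
    ~trace_file_writer() override { if (fp_) std::fclose(fp_); }
    void pre_save() override {
        // Flush and close so no live OS handle is captured in the process image.
        if (fp_) { std::fflush(fp_); std::fclose(fp_); fp_ = nullptr; }
    }
    void post_restore() override {
        // Reopen in append mode; for an append-only trace, the write position
        // is implicitly the end of the file again.
        fp_ = std::fopen(path_.c_str(), "ab");
    }
};
```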
To me, this also points to an important architectural issue in SystemC simulations in general: what are these simulation modules doing opening files on their own anyway? In my opinion, a simulation model should never do such low-level things; it should operate using only the defined simulation system API and simulation-level connections to other simulation modules. If you need to load test data, do that by putting it into some storage system managed by the simulator itself. In all my professional life, I have considered file I/O to be a fundamental service provided by the application framework I use, not something my actual payload code should have to deal with. For example, in Simics, we tend to solve the loading and checkpointing of test data vectors using a memory “image”, which supports checkpointing in itself. A script loads a file into the memory image, and a device reads the memory and moves transactions into the simulated system. This means that file I/O is totally absent from the model, and also that the test data can be managed and inspected by existing infrastructure.
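To make the contrast concrete, here is a small hypothetical sketch (invented names, not actual Simics or SystemC code) of a stimulus device that only talks to a simulator-managed memory image, and whose own checkpointable state is nothing more than a cursor:

```cpp
// Hypothetical sketch: a stimulus device that never touches host files.
// The simulator (not the model) owns the backing storage; a script or tool
// loads the test-data file into it before the run.
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for a simulator-managed, checkpointable memory image.
class memory_image {
    std::vector<uint8_t> data_;
public:
    explicit memory_image(size_t size) : data_(size, 0) {}
    uint8_t read(size_t addr) const { return data_.at(addr); }
    void write(size_t addr, uint8_t v) { data_.at(addr) = v; }
    // Checkpointing of data_ is handled by the simulator framework,
    // so devices using it need no file I/O of their own.
};

// The stimulus device only reads from the image and feeds transactions into
// the simulated system; its own state is just a cursor position.
class stimulus_device {
    memory_image& image_;
    size_t cursor_ = 0;   // the only device state that needs checkpointing
public:
    explicit stimulus_device(memory_image& img) : image_(img) {}
    uint8_t next_byte() { return image_.read(cursor_++); }
};
```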
The cost of taking a checkpoint is quantified in the paper, and it is not as bad as one could fear. The size of a checkpoint easily reaches hundreds of megabytes even for small target systems, but saving and loading that much data does not take all that long today. Still, a system with only 16 target processors (8 ARM cores, 8 DSPs, and 8 × 64 MB of target memory = 512 MB) generates a checkpoint that takes 1.7 GB to store and 30 seconds to save.
Once again, it is interesting to compare this to the Simics-style approach, where only differences are saved for target memories, and only target memory that is in use needs to be saved at all. This makes for usually far more compact checkpoints, which save and open in a few seconds in most cases. A checkpoint taking 30 seconds to open in Simics is rare indeed, and basically requires a target containing many gigabytes of target memory that is all in use (not host memory).
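As a rough illustration of the general idea (this is my own sketch, not the actual Simics implementation), target memory can be kept as pages allocated on demand, with only the pages touched since the previous checkpoint written out:

```cpp
// Hypothetical sketch of page-based, difference-only memory checkpointing:
// untouched target memory is never allocated, and only pages written since
// the last checkpoint are saved. Not the actual Simics implementation.
#include <array>
#include <cstddef>
#include <cstdint>
#include <map>
#include <ostream>

class sparse_memory {
    static constexpr size_t PAGE = 4096;
    struct page { std::array<uint8_t, PAGE> data{}; bool dirty = false; };
    std::map<uint64_t, page> pages_;   // pages are allocated on first write only
public:
    void write(uint64_t addr, uint8_t v) {
        page& p = pages_[addr / PAGE];
        p.data[addr % PAGE] = v;
        p.dirty = true;
    }
    uint8_t read(uint64_t addr) const {
        auto it = pages_.find(addr / PAGE);
        return it == pages_.end() ? 0 : it->second.data[addr % PAGE];
    }
    // Save only pages modified since the previous checkpoint; earlier
    // checkpoints in the chain hold the rest of the memory contents.
    void save_diff(std::ostream& out) {
        for (auto& [num, p] : pages_) {
            if (!p.dirty) continue;
            out.write(reinterpret_cast<const char*>(&num), sizeof num);
            out.write(reinterpret_cast<const char*>(p.data.data()), PAGE);
            p.dirty = false;
        }
    }
};
```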
The paper does raise a novel point for why checkpointing is good: it saves you the time needed to set up a debugger. It is certainly true that setting up a debug environment for a session takes time, but the paper does not make clear just how that environment is recreated. My impression is that what is really going on is that the debugger stays alive and loaded, and the simulation goes back and forth in time using checkpoints. Which is perfectly valid and nice.
In Simics, we solve this problem in two ways: when it comes to setting up the internal debugger when opening a checkpoint, the solution is to use a script to configure it. On purpose and by design, Simics checkpoints do not store the simulation session state, only the target system state, as that is the only way to be portable over time and across widely different machines. It would be very strange to open a checkpoint a few years after it was created, by some random user, start the simulation, and have it stop just because some breakpoint was still in place for a long-forgotten reason. In any case, the simulator side of the debugger has to be adapted in all systems to accept checkpointing.
Another strange point in the paper that I would have liked to ask the authors about is the idea that you can change the target software in the middle of a session. But what is the value of checkpointing then? Since the point is to save the cost of booting a machine and setting up a debug session, it is hard to see what is gained by booting a machine, keeping the hardware state, but replacing the software state. The expensive operation is presumably setting up the software state? At least it is in my experience.
I commend the paper for actually running something more than the archetypal “ARM+DSP”, by running eight copies of an ARM+DSP subsystem. That is at least starting to look like something interesting to simulate.
Reviewer Notes for the Paper
Finally, I have some small critiques on the academic paper itself, of the kind that I tend to provide to paper authors when active as a reviewer for conferences.
The SoC conference paper itself is well-written, but I have to point out that the authors for some reason ignore the history of checkpointing in full-system simulation. The seminal work here is the checkpointing system used for changing the level of abstraction from fast functional simulation to cycle-accurate simulation in SimOS, developed at Stanford already in the early 1990s. This work has since been continued in both IBM Mambo and Virtutech Simics, and remains the most powerful way of doing checkpointing to this day. I don't think the authors were completely ignorant of this work, even though finding good references can admittedly be a bit difficult. Even so, here are some to add to future versions of the paper:
- "SimOS: A Fast Operating System Simulation Environment", Stanford CS Technical Report CSL-TR-94-631, 1994. The earliest mention of using checkpointing in a full-system simulator (for switching from fast to detailed simulation).
- "Design and validation of a performance and power simulator for PowerPC systems", IBM Journal of Research and Development, 2003. Mentions that IBM Mambo does checkpointing like SimOS.
- "SIMFLEX: A Fast, Accurate, Flexible Full-System Simulation Framework for Performance Evaluation of Server Architecture", ACM SIGMETRICS Performance Evaluation Review, 2004. Also see the Simflex homepage. Shows an ambitious use of checkpointing in computer architecture simulations.
It is also a bit disingenuous to dismiss the question of just how the process checkpointing is performed with a reference to an early requirements paper from the BLCR project. All that paper says is that there are lots of possible implementation variants, nothing about what was done in this particular instance. It would have been nice to have some more details: does the approach use a kernel module or not?