
Rework hypervisor concept #47

@sevenautumns

Description

As it stands, it is not clear whether the current concept is ARINC 653 compliant.
The current concept may also not be extensible towards ARINC 653 Part 1.

Issues:

  • Are the responsibilities distributed correctly between hypervisor and partitions?
  • The current error system is not easy to use; it should be compatible with the ARINC 653 health monitor
  • Calls from the partition to the hypervisor currently do not allow for answers from the hypervisor (no syscall/RPC behaviour)

Possible Solution:

Error System / Health Monitor

  • Remove the level from errors (errors only have a type)
  • Use a state machine for all possible states of a partition, as far as the hypervisor is concerned (see the sketch after this list)
    • Init
    • Running
      • Cold/Warm start
      • Normal
      • Idle
    • Transition to
    • Paused (when not within its scheduling window)
    • Restart
    • Error
  • The Error state handles the error according to the health monitor table
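
A minimal Rust sketch of how this state machine could look; the variant names mirror the list above, while the type names, error kinds and the placeholder method are assumptions for illustration:

```rust
// Sketch only: type names, error kinds and the method are assumptions;
// the variants mirror the state list above.
#[derive(Debug, Clone, PartialEq, Eq)]
enum PartitionState {
    Init,
    Running(RunMode),
    Paused, // not within its scheduling window
    Restart,
    Error(ErrorKind),
}

#[derive(Debug, Clone, PartialEq, Eq)]
enum RunMode {
    ColdStart,
    WarmStart,
    Normal,
    Idle,
}

// Errors only carry a type, no level.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ErrorKind {
    DeadlineMissed,
    ApplicationError,
    // ... further ARINC 653 error types
}

impl PartitionState {
    /// Entering Error; the actual recovery action would be looked up
    /// in the health monitor table by the hypervisor.
    fn report_error(self, kind: ErrorKind) -> PartitionState {
        PartitionState::Error(kind)
    }
}
```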

"Systemcall" from partition to hypervisor

Use ptrace(2) together with PTRACE_SYSEMU (already used to implement User-mode Linux) to trap partition processes on system calls and replace the call with the desired behaviour inside the hypervisor.
Theoretically, non-existent system call IDs could be used to identify APEX functions when using ptrace(2).
When clone(2) is used to spawn the main process of a partition, PTRACE_TRACEME can be called to allow ptrace.
The hypervisor can wait on the partition's SIGTRAP with sigtimedwait (see sigwaitinfo(2)), utilizing a timeout.
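
A rough sketch of what this could look like in Rust, assuming the nix and libc crates; the fake APEX syscall number, the partition binary path and the register handling are placeholders, libc is assumed to expose PTRACE_SYSEMU on the target, and the register access is x86_64-specific:

```rust
// Sketch only: nix/libc usage and the fake syscall id are assumptions.
use nix::sys::ptrace;
use nix::sys::signal::Signal;
use nix::sys::wait::{waitpid, WaitStatus};
use nix::unistd::{fork, ForkResult, Pid};
use std::os::unix::process::CommandExt;
use std::process::Command;

const APEX_FAKE_SYSCALL: u64 = 0xF000; // hypothetical unused syscall id for APEX calls

fn main() -> Result<(), Box<dyn std::error::Error>> {
    match unsafe { fork()? } {
        ForkResult::Child => {
            // Partition main process: opt into tracing, then exec the partition binary.
            ptrace::traceme()?;
            let err = Command::new("./partition").exec(); // placeholder binary
            Err(err.into())
        }
        ForkResult::Parent { child } => run_tracer(child),
    }
}

fn run_tracer(child: Pid) -> Result<(), Box<dyn std::error::Error>> {
    // First stop arrives right after exec; from then on resume with SYSEMU so
    // every syscall traps into the hypervisor instead of entering the kernel.
    waitpid(child, None)?;
    loop {
        unsafe {
            libc::ptrace(
                libc::PTRACE_SYSEMU,
                child.as_raw(),
                std::ptr::null_mut::<libc::c_void>(),
                std::ptr::null_mut::<libc::c_void>(),
            );
        }
        match waitpid(child, None)? {
            WaitStatus::Stopped(pid, Signal::SIGTRAP) => {
                let regs = ptrace::getregs(pid)?;
                if regs.orig_rax == APEX_FAKE_SYSCALL {
                    // Emulate the APEX call here and write a return value
                    // back into the partition's registers.
                    let mut new_regs = regs;
                    new_regs.rax = 0; // placeholder return value
                    ptrace::setregs(pid, new_regs)?;
                }
            }
            WaitStatus::Exited(_, code) => {
                println!("partition exited with {code}");
                return Ok(());
            }
            other => println!("unexpected wait status: {other:?}"),
        }
    }
}
```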

Hypervisor Main Loop

  • Maintain "EventList"
    • Candidate: SkipMap (see the EventList sketch after this list)
    • Events:
      • Start Partition
      • Stop Partition
      • Start Process
      • Process Deadline (Revisit ARINC deadline actions)
      • Health event (For having a central handling authority)
  • Wait for SIGCHLD
    • Utilize signalfd(2)
    • Wait with a poller on SIGCHLD or timeout elapse (timeout derived from the remaining time until the next event in the "EventList"); see the main-loop sketch after this list
  • On either SIGCHLD or timeout elapse
    • Check whether a newer event has become due (for example a new "Start Process" or "Health event")
    • Give every active partition a chance to check its processes for a SIGTRAP
      • Spawn a handler thread for serving the caught syscall
        • Use rayon::ThreadPool
        • TODO: somehow remember which processes with a SIGTRAP are already being served
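
A possible shape for the "EventList" in Rust, assuming crossbeam_skiplist's SkipMap keyed by due time; the key type and event payloads are assumptions (and two events with an identical Instant would collide, so a composite key may eventually be needed):

```rust
// Sketch only: SkipMap from crossbeam_skiplist, keyed by due time (assumption).
use crossbeam_skiplist::SkipMap;
use std::time::{Duration, Instant};

#[derive(Debug, Clone)]
enum Event {
    StartPartition(String),
    StopPartition(String),
    StartProcess { partition: String, process: String },
    ProcessDeadline { partition: String, process: String },
    HealthEvent { partition: String, error: String },
}

struct EventList {
    // SkipMap keeps entries sorted by key, so the front entry is always
    // the next event that becomes due.
    events: SkipMap<Instant, Event>,
}

impl EventList {
    fn new() -> Self {
        Self { events: SkipMap::new() }
    }

    fn schedule(&self, due: Instant, event: Event) {
        self.events.insert(due, event);
    }

    /// Remaining time until the next event, used as the poll timeout.
    fn next_timeout(&self) -> Option<Duration> {
        self.events
            .front()
            .map(|entry| entry.key().saturating_duration_since(Instant::now()))
    }

    /// Remove and return all events whose due time has passed.
    fn due_events(&self) -> Vec<Event> {
        let now = Instant::now();
        let mut due = Vec::new();
        while let Some(entry) = self.events.front() {
            if *entry.key() > now {
                break;
            }
            due.push(entry.value().clone());
            entry.remove();
        }
        due
    }
}
```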
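
And a rough sketch of the wait/dispatch part of the main loop on top of that EventList, assuming the nix, libc and rayon crates; active_partition_pids() and serve_apex_call() are placeholders for the hypervisor's real bookkeeping, and the SignalFd/poll details may need adjusting to the exact nix version:

```rust
// Sketch only: signalfd + poll wait loop with a rayon pool for trapped syscalls.
use nix::sys::signal::{SigSet, Signal};
use nix::sys::signalfd::{SfdFlags, SignalFd};
use nix::sys::wait::{waitpid, WaitPidFlag, WaitStatus};
use nix::unistd::Pid;
use std::os::unix::io::AsRawFd;

fn active_partition_pids() -> Vec<Pid> { Vec::new() } // placeholder
fn serve_apex_call(_pid: Pid) {}                      // placeholder

fn main_loop(event_list: &EventList) -> Result<(), Box<dyn std::error::Error>> {
    // Block SIGCHLD so it is only delivered through the signalfd.
    let mut mask = SigSet::empty();
    mask.add(Signal::SIGCHLD);
    mask.thread_block()?;
    let mut sfd = SignalFd::with_flags(&mask, SfdFlags::SFD_NONBLOCK)?;

    let pool = rayon::ThreadPoolBuilder::new().num_threads(4).build()?;

    loop {
        // Poll timeout = remaining time until the next event (-1 = wait forever).
        let timeout_ms: libc::c_int = event_list
            .next_timeout()
            .map(|d| d.as_millis() as libc::c_int)
            .unwrap_or(-1);
        let mut pfd = libc::pollfd {
            fd: sfd.as_raw_fd(),
            events: libc::POLLIN,
            revents: 0,
        };
        unsafe { libc::poll(&mut pfd, 1, timeout_ms) };

        // Drain the signalfd so SIGCHLD does not stay pending.
        while let Ok(Some(_siginfo)) = sfd.read_signal() {}

        // Handle events that have become due in the meantime.
        for event in event_list.due_events() {
            // start/stop partitions, raise health events, ...
            let _ = event;
        }

        // Give every active partition a chance to check its processes for a SIGTRAP.
        for pid in active_partition_pids() {
            if let Ok(WaitStatus::Stopped(stopped, Signal::SIGTRAP)) =
                waitpid(pid, Some(WaitPidFlag::WNOHANG))
            {
                // TODO: remember which trapped processes are already being served
                pool.spawn(move || serve_apex_call(stopped));
            }
        }
    }
}
```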

TODO

  • Check if we can actually use non-existent system call IDs
  • Check if we can return custom data when emulating APEX system calls
