| DTrace/ARM | Tuesday, 19 March 2013 |
$ dtrace -n tick-1s
and the expected thing happens. That's a huge relief: the core of DTrace just works.
Next came some small fixes for /proc/dtrace/fbt, which shows the state of the available FBT probes (specifically the instructions and patch values). A key difference between ARM and x86 is that ARM instructions are fixed width: in ARM (non-Thumb) mode every instruction is 32 bits wide, whereas x86 instructions are variable length (anywhere from 1 to 15 bytes). Because instructions are 4 bytes wide, the key data structure (intr_t) needs changing to hold a 4-byte value, and that change has to be carried through to, e.g., the FBT driver.
ARM presents many challenges, the first being that there are so many variants. I am targeting the ARMv6 architecture used on the Raspberry Pi, with a hand-built kernel running in a qemu VM.
At present I don't have a /proc/kcore, which is a nuisance: it's useful for examining the inside of the kernel, e.g. disassembling code or checking that a probe is what I expect it to be. (I can work around that easily.)
Additionally, this is a single-CPU kernel, with no SMP. (I believe I found a bug in Solaris DTrace concerning the maximum number of CPUs the kernel supports. Solaris supports 32 i386 CPUs or 256 amd64 CPUs; if the maximum number of CPUs is configured, there is some out-by-one maths in the buffer snap code, which causes dtrace to abort early in its work. On ARM I hit this because NCPU==1, and the code tries to look at cpu#0 and cpu#1.)
Now I am ready to start handling FBT. I can get DTrace to plant probes on function entry, but the return probes are broken because the code uses the x86 instruction disassembler rather than an ARM-specific one (easy to fix, when I am ready).
But before I allow an FBT probe, I need to intercept the ARM single-step and breakpoint interrupts. I found this page a useful hint/starter for locating what I am after:
http://pankaj-techstuff.blogspot.co.uk/2007/11/story-interrupt-handling-in-linux-2611.html
The first step is to put a "no-op" interrupt handler in place to prove I am doing the right thing. I am not an ARM assembler expert, and am using gcc -S and ARM instruction references on the net to get me closer to one.
Once this is done, the syscall mess (in my code) can be looked at: the systrace.c code is full of hacks and quirks for i386/amd64, little or none of which is relevant to ARM.
After that, we would be mostly done. (USDT will require some work, but USDT is more leisurely than core driver work.)
The dtrace available at my site or on GitHub doesn't have all the ARM updates yet, so hold off for a while before expecting this to work. I will update the blog when I feel it's more ready for prime time.
| Divide by 10 | Sunday, 03 March 2013 |
One silly issue I have hit is modulo/divide arithmetic on 64-bit numbers. ARM, being a classic RISC chip, doesn't have a divide instruction (at least not on the ARMv6 I am targeting), but the kernel nicely hides this for you. In most cases, division by a constant is handled by the compiler via various optimisations.
At the moment, there are a few pieces of code where the divisor is not a constant the compiler can see, which results in the compiler generating calls to maths helper functions. On the i386/x64 architectures, dtrace handles this by mapping those calls onto the appropriate mechanisms for calling the do_div() function.
On ARM it's different, and confusing enough (given the differing ARM architectures) that it isn't working yet. That is a nuisance when dtrace is tracing to its own internal log buffer (which I need in order to debug the GPFs I am triggering), because decimal numbers come out wrong.
It's interesting poking around the kernel code at the do_div64() macros and the printf-type code, as the kernel makes good attempts to leverage native CPU features, or to emulate them in the case of ARM/RISC instruction sets, to get reasonable performance.
There are lots of ways to divide without actually doing a divide (the kernel has many such algorithms). What is a nuisance is debugging the mechanism I am using to call these, as it mostly crashes the kernel, and it is difficult to do in user space (there it works, because the compiler "just works", which doesn't help prove that what I am doing is correct in the kernel).
Still a way to go, but at least I can load the driver.
| Things I hate.... | Saturday, 23 February 2013 |
Let's download it. 538MB later....
FAILED: I am not going to tell you why, but sorry. I am really stupid.
Hm. OK, maybe it's me that is stupid. Let's try again. 538MB later...
FAILED: I am not going to tell you why, but sorry. I am really stupid.
This time, *you* are really stupid. Why download a whopper of an upgrade and discard it, only to download it again? Why not do a validation *before* downloading a 538MB file?
Why not actually tell me what is wrong? Is that too much to ask?
So, off to the ROM/rooting sites, and see if I can get it another way.
(My Note is rooted, but why can't the Samsung firmware updater tell you what it didn't like?)
Oh well.
| Why is the C preprocessor so bad? I mean...really bad | Monday, 18 February 2013 |
C is a low-level language; as CPU speed increases have stagnated, people have looked to C for further performance gains. C++ to a large extent improves on C, and removes much of the need for the preprocessor.
But one has to laugh at how much attention is given to compilers, optimisations, libraries and standards. The preprocessor is treated like a diseased lovechild.
In all these decades, the preprocessor has barely gained any new features, except for varargs macros and the # and ## operators.
What is it missing? Well, for one, *intelligence*. Anything. Please.
#include "filename"
Such a great feature, but so badly designed. What is the one thing every serious application does? It uses some form of autoconf, so you can do:
#ifdef SOMETHING
# include "filename"
#endif
Why can't that be in the language?
#include_only_if_exists "filename"
There. That wasn't hard, was it?
OK, so let's try one more. Assume a header file defines a value; let's call it REG_R0. Another header file defines an enum containing REG_R0. Consider the following:
#define REG_R0 0
enum { REG_R0 = 0 };
Of course that's a syntax error above (the macro turns the enum into enum { 0 = 0 };). We can do:
#define REG_R0 0
#if !defined(REG_R0)
enum { REG_R0 = 0 };
#endif
If that enum appears in a header file, we have no way to put the #if statement in there. We might do something like:
#define REG_R0 0
#include_but_hide REG_R0, REG_R1, ... "filename"
#endif
This might mean: pretend the specified values are not defined for the duration of including filename, so we can hide them and avoid the resulting syntax error.
Why do I care? Well, DTrace is fighting issues with ARM register names in the "ucontext.h" file, which seemingly wants to define register names via an enum. I haven't worked out how to resolve this portably and at the point of issue (rather than modifying every source file to change the way #includes are done).
Really, every language implements a string and file parser as a day-1 feature, and then adds bells and whistles to it for the rest of its life.
C, being a "portable assembler", chooses to ignore the compilation environment, and expects every developer to build inconsistent tools to work around the shortcomings of the preprocessor and environment detection.
Why can't we do something like in Perl?
#if $ENV{COMPILER_FLAGS} =~ /-O3/
...
#endif
Oh well.
| DTrace/ARM | Saturday, 16 February 2013 |
This is a major achievement, although the code is ugly and broken. Many areas where we have __i386 or __amd64 code have been changed to handle __arm__-specific code, or simply left blank.
The interrupts are not plumbed in; SMP isn't handled (my VM is a single-CPU VM).
There are also issues with the pieces of code that are heavy in x86 assembler (e.g. syscalls.c).
Don't expect to use this for anything useful, other than as a starting point. Having got a complete driver, the next step is to debug these blank placeholder areas piece by piece.
| DTrace/ARM | Sunday, 10 February 2013 |
Now, this is not a simple project, but I have started some of the work to prove it out.
I had originally intended to use the Raspberry Pi to do a lot of the heavy lifting, but was not happy with the RPi as a reliable hardware device and had issues with some of the kernels.
Last week, I took a look at qemu/arm, and can actually run an ARM-based virtual machine on my x86 Linux machine. Performance isn't brilliant, but it's palatable.
After spending ages getting the network to work (so I could get the requisite packages), I have been updating the scripts and headers/source code to fill in the gaps for ARM. At this point, we are about halfway through the userland compilation. Some of the problems are annoyances due to the Linux distro (Debian, 2.6.32 kernel); e.g. the handling of ucontext.h seems to be different.
Once I have userland compiling, I can move on to the kernel. Obviously there is a fair amount of i386/x64 code to rewrite for ARM, but I am hoping that it's viable, especially given that the i386/x64 code sits on top of the original SPARC architecture, i.e. we know roughly which bits need attention.
If I am lucky, at the end of this exercise it will work on my ARM-based VM; it almost certainly won't work on other ARM CPUs or other kernels (e.g. Android), but at least the heavy lifting will have been done, and targeting the other platforms becomes more palatable (or people may do this for themselves).
I'll put out a new release, which includes some Linux 3.7/3.8 fixes (thanks to those who contributed the fixes or highlighted some of the build breakages in my code).
| Writing Software. Forget about it. | Friday, 25 January 2013 |
Firstly, a reminder to anyone I can reach: avoid the crisp.demon.co.uk email addresses and use CrispEditor@gmail.com. I have yet to get my systems set up properly to send via Demon since they moved email to IMAP.
Although things are quiet, it's mainly because I have been slowly creating a ribbon bar for CRiSP. Piecing together how a ribbon bar works opened my eyes to a number of things. I tend to spend a couple of months at a time on one or other of CRiSP and DTrace.
Sometimes you get so close to the software you are writing that things slow to a crawl (you lose "instinct"), and at other times you need to go away for a bit (an afternoon, a day, a week, or a month) and forget the lines of code. You come back to the code as an "outsider" and immediately see deficiencies you could not see before.
With DTrace, the biggest problem is ensuring all kernels build, and this is done piecemeal, not by some divine CI (continuous integration) system. There are hundreds if not thousands of Linux kernels out there (2.6.1, 2.6.2, 2.6.3, ...) and the number of permutations of compile options makes it infeasible to test all variants. (Some people have built projects to try all permutations of compiler flags, just for one release, and even that is a huge computational task.)
Anyway, back to the ribbon. I'm fairly proud of CRiSP's GUI controls, written from scratch a long time back, and the use of "constraints" to allow widgets to stick together. This mentality dates back to the early origins of X Windows and allows apps to resize whilst still looking reasonable. Most Windows apps, on the other hand, use pixel coordinates; this is great for GUI designer tools, but is why most dialogs are not resizable. Even Windows' Open File dialog is only "slightly" resizable (all the spare real estate is given to the filenames, as it should be), but it has caused source code portability issues in each Visual Studio release. It wasn't a great source code design.
As a side note, most mobile apps work in a similar vein: as Apple and Google released new devices with various screen sizes and pixel densities, there has been a rush of app upgrades to cater for these screens.
Another side note: Apple uses floating-point pixel coordinates, not integer ones. When Apple (and PostScript before them) did this, at the start, they seemed simply insane: huge amounts of CPU work just to draw things, back in the days when floating-point units were very slow (or nonexistent). However, fractional coordinates are brilliant with today's ever-increasing screen sizes. (I do find it strange that even Apple has had problems on the iPhone with allowing non-multiples of the base 320x480 screen size.)
Anyway, back to the ribbon bar. The ribbon is technically very interesting. If you look at a GUI control like a "listbox" or "tree control", things we are all very familiar with, the layout options are easy to comprehend, and most programmers can figure out how to build one. (Building one that scales to millions of items is a challenge.)
The ribbon is complex. Very complex and clever. I started by creating a control with "panels". Bunches of bitmaps and controls go into the panels and give the MS Word-like appearance quite easily. But if you play with MS Word or Outlook and watch what happens as you shrink or expand the window, it's not obvious what is going on. If you do it enough, you can see what's happening, and it feels "natural".
Now consider how to implement this: for each icon and column of the ribbon bar, as you grow/shrink, you effectively have to try all permutations of "restacking" to get the desired effect. I started doing this, but then did a sanity check. What happens if you have 50 controls inside the ribbon and want to compute the optimum layout as the window shrinks and they won't all fit? By my calculation, this involves around 3^50 (roughly 7 x 10^23) permutation checks. I was astounded. 3^50 is not acceptable; maybe 1,000,000 attempts are. What we have is akin to a travelling salesman problem (where we try to compute the optimal route to visit a series of destinations).
I had to read the Microsoft Ribbon control docs to finally "get it". A ribbon is a collection of panels. Each panel is a collection of groups. Each group is a collection of controls. Rather than dealing with, say, 50 individual controls, you might be dealing with 10 groups. Now each group can be resized and laid out almost (though not quite) independently. 3^10 is 59049, which is a much more reasonable amount of compute.
Where does the "3" come from? Each control on a ribbon bar has 3 visible modes (big bitmap, small bitmap + label, or small bitmap).
Do people like or want a ribbon bar? I don't know. Those of us who use Word or Outlook get used to it. Despite losing the drop-down menus, and taking ages to figure out where everything sits, it looks "modern" and is "enjoyable" to play with. (Of course the ribbon is now quite a few years old, and Microsoft is going for the "tiled" effect.)
So, that gives you a little perspective of "ribbon controls".
Interestingly, when the ribbon first appeared in MS Office, it wasn't customisable. Now it is. They must have gone through a lot of effort to implement it, and then reimplement it. If you look at the ribbon in the many MS products, they are all subtly different, and differently "quirky".
| Where did you come from? Part 2 | Saturday, 12 January 2013 |
Of course, 30 mins after posting that, I resolved the problem.
Definitely a case of being short-sighted and not remembering how my own code worked.
When a tooltip pops up, the keyboard is grabbed, and we track the mouse, looking for it to move outside of the original area so we can pop down the tooltip.
I assumed some magic was being used to drop the tooltip, but it was more basic than that. I had forgotten to look for the mouse moving outside the bounds of the image that gave rise to the tooltip. (I had nearly spotted the problem a few days back, but got sidetracked when I locked up the X server, and hadn't spotted the "missing code".)
The ribbon most likely won't make it into the next release of CRiSP, as it's not ready. A ribbon implementation is interesting because of the behavior of the control, but also because of the various things that can be embedded into a ribbon (popping up tooltips, complex tip-like helper windows complete with embedded images, and a menu selector).
CRiSP does most of this now. (I have also been cleaning up the FreeType font display for some of the controls, because getting the ribbon to work was showing up the ugly default fonts.)
Now, off to see if this code even builds on Windows, as I have a couple of customer bugs to examine/fix.
| crisp.demon.co.uk -> CrispEditor-at-gmail.com | Saturday, 12 January 2013 |
The price was very cheap; in fact the price hasn't moved in the last 20 years or so.
Demon got bought up by other companies, went into the broadband business, and has always been uncompetitive in terms of pricing or technology.
In 2012, they moved from a POP3 mail service to IMAP. The POP3 mechanism and the design of the domains meant I could use as many mail addresses as I liked, for any purpose, and collecting email was very simple since all the mail sat in the same inbox.
With IMAP, this is no longer true. I can't even remember which mail names I have used in the past, and having to poll each one for new mail is pointless.
Quite a few years back, I set up CrispEditor-at-gmail.com, and many people use that. The IMAP service is so bad that I may finally stop paying for the Demon account; the mail will bounce and the ftp hosting location will disappear. (I will find an alternative location or update the links, so people can still find CRiSP.)
The Demon IMAP interface is pretty, on a par with Hotmail, but it is so badly thought out, and the migration to IMAP was so badly publicised and managed, that Demon has fallen out of my good books.
| Where did you come from? | Saturday, 12 January 2013 |
I have a prototype working - still more work to do.
I do most of my live development on Linux; it's easier and more convenient, using the tools. (Windows 7 build speeds are pretty impressive, and I use MinGW when I need to debug, using gdb, rather than Visual Studio.)
Over the years, one thing has always annoyed me: programming paradigms where messages are received to effect an action or event, but where you have no idea what they are, why they were delivered, or where they came from.
The Xlib protocol is simple and high-performance, but due to the multiprocess nature of an X Window system, determining what is going on can be problematic. CRiSP has an X11 message tracing facility built in (set the env var XDEBUG=1 if you are curious). It's very helpful for seeing what is happening.
The particular area causing me problems at the moment relates to tooltips. CRiSP has had tooltips for an awfully long time: floating text labels next to items in the GUI (tabbed windows, icons, etc). In the ribbon and image implementation, the tooltip is dismissed as soon as it pops up: the act of popping it up causes a LeaveNotify event to be delivered, but I don't know where from. (Most likely the window manager, but finding out "why" is difficult.) Bear in mind that the same subroutines implement this elsewhere, and the effect doesn't happen there.
As with all difficult programming problems, most likely I am looking in the wrong place. Hopefully I won't have to waste too much time on this.
| Why is Engadget's new web site so awful? | Friday, 07 December 2012 |
Gizmodo did this a while back, and it too is awful.
What's wrong with Engadget (viewing it on a 1920x1080 Linux/Firefox combo) is that the fonts are huge, really huge (I'm guessing the item titles are about 100 pixels high). That's a waste of space.
Then the images for each news item are huge: I can see only one news item (text and image) per screen. And the images are not clickable.
Then there's all the other window dressing: bad combinations of font, color and contrast. It looks like it was put together by either a very short-sighted person or a 5-year-old.
The m.engadget.com mobile web site, by contrast, is great (when it works; unfortunately on Opera Mini it keeps getting stuck at one of the "ad" web sites, so I have to switch from Opera to Firefox to read things on it).
On Gizmodo, the layout is equally awful: there is no real indication whether I am reading today's news or news from 5 years ago, such is the diversity and lack of depth of the stories.
By contrast, Slashdot's style of news delivery has never changed; the recent web optimisations and revamp are "nice", if still not brilliant (reading threads on a mobile is an exercise in futility, and I still don't understand why most comments are not shown by default).
Why do IT web sites put about 100 bytes of information on a nearly 1MB web page? (Slashdot has a good signal/noise ratio, as does theregister.co.uk.)
I need to find an alternative to Engadget/Gizmodo, as they are too annoying and waste bandwidth and CPU.
| iTunes 11 | Tuesday, 04 December 2012 |
What is it with iTunes 11? Did they not test it? What was wrong with the old iTunes?
Organising TV shows and movies (ones you have personally recorded, not downloaded ones)... To keep track of what I watch, I would have iTunes refer to the files in my video folder, and then delete the files from disk and remove them from iTunes.
Well, on iTunes 11 (and I haven't worked this out yet), you cannot delete a movie (which may or may not be deleted from the hard drive). It quite happily puts the entry back again.
Yesterday, nothing worked and I had to force-kill/restart iTunes; maybe it thought it had a dialog box to display, but I couldn't find it.
Today, there's no dialog, but it keeps putting them back.
And if I view the "Films" section, my films are nowhere to be seen (so I can't delete them from there), although the TV show episodes/playlists are.
So - did they actually *test* this?
I don't believe they did.
| The Case for a Safe XCall | Tuesday, 27 November 2012 |
dtrace_xcall() is the function in the driver that does this. On Linux, this maps to smp_call_function() and friends. A CPU cross-call is an interesting concept: the ability for one CPU to make a function call on another CPU. It is rarely needed, and if it is done in a way that breaks the calling protocol, you can lock up one or more CPUs or crash the system.
On Linux, the cross-call (or IPI, inter-processor interrupt) can be seen by examining /proc/interrupts and looking for a line like:
$ cat /proc/interrupts
....
CAL: 52770 36972 Function call interrupts
...
My system has had many hours of uptime, yet the calls are rare (the above shows the count for each CPU).
When dtrace is called on to do a heavy action, like:
$ dtrace -n fbt:::
tens or hundreds of thousands of probes may fire per second. DTrace has two internal buffers per CPU to log these probes; the buffers fill up quickly, and the IPI xcall code will be called a lot.
I've been pondering how this actually works - both on Linux and Solaris.
Let's do a thought experiment:
Imagine a dual-CPU system. One processor is sitting inside a locked region, with interrupts disabled. The other CPU is trying to access the same lock/region. This other CPU is now blocked until the first CPU exits the lock.
Now imagine this again, but this time the first CPU holds the lock for a very long time. This would block the other CPU indefinitely. Normally this is rare: the kernel arranges never to hold locks for long periods of time.
Now let's modify this scenario. One CPU is holding a lock, with interrupts disabled, and does an IPI cross-call. The other CPU has interrupts disabled and is spinning, waiting for the lock the first CPU holds. The first CPU cannot interrupt the other CPU, the other CPU cannot proceed until the first releases its lock, and so we deadlock. In normal scenarios, this circular wait cannot happen (other than through bugs in the kernel or drivers).
IPI interrupts are just like normal interrupts: while interrupts are disabled they are held pending, and they are processed when interrupts are re-enabled.
The kernel smp_call_function() call has a contract: it must not be called with interrupts disabled. Doing so generates a kernel log/BUG warning, and indicates the kernel could deadlock.
When we use DTrace, we can place a probe on any function in the kernel, including functions that run with interrupts disabled. This means we break the contract.
(I note Oracle UEK Linux DTrace simply calls smp_call_function() and suffers the bug, unless they have fixed it.) In my DTrace, I take steps to avoid calling the Linux smp_call_function() and implement my own. It *seems* to work.
Whilst examining Xen, I had great difficulty finding a way to avoid smp_call_function(), so someone invoking fbt probes on code running with interrupts disabled can cause deadlocks or long livelocks. (DTrace will detect a mutual deadlock and break the lock, but this is horrible and can panic the kernel in some extreme circumstances.)
DTrace is supposed to be reliable, and the above behavior is horrible. Simply *HORRIBLE*.
I have a (new) workaround...see below.
But why doesn't Solaris have this problem? Well, the Solaris xcall code is intimate with the kernel interrupt code, and a CPU waiting in an xcall runs with interrupts enabled. (The whole Solaris/BSD kernel uses interrupt priorities to allow much of the kernel to run with interrupts enabled, and even when interrupts are disabled, deadlocks cannot occur.)
I wish I understood the above paragraph better, but experiments and user success stories demonstrate that Solaris has no deadlock issue.
Ok, so the solution for Linux.
If you followed the above carefully, you will note the problem is caused if we try to do a cross-call whilst interrupts are disabled. And interrupts are disabled either when probing a function in interrupt handler code, or inside a locked region.
So, let's disable probes whilst interrupts are disabled. If we did this, the kernel should be safe for fbt::*: probes and never deadlock. Most interesting scenarios are in the non-locked kernel regions.
But that's bizarre! How could we do this? It defeats one of the deep-probing aspects of DTrace.
Well, to resolve this conflict of interest, we can have DTrace run in 'safe' mode by default, and when the user wants to remove this safety barrier, they can do so by sending a message to the driver.
And this is what I am going to do for the next release.
| Xen progress | Sunday, 18 November 2012 |
Life has been difficult because the key issues revolve around paravirtualising certain instructions, and also around the way interrupts are handled. The normal interrupt routine sequence doesn't work for INT1, INT3 and page fault interrupts.
What Linux does is provide two distinct interrupt routines - one for normal hardware, and one for Xen. At a low level in the IDT handler, it decides which one to use. (This is buried in the Xen handler for write_idt_entry()).
Part of the work is to have a similar mechanism - autodetect if we are on a Xen host, and use the correct interrupt handlers. Fortunately, the code in intr_x86-64.S is amenable to parameterisation via the macro assembler, so the code for all the interrupts is one macro, with some conditional assembly.
Another problem area is that when the system dies hard, Xen is very unforgiving: it reports an issue, but there is no easy way to diagnose, in simple terms, what happened. (Before I modified the page fault handler to use the correct Xen calling sequence, Xen would kill the guest due to issues in the page table; this appears to be bogus, and not the true cause of the issue, which was that the interrupt stack wasn't correct.)
I now need to merge the Xen changes back to the mainline code, and check it still compiles/works on the older kernels.
| Xen progress | Saturday, 10 November 2012 |
Unfortunately, knowing what "right" is, is difficult!
BTW, I want to complain about google.com. Here is a fascinating in-depth article, written by VMware's own engineers, which gently leads you through how VMware works and how VM monitors work in general.
web.mit.edu/6.033/www/papers/agesen.pdf
My complaint is that Google wraps all links, so you can't just take the URL of a page you jumped to, but need to copy the link from the results page (which is not in a form that is an embeddable http link).
The article above hinted at a problem I was seeing in getting FBT to work *reliably*. FBT uses the INT3 and INT1 interrupt traps. I have had to rework the interrupt handlers to be CONFIG_PARAVIRT compliant (and need to rework them again so that the code works on a non-CONFIG_PARAVIRT kernel). Anyhow, a dtrace like:
$ dtrace -n fbt::sys_*:
would run for about 30,000-60,000 syscalls, and then a problem would arise. When doing FBT, we replace the instruction with a breakpoint instruction (INT3), single-step the replaced instruction, and then resume execution.
What appears to happen is that occasionally the single-step trap would not fire. The CPU would carry on executing past the copied instruction we had single-stepped, resulting in a kernel page fault. Now this is strange, because in the copy buffer we have:
original-instruction, nop
The NOP should never be executed because of the single-step mode; placing a NOP after the valid instruction (rather than random junk) seems like good practice, because otherwise the CPU may fetch ahead and try to decode a junk instruction, even if it is not executed.
I tried the following:
original-instruction, nop, nop
With two NOPs after the instruction, it ran much better; the first time, it got to nearly 1,000,000 traces. I then killed it, removed some of my debug, and ran it again. Whilst I was writing this article, it got to about 750,000 traces, but the same thing happened. Here's the dump from the kernel:
[26881.316402] Call Trace:
[26881.316412]  [<ffffffff81664a82>] ? system_call_fastpath+0x16/0x1b
[26881.316417] Code: ff ff ff ff 00 00 00 00 00 10 00 81 ff ff ff ff 00 10 00 81 ff ff ff ff 55 6e 10 00 03 8e 03 a0 ff ff ff ff 00 00 00 00 55 90 90 <00> 00 ...
Opcode 0x55 is "PUSH %RBP"; NOP is 0x90. The instruction after the second 0x90 is one that causes a kernel page fault. So the CPU went marching on ahead and fell over, despite the trap flag being set.
At the moment this is weird, and it looks as though Xen is not honoring an IRET (or the hypervisor-call equivalent) every time, and is ignoring the single-step mode.
Now off to do more research... maybe it's a known bug in Xen, or maybe I have not honored one of the rules (but the "rules" aren't written down anywhere :-) ).
| Xen on VirtualBox... | Monday, 05 November 2012 |
So, now I have Ubuntu 12.04 running inside VirtualBox, and inside that, I am creating a Xen guest running Ubuntu 12.04 (yes, that's 12.04 inside 12.04 inside 12.04). It's kind of mind-boggling, but hopefully I can now try to debug the dtrace issues.