<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
<?xml-stylesheet type="text/css" href="http://www.crisp.demon.co.uk/blog/styles/feed.css"?>


<title type="html">CRiSP Weblog</title>
<subtitle type="html">technical projects, CRiSP, dtrace and other stuff</subtitle>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog"/>
<link rel="self" type="application/atom+xml" href="http://www.crisp.demon.co.uk/blog/atom.xml"/>
<updated>2009-11-20T21:46:30+0000</updated>
<author>
<name>Paul Fox</name>
<uri>http://www.crisp.demon.co.uk/blog</uri>
</author>
<id>http://www.crisp.demon.co.uk/blog</id>
<generator uri="http://www.crisp.demon.co.uk/blog" version="1.0">
/home/fox/bin/blog.pl
</generator>

<entry>
<title type="html">Tail recursion woes</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-11.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/11/index.html</id>
<published>2009-11-20T21:46:28+0000</published>
<updated>2009-11-20T21:46:28+0000</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<a href='http://www.ddj.com/cpp/184401756'>Dr. Dobbs article</a>
<p>
This is driving me nuts. GCC does tail call optimisation (tail recursion
being the special case where a function calls itself). That
is very nice, and means that if we have something like:
<pre>
int func(...)
{
	do_stuff();

	return func2(...);
}
</pre>
then the func2() call can be converted into a 'goto'.
<p>
The problem is that if we put a breakpoint (in gdb) or a printf in
func2(), we lose the stack frame for func(), and it appears
that func2() is being called from the caller of func(), rather than from
func() itself.
<p>
I wish they wouldn't do these "nice" things. It makes debugging a pain, and
I am tempted to go back to earlier/very old versions of GCC to stop this
warfare, where "what used to work" stops working and you have to fight
the toolchain.
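<p>
To make the effect concrete, here is a minimal sketch in C. The bodies of
func() and func2() are hypothetical stand-ins for the fragment above, not
code from CRiSP. Built with optimisation (GCC turns on sibling call
optimisation at -O2 and above), func() can jump straight into func2()
instead of calling it, so a backtrace taken inside func2() may no longer
show a frame for func().
<pre><![CDATA[
```c
/* Hypothetical stand-ins for the func()/func2() pair above. */
int func2(int x)
{
	return x * 2;
}

int func(int x)
{
	x += 1;			/* do_stuff() */
	return func2(x);	/* tail call: func() has nothing left to do */
}
```
]]></pre>
Calling func(20) returns 42 either way; only the backtrace differs. If
keeping the frame matters more than the saved stack space, GCC's
-fno-optimize-sibling-calls option (or building at -O0) should switch the
optimisation off without going back to an older compiler.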

</div>
</content>


<entry>
<title type="html">Motif finishing up</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-11.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/11/index.html</id>
<published>2009-11-20T20:50:26+0000</published>
<updated>2009-11-20T20:50:26+0000</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Since my last write-up on finishing the Motif rewrite for CRiSP,
I have made more progress, mostly on the things I had forgotten.
<p>
For example, the 'Scale' widget which is used in the color selector
dialog needed to be implemented (from scratch). That took about a day
(nice when things go according to plan).
<p>
Then I hit some issues with the default size of a combo field. That
is fixed.
<p>
Next up was the protocol manager for a shell widget. What is that you say?
Think of XmAddWMProtocolCallback(). Without this, if you click on the
window manager "X" at the top right of a window, then your app is quickly
terminated (the TCP connection to the X server is severed).
<p>
It took me a while to figure out / remember how this all worked. But, suffice
to say, unless you advertise the WM_DELETE_WINDOW atom in the WM_PROTOCOLS
property on the window, the window manager will not ask politely, but will
just brute force the termination of the app.
<p>
But putting a property on the application shell window is not quite so easy,
especially when we normally call XmAddWMProtocolCallback() against a
*widget* and not a *window*. A widget may not yet have a window at the
time we call it, and that is why there is complexity in the Motif
library - you are allowed to register interest in these window manager
protocol messages before the widget is 'realised' (i.e. before its X
window has been created). When the widget is realised, the appropriate
property is set on the window to allow the window manager to see
what is going on.
<p>
Of course, doing something "later" in an application means
creating some form of data structure for later use, and then making sure
we don't suffer a memory leak.
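<p>
As a rough illustration of that "do it later" bookkeeping, here is a
minimal sketch in C. It is neither the Motif implementation nor the CRiSP
one: the struct and the pending_add()/pending_flush() names are made up for
this example, and install() is a placeholder for whatever would really set
the property once the window exists. Requests are queued while the widget
has no window, then replayed and freed in one pass so nothing leaks. (The
newest request is replayed first, which should not matter for a set of
protocols.)
<pre><![CDATA[
```c
#include <stdlib.h>
#include <string.h>

/* One deferred request: "tell me when the WM sends <protocol>". */
struct pending {
	char           *protocol;	/* e.g. "WM_DELETE_WINDOW" */
	void          (*callback)(void *);
	void           *closure;
	struct pending *next;
};

/* Queue a request made before the widget has a window. Returns 0 on success. */
int pending_add(struct pending **head, const char *protocol,
		void (*callback)(void *), void *closure)
{
	struct pending *p = malloc(sizeof *p);

	if (p == NULL)
		return -1;
	p->protocol = malloc(strlen(protocol) + 1);
	if (p->protocol == NULL) {
		free(p);
		return -1;
	}
	strcpy(p->protocol, protocol);
	p->callback = callback;
	p->closure = closure;
	p->next = *head;
	*head = p;
	return 0;
}

/*
 * Called once the window exists: hand each queued request to install()
 * (if given), then free the record. Returns the number of requests
 * flushed and leaves *head empty.
 */
int pending_flush(struct pending **head,
		  void (*install)(const char *protocol,
				  void (*callback)(void *), void *closure))
{
	int n = 0;

	while (*head != NULL) {
		struct pending *p = *head;

		*head = p->next;
		if (install != NULL)
			install(p->protocol, p->callback, p->closure);
		free(p->protocol);
		free(p);
		n++;
	}
	return n;
}
```
]]></pre>
The same flush routine can be called with a NULL install() when the widget
is destroyed before it is ever realised, which covers the leak in that case
too.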
<p>
My implementation of window manager protocols isn't perfect, but sufficient
for what I need.
<p>
Why do I bother? I don't know, but what I do know is that CRiSP with Motif, statically
linked, occupies 3MB of code memory. With the new Motif replacement, it
is about 2MB.
<p>
When CRiSP was first written this was about the size of a large floppy
disk (and 40MB - that's megabytes, not gigabytes - was huge). Now the
whole of CRiSP fits in the L2/L3 cache of a CPU.

</div>
</content>


<entry>
<title type="html">Menu updates</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-11.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/11/index.html</id>
<published>2009-11-15T18:59:44+0000</published>
<updated>2009-11-15T18:59:44+0000</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
My experiments to replace the Motif library code in CRiSP with native
code hit a major stumbling block. The first few days of effort were
extremely good - lots of progress and deterministic behavior.
<p>
But the menuing code has taken about 3 months - I hope this is now
complete. Why?
<p>
Any number of reasons:
<ul>
<li>I am losing my 'touch'</li>
<li>It is difficult</li>
<li>There are lots of fiddly bits to get right</li>
</ul>
<p>
You choose. The issue with menus is the way input focus moves around
from one widget to another. There are lots of scenarios to get right,
and fixing one of them would result in some existing feature suddenly
breaking - like a see-saw as the code came together.
<p>
One problem I hit is that XtAddGrab/XtRemoveGrab doesn't handle double
registration of a widget particularly well.
<p>
The code, whilst trying to be purist and object oriented, in the end
had to be a little dirty - one class having too much intimate knowledge
of what it is dealing with (a menu has menu items, which are mostly separators and
buttons, for instance).
<p>
Here are some scenarios to consider:
<ul>
<li>Click and reclick the menu bar button (should dismiss menu)</li>
<li>Use keyboard to navigate a menu and popup sub menus, and then popdown submenus. </li>
<li>Use ESC to popdown a menu</li>
<li>Click outside the menu to dismiss it; click in another window to dismiss it too</li>
<li>Click on a menu item, and have the menu popdown, and the callback invoked</li>
</ul>
<p>
May seem like simple stuff, but getting it all working is difficult, especially
when grabs are put in place - suddenly input goes to the wrong widget, and
everything you had working, stops working.
<p>
Why bother? Why not just use Qt or Gtk?
<p>
Because I don't want to use them. As good as those toolkits are, they are
not available everywhere, and I don't want dependencies on other toolkits -
toolkits which have a very active development life.
<p>
This is akin to the dtrace problem: do you develop software for the
latest and greatest kernel/distro out there, or do you go back
to old releases and ensure your software works with them?

</div>
</content>


<entry>
<title type="html">Menus...implementing them (ipod web app)</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-10.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/10/index.html</id>
<published>2009-10-19T22:51:07+0100</published>
<updated>2009-10-19T22:51:07+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Still busy working on my Motif replacement for CRiSP. Much of
the grunt work was done a long while ago, but menus were the
thing I put off until a couple of months back.
<p>
Implementing menus is frustrating. (The input field widget took
about 2 days to implement from start to finish, and the result
is more functional than what it replaces). By contrast, menus
have taken a couple of months. Much of the original code worked up
front.
<p>
What makes menus tricky are a number of things but mainly related to input
focus and 'appearing to do the right thing'. Menus can popup with
a mouse, and then be navigated into sub menus with the mouse or keyboard.
I delayed getting the input focus problems solved until late in the
implementation, which, in turn, broke much of the existing logic.
(Events firing to the menu button popping up the menu or the
menu or menu items, etc). Just as bad was restoring the status quo
when selecting a menu item or dismissing the menu.
<p>
To make life palatable, I have a nice suicide-timer - after 25s, a
child does a "kill -9" on the parent, which avoids the tedium of trying
to unwedge the X server.
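<p>
For the curious, a suicide timer along these lines can be sketched in C as
below. This is a guess at the general shape, not the actual CRiSP code:
arm_suicide_timer() and disarm_suicide_timer() are hypothetical names, and
only fork(), sleep(), kill(), getppid() and waitpid() are real (POSIX)
calls.
<pre><![CDATA[
```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a watchdog child that SIGKILLs its parent after `seconds`.
 * Returns the child's pid in the parent, or -1 if fork() failed.
 */
pid_t arm_suicide_timer(unsigned int seconds)
{
	pid_t parent = getpid();
	pid_t child = fork();

	if (child != 0)
		return child;		/* parent, or fork failure (-1) */

	/* Child: wait, then kill the parent if it is still our parent. */
	sleep(seconds);
	if (getppid() == parent)
		kill(parent, SIGKILL);
	_exit(0);
	return 0;			/* not reached */
}

/* Cancel the watchdog on a clean exit. Returns 0 on success, -1 on error. */
int disarm_suicide_timer(pid_t child)
{
	if (child <= 0 || kill(child, SIGKILL) != 0)
		return -1;
	return waitpid(child, NULL, 0) == child ? 0 : -1;
}
```
]]></pre>
Calling arm_suicide_timer(25) before the menu code takes a grab would give
the 25s behaviour described above; the disarm call is only there so a
well-behaved run does not get shot on its way out.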
<p>
It's all very close, and I am just busy ensuring all combinations work properly
(mainly nested menus as they pop up/get dismissed).
<p>
Many of the problems are my own misunderstandings: I spent a good few
days trying to get XGrabPointer() to work for me, only to eventually
realise what I needed was XtAddGrab(). [Someone on a forum mentioned
that if you weren't confused by XGrabPointer/XtAddGrab, then you
didn't know what you were doing!]
<p>
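XGrabPointer is a server-side grab: it routes all pointer events to the
grabbing client (and can freeze event delivery), and is typically used
for drawing type apps. XtAddGrab only redirects events inside your own
application. For menus, you sort of need a mixture
of the two. (You want to intercept a menu dismissal when you click
outside of your own application).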
<p>
(I was debating using the iPod Touch to fix my X server hanging problems,
e.g. have a web server running, which responded to trivial HTTP requests,
and on receipt of a request, could douse the rogue hung application).
<p>
Ok, here it is - an ipod touch web server to help debug X11 apps...
<p>
<pre>
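#! /usr/bin/env perl

use strict;
use warnings;

use IO::Socket;

sub main
{
   # Listen on port 8080; any incoming connection means "kill the hung app".
   my $sock = IO::Socket::INET->new (
      LocalPort => 8080,
      Type      => SOCK_STREAM,
      ReuseAddr => 1,
      Listen    => 10) or die "Cannot listen on port 8080: $@\n";
   while (my $client = $sock->accept()) {
           print "Killing hung app...\n";
           # "grep -v grep" stops us matching the grep process itself.
           my $str = `ps ax | grep /home/fox/crisp_v9.5/bin.linux-x86_32/crisp | grep -v grep`;
           chomp($str);
           next if !$str;
           # The first field of the first matching line is the PID. Test the
           # match itself: $1 keeps its old value when a match fails.
           next if $str !~ m/^ *(\d+) /;
           # Negative signal number: SIGKILL goes to the whole process group.
           kill(-9, $1);
   }
}
main();
0;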
</pre>

</div>
</content>


<entry>
<title type="html">dtrace update</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-10.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/10/index.html</id>
<published>2009-10-10T15:55:23+0100</published>
<updated>2009-10-10T15:55:23+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Not much to report; I have hopefully fixed the compile issues on
2.6.31 kernels (haven't proved it on 2.6.32). The tweaks needed from one
kernel to the next, just to prove it compiles properly, do get tiresome.
<p>
Some people are reporting issues with GCC 4.4 and later glibcs. If
you get these issues, let me know. glibc seems determined to break
existing apps (GCC isn't quite so bad).
<p>
User space stack tracing is probably hosed because getting the user
space stack, depending on the trap context, is fiddly and something
I hadn't finished.
<p>
Need to catch up on my crisp Motif work (it all works, but it's not
quite perfect enough to release the crisp code yet).

</div>
</content>


<entry>
<title type="html">Boo hoo...</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-10.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/10/index.html</id>
<published>2009-10-06T23:17:50+0100</published>
<updated>2009-10-06T23:17:50+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Hard drive on my ftp server has gone - so that means some
links and crisp downloads are out of action til I have a chance
to put the backup into place and populate it with something useful,
like, err, maybe an operating system.
<p>
As a BTW, I decided to implement 'inertial scrolling' on fcterm and crisp.
So, you may find (eventually) a new release with the feature where
the faster you scroll the wheel on the mouse, the faster it scrolls.
(Works nicely on fcterm, but may need to do more surgery on crisp since
it only works for X windows, and needs to work for Mac+Windows too).
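<p>
One simple way to get this effect (a sketch only, and not necessarily what
fcterm does) is to look at the time between successive wheel clicks and
scroll more lines when the clicks arrive close together. X button events
carry a millisecond timestamp, so the gap is cheap to compute. The function
name, thresholds and step sizes below are made-up illustrative values.
<pre><![CDATA[
```c
/*
 * Map the gap between two wheel clicks (timestamps in milliseconds)
 * to a number of lines to scroll. Unsigned subtraction keeps the gap
 * correct even if the timestamp counter wraps.
 */
int wheel_scroll_lines(unsigned long prev_ms, unsigned long now_ms)
{
	unsigned long gap = now_ms - prev_ms;

	if (gap > 200)
		return 1;	/* slow, deliberate click */
	if (gap > 80)
		return 3;
	if (gap > 30)
		return 8;
	return 20;		/* wheel is being spun hard */
}
```
]]></pre>
Because the mapping only needs two timestamps, the same function could sit
behind the Mac and Windows wheel handlers as well, with only the event
plumbing differing per platform.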

</div>
</content>


<entry>
<title type="html">More Apple Insanity</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-10.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/10/index.html</id>
<published>2009-10-04T10:14:54+0100</published>
<updated>2009-10-04T10:14:54+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I wrote a short while back about the iPod Touch - nice hardware,
shame about the software. I will continue my rant...
<p>
Sometimes, my movie playlists will play back to back, but mostly
not. It's totally erratic, and simply feels like a 1st grader programming
error. It's almost like an uninitialised variable is causing it to work
or not. Sometimes switching it on/off helps, or a resync, but who knows.
<p>
Then, the other area of total insanity is the app store. The
app store integration is brilliant - the ability to browse, and waste
a few minutes seeing what's available, is great. Then you can download
free software or pay for software.
<p>
But...why on earth does the ipod go into a mode where *no* applications
will run? The startup screens almost show but then the app aborts or exits.
Who knows why - because Apple decided not to show you a reason
why the app quit. Then, some time later, they will all work again.
This is clearly a bug in Apple's firmware. I have downloaded maybe
15-20 apps, and I can believe some have bugs, but not in the initialisation
of the app. Why it should go into "not going to run anything" mode and
then recover, I don't know.
<p>
I downloaded an RSS reader - great software, but I am fearful it
creates a lot of little files for the unread news - maybe that's tickling
a bug; maybe the filesystem is fragmented - you cannot see the filesystem
so you are SOL when it comes to knowing what's going on.
<p>
What is annoying is that lots of people on the web report the same thing
and as far as I can tell, no one has clued up on the real causes, and
Apple doesn't admit to this issue. Their 'walled garden' is full
of street-corner muggers.
<p>
At least it plays movies and music - my main desire, but the app
crashing is unforgivable. At the least, some form of diagnostic tool
to figure out what happened would be nice. Maybe that's a tool that needs to be
written, or maybe I will find out on google somewhere what is going on.
<p>


</div>
</content>


<entry>
<title type="html">The year is 1992</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-09.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/09/index.html</id>
<published>2009-09-26T23:29:31+0100</published>
<updated>2009-09-26T23:29:31+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Around about the year 1991, I acquired my first HP calculator - 
an HP48SX. If you have never seen or used one of these calcs, read on.
<p>
It was the equivalent of an iPod, in its day.
<p>
Now armed with an ipod touch, and visiting the ipod store and 
getting a feel for this new device, I came across an HP48 calculator
emulator for the ipod.
<p>
Let's go back to 1992. In 1992, I wrote an emulator for the HP48 - made
available over Usenet, and it worked - it could do most HP things.
It was based around X11, and the PCs of the day were very lowly, but
still, the calculator emulator was fast enough.
<p>
I still have the source code to this. I believe this code was taken
and used as the basis for other calc emulators, and there are now
excellent emulators for Windows (haven't checked Linux, but should be easy).
<p>
Now, let's go back to 2009. The App store version of the emulator
is great - full power of an advanced RPN graphing and symbolic
algebra calc. The pixels on the ipod are enough to do the job, but only
just.
<p>
This app is available as source code.
See here for the announcement of the
<a href="http://www.automagic-software.com/products/i48/">i48 app</a>.
The Github entry for the source is
<a href="http://github.com/dparnell/i48/tree">here</a>.
<p>

This is interesting for two reasons. One, this is a simple iPod app,
and hence, is a useful example of how to create an app. Complete with
XML files and bitmaps for the app store. (Interestingly, the Objective-C
code is tiny, which is a testament to a good api on the ipod touch).
<p>
The other interesting point is that the emulator code bears the hallmark
of my original donation to the community. I don't know if this code was
based on my original or not - I started to lose interest in the HP
as CRiSP took more of my time in the early 90's, and was vaguely
aware of a Windows port (bear in mind this would be Windows 3.x or Windows 95,
which I detested beyond belief - you needed to reboot the system if any
app crashed if you wanted a nice life).
<p>
The attributions in the code date back to 1994, but there
are signs of similarities. I cannot remember how I got started on
the emulation. I know I picked up a possible HP28 (predecessor to the HP48)
emulator and got some internal docs from HP at the time (I probably still
have the emails dating back to then), but it's nice to know this excellent
piece of hardware and the emulators live on - and now, I can carry
an HP + iPod together.
<p>
Ob complaint: why is the HP50 so expensive in the UK? It's more than 100 GBP
vs about $90-100 in the US. Needless to say, HP haven't received any money
from me because of this obscene pricing, and I suspect the number sold in the
UK is pitiful, which is a shame.
<p>
So, what next? Who knows. I would like to write something for the
ipod, but I don't have time, and there are a lot of ideas to wade thru
from the existing app base on the app store. I just wish the ipod
wasn't so locked down (see complaints in the prior blog).

</div>
</content>


<entry>
<title type="html">Apple are insane</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-09.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/09/index.html</id>
<published>2009-09-26T00:11:12+0100</published>
<updated>2009-09-26T00:11:12+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I have a new ipod touch - nice device. There are lots of nice things
about it, but Apple are totally insane. How they let this out of the
labs I really don't know.
<p>
(1) Nested folders (playlists) don't appear to work on the ipod touch.
<p>
(2) On the touch front screen, is a "Movie" button. Search the web
to find out how hard it is to get anything to appear in here. On
the touch, movies need to be specified as "Music Video" or "TV Show".
What? Duh?
<p>
(3a) If you make it a "Music Video", it appears in the music playlist, not
the video one. Are they insane? Yes. I have 92 playlists for music and
upwards of 100 playlists for films (can only fit 20-30 on the ipod).
So, I have to scroll thru the music to find the films, unless I use
an initial letter, e.g. "A", to group them together. Are they insane? Yes.
<p>
(3b) A music playlist can be played portrait or landscape. A TV Show
can only be played portrait. This wouldn't normally matter except in
portrait mode, you can see the film title when the popup volume
controls appear but not in landscape mode. Are they insane? Yes.
<p>
(3c) A TV Show playlist will not continue from one part to the next.
I record from the tv to DVD across to a PC in 5min fragments. A typical
film is 20 5min fragments. So, after each 5min fragment, my tv show
skips back to the contents screen instead of continuing on to the next
episode/fragment. All the parts of the film are in a playlist. So, Apple,
are you insane? Yes.
<p>
(4) I knew you could get an onscreen control on the ipod screen to
see position/volume controls, and it has taken me absolutely ages to 
know what to do - tried tapping, double tapping all parts of the
screen, but it was so unobvious. Are you insane Apple? Yes.
<p>
(5) ipod touch won't charge on some alarm clock devices which let you
plug in an ipod. The new 3G touch won't use my expensive and overpriced
remote control from the ipod classic. Apple can be so money grabbing - they
are a business, after all, but now, with so many ipods out there, and so
many 3rd party devices, it is totally unclear what can work with what.
<p>
(6) App store apps which crash. I can appreciate software has bugs in it,
but when I run an app, which then crashes, I really would like to know this
and not have the ipod return back to the menu screen with no clue about
what happened. What happened to the "bomb"?
<p>
(7) The ipod touch screen gets too smudgy; I can live with that.
The screen is too dark - an OLED would be nice, but, watch a film
set in a dark room, and it's almost impossible to see what is happening.
To change brightness requires too many actions. Without a fast-fwd
or reverse button, skipping over adverts is painful - trying to get
your fingers in the right place (assuming you can work out how to pop up
the on screen display).
<p>
(8) Why are the classic and ipod touch/phone so different - I mean the
way iTunes treats them. This is insane. They had a very good GUI and
linkage with iTunes, but on the touch, it's "let's be as different as we
can". I could go into more detail, but won't.
<p>
(9) My ipod touch is much louder than the classic, for which I have
spent so much on trying different headphones so I could hear quiet
passages of films in a noisy environment.
<p>
(10) iTunes (9.0 and 9.0.1) is *insane*. Plug in your ipod. With
2000+ 5min film fragments, visit the device 'Movies' folder. It
wants to open every one to display a screen shot image so you can select
what to sync. iTunes mushroomed to greater than 1GB of RAM, and spent
CPU cycles like there was no tomorrow. No way to turn off the image display.
Fortunately, the "TV Shows" tab doesn't do this. Insane.
<p>
(11) ipod classic shows up as a mountable filesystem. ipod touch doesn't.
iTunes/Apple have pulled a fast one so you cannot see the device as a mountable
filesystem. I presume this is where the touch/phone cracking utilities come
in. They have made it hard to do certain things, and "dtrace" (yes, dtrace)
is deliberately unable to monitor some aspects of iTunes (Adam Leventhal
reported on this a while back, but dtrace is still broken). This is easy
enough to fix/work around, but I am curious. This means fixing a broken
ipod touch requires the services of an even better expert than a normal
ipod does, and 3rd party tools become fewer and further between.
<p>
(12) Microsoft released the Zune HD. Let's hope the Zune really competes
with the ipod, because Apple need a kick to be innovative. Apple stopped
being innovative when they opened the Apple Store.
<p>
Despite my rant above, I am happy with the touch, but it has a long way
to go to ensure it works for what it was intended for (music and films).
<p>
BTW there is a big difference between a classic and a touch: if your library
fits into a classic or touch, life can be easy. If it doesn't, and it won't
on a touch, then micromanagement of files is insanely tedious and difficult
with iTunes. They have made the classic mistake of creating a GUI
and touting how easy it is to use. It isn't. You are simply mortal, Apple.

</div>
</content>


<entry>
<title type="html">Trials, tribulations, horrors...</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-09.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/09/index.html</id>
<published>2009-09-13T19:10:24+0100</published>
<updated>2009-09-13T19:10:24+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<b>Horrors...</b>
<p>
I've been on holiday (San Francisco, LA, Vegas...), and my first
trial is the MGM Grand Hotel. They pulled a fast one. Their customer
services leave a lot to be desired, and their web site is appallingly
awful.
<p>
We booked a 3 day stay, and the price was very good... 'til we checked out and
found their prices had doubled and doubled again (we arrived the Thu
before Labor Day and stayed the Fri + Sat nights). Nowhere on the
web booking did it say that each night was a different price.
What is worse... when we arrived, they got me to sign the obligatory
credit card form, and hiding (in plain sight) was the room rate
for each night, going up in exponential fashion. So, partially
my fault for not noticing that, but the person (might not have been
human, and I cannot put down the word I want to use) didn't point
this out. (It would have been too late anyhow).
<p>
Other than that - holiday was great.
<p>
<b>Trial</b>
<p>
I upgraded my main server today to Ubuntu 9.04 - just a short time
away from the 9.10 release, but I wasted much of the day
getting VMWare Server 1.0.x working on the 2.6.31 kernel. (I gave
up with the Ubuntu kernel after it wasn't installed properly after the
upgrade). VMWare is a pain - at least there is source code to the
drivers, and eventually I got it to compile, but it generated a kernel
panic when vmware was started (more than likely, my code changes
were a little too dirty). Oh well....
<p>
<b>Tribulations...</b>
<p>
So, I decided to try switching back to VirtualBox (2.1). On startup,
it told me 3.0 was available so I have upgraded to that. Interestingly,
I noticed that 'rdesktop' works nicely for VirtualBox (my previous
complaint was that the X GUI was horrible when dealing with high volume
output across a low speed wifi network), and this may help enormously
in solving that problem, and get me away from VMWare. I really didn't
want to suffer the VMWare Server 2.x release, and now I can live
in the freeware world for virtualisation.
<p>
<b>DTrace for 2.6.31 Kernel</b>
<p>
As per normal, the dtrace code doesn't compile against the new kernel due to the
number of changes in the kernel. Fortunately, the changes look much
easier than the ones VMWare had to contend with, so I will try and fix this
shortly.
<p>
<b>CRiSP without Motif</b>
<p>
What has been occupying me for the last few weeks is migrating CRiSP
away from Motif - it's nearly finished - I just need to do some final touches
to the menu system and menu bar, and it looks/feels much better.

<p>
<b>iPod Touch</b>
<p>
Had been eagerly waiting for the new ipods to come out (I own an iPod Classic,
but not an iPhone - they are simply too horrendously expensive), and
although the new iPod isn't technically much better than the older ones,
I have ordered one - so I can watch films on the way to work (the Classic
screen size and volume have always caused me an issue). I am
toying with writing some apps for the iPod....maybe I could port CRiSP or
DTrace to it :-) (But that might be pointless tho!)
<p>
So, if things go quiet for a while...I am busy getting a high score or
just hacking on the ipod or crisp ... or dtrace...


</div>
</content>


<entry>
<title type="html">CRiSP + Motif (no dtrace)</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-08.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/08/index.html</id>
<published>2009-08-15T18:48:40+0100</published>
<updated>2009-08-15T18:48:40+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I am taking a short rest from dtrace - it's been doing my head
in (ustack / dwarf; see previous postings).
<p>
Am on holiday from next weekend for a couple of weeks, and I
want to do something more rewarding, so am switching back to CRiSP
for a while to kick some tyres.
<p>
First up is finer control of file auditing - you can tell
CRiSP to keep track of files you edit in an audit trail; useful
for those times when you forgot where you placed a file.
<p>
I've fixed some other customer reports.
<p>
I keep on staring at ribbon bars, and before I fully tackle this
(there's some pre-alpha code in CRiSP to do this, but it's not ready
for primetime), I am revisiting the Motif factor. CRiSP is built
on Motif and over the years, it has driven me insane. In recent
weeks I have fixed some uninitialised memory refs in Motif which
could cause core dumps, but I have always had a goal to remove it totally.
Many of the widgets are native Xt widgets, and the few remaining just
require a bit of debugging to get rid of Motif totally - thus making
the code more supportable, and ready for other things. (And freeing up
a fair amount of memory).
<p>
CRiSP has some theming support and in getting rid of Motif, it will be
easier to complete that, and finally let menu items have icons in them.
<p>
People have also asked for freetype font support (which exists in CRiSP
in a semi undocumented fashion). So, if the Motif removal goes well,
then freetype can be made available to most of the widgets.


</div>
</content>


<entry>
<title type="html">Painful dwarf</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-08.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/08/index.html</id>
<published>2009-08-09T00:23:41+0100</published>
<updated>2009-08-09T00:23:41+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Progress is slow, but positive. I've spent the last week
or two trying to find the user stack and the PC. It's easy to get
the user stack, but the PC proved elusive; I now have a hack to find it.
<p>
Why?
<p>
Imagine the SYSCALL instruction fires. This is a special instruction
in the AMD/x86 CPUs which moves from user mode to kernel mode,
*without* pushing the return address on the stack. The Linux kernel,
immediately after the transition (entry_64.S), puts the user space
SP into the thread task area, but the PC is hiding. On entry to the
kernel side of a syscall, it is in the RCX register, but by the time
we hit a probe, e.g. sys_open(), we are miles away and the pt_regs
array isn't accurate.
At the point of probe, we force a breakpoint trap (luckily, only
our code executes at this point, so we don't have to consider
nested interrupts and blowing the state areas in the thread stack).
<p>
What makes this tricky is getting everything to work at once - anything
even slightly wrong just gives bogus results -- stack traces which are
not accurate or totally missing.
<p>
I am better now - I seem to get the first two stack frames, but
the third one is elusive (I am either miscomputing the dwarf frame info
or misapplying the result to find the next frame; for a third frame, it's
frustrating since we have gone thru the same looped code twice,
so why the third is problematic is not clear).
<p>
The code so far is fairly horrid, with lots of experiments in there,
and no 32-bit version yet done. My biggest fear is that some of this
is subtly dependent on kernel releases; I think it is not, which
would be one weight off my chest.
<p>
(Kernel releases are subtly different in syscall/interrupt handling,
and also structure layout for the user/process/thread, but I don't
think we care too much, yet).

</div>
</content>


<entry>
<title type="html">slow dwarf</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-08.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/08/index.html</id>
<published>2009-08-05T23:47:50+0100</published>
<updated>2009-08-05T23:47:50+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Been busy doing some CRiSP updates over the last few days, so backed off
a little on dtrace, but trying to get back into the dwarf issues.
<p>
Alas, the current Windows CRiSP release has black arrows on the scrollbars...
to be fixed this weekend. Nuts.
<p>
I am trying to get this to parse properly:
<pre>
$ build/dwarf /lib/libpthread.so.0
....
CIE length=00000014
  Version:              01
  Augmentation:         "zRS"
  Code alignment factor: 1
  Data alignment factor: -8
  Return address reg:    0x10
  Augmentation Length:   len=0x01 1b
R encoding 1b (kernel)

2c38 FDE len=7c cie=001c pc=e0ff..e109 tpc=ffffffffffffffff
0000: dwarf.c: unsupported DW entry 0xf 12
</pre>
I am working through the various opcodes; I can parse them, but there is no
guarantee the semantics are correct (that's the next phase).
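<p>
The "R encoding 1b" line in the dump is a pointer-encoding byte: the low
nibble says how the value is stored and the upper bits say what it is
relative to. Here is a minimal sketch of splitting such a byte, written in
Python for brevity rather than the C of dwarf.c. The constant values are the
DW_EH_PE_* ones from the LSB exception-frame documentation as I understand
them; describe_encoding is a made-up name, not a function from dwarf.c.
<pre>
```python
# Value-format names (low four bits of a DW_EH_PE_* encoding byte).
FORMATS = {
    0x00: "absptr", 0x01: "uleb128", 0x02: "udata2", 0x03: "udata4",
    0x04: "udata8", 0x09: "sleb128", 0x0A: "sdata2", 0x0B: "sdata4",
    0x0C: "sdata8",
}

# Application names (bits 4..6): what the decoded value is relative to.
APPLICATIONS = {
    0x00: "absolute", 0x10: "pcrel", 0x20: "textrel", 0x30: "datarel",
    0x40: "funcrel", 0x50: "aligned",
}


def describe_encoding(enc):
    """Split a pointer-encoding byte into (format, application, indirect).

    Hypothetical helper for illustration only.
    """
    if enc == 0xFF:                       # DW_EH_PE_omit: no value present
        return ("omit", None, False)
    indirect, rest = divmod(enc, 0x80)    # top bit: DW_EH_PE_indirect
    app, fmt = divmod(rest, 0x10)         # bits 4..6, then the low nibble
    return (FORMATS.get(fmt, "unknown"),
            APPLICATIONS.get(app * 0x10, "unknown"),
            indirect == 1)


print(describe_encoding(0x1B))   # ('sdata4', 'pcrel', False)
```
</pre>
So 0x1b reads as a signed 4-byte value relative to the address of the
encoded field itself, which is why the FDE "pc" values cannot be used as
absolute addresses until they have been adjusted.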
<p>
libpthread.so.0 is where the open64 syscall is located when I do
my ustack() test against the perl interpreter.
<p>
In theory the parsing shouldn't matter, as in the kernel, we skip
over blocks of the dwarf instructions to find the matching block,
but it helps me to relax a little and better understand this stuff
so I can tackle why some SYSCALL instruction blocks aren't being
handled properly.
<p>
People are sending me bug reports on 2.6.30.* kernels (fixed an issue
with 2.6.30.4, but now there's a 2.6.30.5 - I cannot keep up with these
releases and the gratuitous kernel code changes on each release!).
So, just trying to stay above water, but progress is slow.

</div>
</content>


<entry>
<title type="html">mail problems</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-08.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/08/index.html</id>
<published>2009-08-04T20:45:47+0100</published>
<updated>2009-08-04T20:45:47+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
For reasons I don't fully understand, some of my mail is not getting
out. My mail macros and bits/pieces are breaking in some areas and
I hadn't realised things were not getting out.
<p>
If you see no response from me, then this could be the issue - just
remail me; if you see dup emails from me, it's me attempting to fix the
issue.
<p>

</div>
</content>


<entry>
<title type="html">dtrace linux status - the dwarfs</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-08.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/08/index.html</id>
<published>2009-08-01T13:13:59+0100</published>
<updated>2009-08-01T13:13:59+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I've been slowly getting the DWARF stack dumper to work.
It works for some system calls/probes but not for others.
At issue appears to be accuracy in the dwarf.c code - looking at the
gdb source for stack walking is interesting as it highlights a number
of issues, including trampolines and exception stacks. 
<p>
A particular issue I am having at present is the sys_open syscall.
gdb can show a stack trace but my kernel code cannot find the 
appropriate dwarf frames mirroring where we came from.
So I need to put in more effort to work through the use case scenarios.
<p>


</div>
</content>


<entry>
<title type="html">Dwarf .. nearly working.</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-07.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/07/index.html</id>
<published>2009-07-26T12:02:30+0100</published>
<updated>2009-07-26T12:02:30+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
...
  0   3004              sys_nanosleep:entry
              0x7f76eab2e104: libc-2.6.1.so`sleep+0x94
              0x7f76eb55a576: libperl.so.5.8.8`Perl_pp_sleep+0x56
              0x7f76eb51d1ee: libperl.so.5.8.8`Perl_runops_standard+0xe
              0x7f76eb4c7f4a: libperl.so.5.8.8`perl_run+0x30a

  0   2482           sys_rt_sigaction:entry
              0x7f76eab2e17a: libc-2.6.1.so`sleep+0x10a
              0x7f76eb55a576: libperl.so.5.8.8`Perl_pp_sleep+0x56
              0x7f76eb51d1ee: libperl.so.5.8.8`Perl_runops_standard+0xe
              0x7f76eb4c7f4a: libperl.so.5.8.8`perl_run+0x30a
...
</pre>
The above is the stack trace of Perl, which has no decent
frame pointers, yet the stack trace agrees with what gdb sees.
(I had to cheat, since 'main()' is missing above).
<p>
It's nearly there, but I need to resolve some more issues, and then
we should have a viable ustack() call even on omit-frame-pointer
applications. (Still need to do the 32-bit equivalent of the above).

</div>
</content>


<entry>
<title type="html">Say "goodbye" .. Say "hello"</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-07.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/07/index.html</id>
<published>2009-07-20T23:25:12+0100</published>
<updated>2009-07-20T23:25:12+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I have removed the utils/eh.c file.
<p>
I have created driver/dwarf.c.
<p>
This file is both a userland binary (build/dwarf) and the dwarf
decoder subroutine for kernel code to be called from dwarf_isa.c.
<p>
Next step is to modify the stack walker to invoke the subroutine and
see if we get sensible results from within the dtrace driver.

</div>
</content>


<entry>
<title type="html">And so the gestation of a dwarf begins...</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-07.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/07/index.html</id>
<published>2009-07-19T19:02:25+0100</published>
<updated>2009-07-19T19:02:25+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
The utils/eh.c code seems to be working and I am now converting it
from a userland dwarf dumper to a subroutine which can be called in the
context of walking the stack. 
<p>
I'll put out periodic releases if anyone is interested (utils/eh.c) which
will become driver/dwarf.c when it's ready for compiling into the kernel
(not far off).
<p>
The next step is to change the ustack() code to call this and see
what happens...

</div>
</content>


<entry>
<title type="html">Gestation Period is up...I am pregnant with a Dwarf...</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-07.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/07/index.html</id>
<published>2009-07-18T19:41:55+0100</published>
<updated>2009-07-18T19:41:55+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Having spent the last week or so on understanding
the DWARF .eh_frame and .eh_frame_hdr sections, I now have a simple
utility to dump out these sections, according to the DWARF spec.
This code is analogous to what the binutils/readelf tool can do,
but is the first step to making this work inside the kernel
to get stack traces from user space apps.
<p>
The code is in utils/eh.c (gcc -o eh eh.c -lelf). It's nothing
special, and likely to have a few bugs/quirks in it, but the
code can now be copied into a kernel module and invoked as a subroutine,
with various changes to handle ELF32 + ELF64 (eh.c only handles
ELF64 for now).
<p>
The following is the kind of output from the tool:
<pre>
FDE length=00000024 ptr=0034 pc=00402110..00402199
fde_encoding=27
  Augmentation Length: 0x00
0000: 4a          DW_CFA_advance_loc 10 to 0040211a
0001: 8f 02       DW_CFA_offset: r15 at cfa-16
0003: 86 06       DW_CFA_offset: r6 at cfa-48
0005: 66          DW_CFA_advance_loc 38 to 00402140
0006: 0e 40       DW_CFA_def_cfa_offset: 64
0008: 83 07       DW_CFA_offset: r3 at cfa-56
000a: 8e 03       DW_CFA_offset: r14 at cfa-24
000c: 8d 04       DW_CFA_offset: r13 at cfa-32
000e: 8c 05       DW_CFA_offset: r12 at cfa-40
0010: 00          DW_CFA_nop
0011: 00          DW_CFA_nop
0012: 00          DW_CFA_nop
0013: 00          DW_CFA_nop
0014: 00          DW_CFA_nop
0015: 00          DW_CFA_nop
0016: 00          DW_CFA_nop
</pre>
It may not make sense without reading the specs or understanding
what it is trying to do. (eh.c has various big cribbed comments
taken from the DWARF spec). The above is like a virtual machine,
but one used to track where the caller's registers were saved (e.g. the
current frame pointer) rather than perform arithmetic or logical operations.
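<p>
To make the "virtual machine" idea concrete, here is a small sketch that
decodes the same instruction bytes as the dump above. It is Python rather
than the C of eh.c and only handles the four opcodes that appear in that
dump. The function names are invented for this example, and the alignment
factors (code 1, data -8) are assumed to match the CIE values shown in the
libpthread dump in another post.
<pre>
```python
def read_uleb128(data, pos):
    """Decode an unsigned LEB128 number; return (value, next position)."""
    value, scale = 0, 1
    while True:
        byte = data[pos]
        pos += 1
        value += (byte % 0x80) * scale    # low 7 bits carry the payload
        scale *= 0x80
        if byte // 0x80 == 0:             # top bit clear: this was the last byte
            return value, pos


def decode_cfa(data, pc, code_align=1, data_align=-8):
    """Return one text line per call-frame instruction (illustrative subset)."""
    out, pos = [], 0
    while len(data) > pos:
        op = data[pos]
        pos += 1
        high, low = divmod(op, 0x40)      # top 2 bits select the primary opcode
        if high == 1:                     # DW_CFA_advance_loc, delta in low bits
            delta = low * code_align
            pc += delta
            out.append("DW_CFA_advance_loc %d to %08x" % (delta, pc))
        elif high == 2:                   # DW_CFA_offset, register in low bits
            factored, pos = read_uleb128(data, pos)
            out.append("DW_CFA_offset: r%d at cfa%+d"
                       % (low, factored * data_align))
        elif op == 0x0E:                  # DW_CFA_def_cfa_offset
            offset, pos = read_uleb128(data, pos)
            out.append("DW_CFA_def_cfa_offset: %d" % offset)
        elif op == 0x00:                  # DW_CFA_nop (padding)
            out.append("DW_CFA_nop")
        else:                             # anything else is out of scope here
            out.append("unsupported opcode 0x%02x" % op)
            break
    return out


program = bytes.fromhex("4a8f028606660e4083078e038d048c0500")
for line in decode_cfa(program, 0x402110):
    print(line)
```
</pre>
Each DW_CFA_offset line says "the caller's copy of this register lives at
this offset from the CFA", and DW_CFA_advance_loc moves the PC at which the
following rules start to apply. A stack walker runs the program only as far
as the PC it is interested in and then uses the rules in force at that point.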
<p>
There's still some way to go - taking a demo program and making it into
a re-entrant subroutine (and I may have some concerns about performance
after looking at the DWARF frames for a sizable executable, like CRiSP,
but we will see what happens).
<p>
My initial target is /usr/bin/perl - since having a programmable
and deterministic environment to test and retest is useful.

</div>
</content>


<entry>
<title type="html">DWARF, and Sun</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-07.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/07/index.html</id>
<published>2009-07-14T21:30:55+0100</published>
<updated>2009-07-14T21:30:55+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I have a person in Sun, actively fixing dtrace to help with
their work, and this is proving useful - two or more sets of
eyes to pick over some of my dirty work. Already he has fed
back quite a few things for the 2.6.18 kernel, which are applicable
to other kernels too. Hopefully more fixes will be forthcoming,
whilst I fight the Elves and Dwarves.
<p>
DWARF - one of the most complex unix areas - but a beautiful
piece of work, dating back to the early 1990s by AT&amp;T/Sun.
<p>
DWARF is the way debug info is stored in executable ELF files.
Not something one normally worries about, and the GNU binutils
and gdb packages, along with GCC, know how to do this without
blinking.
<p>
But, hiding in DWARF is the magic for handling stack unwinding.
Because -fomit-frame-pointer became popular in the 1990s as
GCC was enhanced to allow use of an extra register on the x86
architecture, a way was needed to walk the stack, when the
%EBP register no longer helps find the return addresses.
<p>
If you look at an ELF executable, e.g.
<pre>
$ objdump -h /usr/bin/perl
Sections:
Idx Name          Size      VMA               LMA               File off  Algn
...
 15 .eh_frame_hdr 00000034  0000000000400eb4  0000000000400eb4  00000eb4  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 16 .eh_frame     000000ac  0000000000400ee8  0000000000400ee8  00000ee8  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
...
</pre>
you will see the above two sections. These are the sections for
unwinding the stack, typically needed for C++ exceptions, but also
for omit-frame-pointer (FPO) code. The DWARF spec, e.g.
<a href="http://refspecs.freestandards.org/LSB_3.0.0/LSB-Core-generic/LSB-Core-generic/ehframechpt.html">http://refspecs.freestandards.org/LSB_3.0.0/LSB-Core-generic/LSB-Core-generic/ehframechpt.html</a>
will tell you more than you ever wanted to know about this.
<p>
The specification, like most specifications, is opaque in many areas,
and I am busy writing a disassembler to more fully understand it.
(Not useful to anyone else but me). I did find that this:
<pre>
$ readelf -wf /usr/bin/perl
</pre>
will disassemble these sections, and I also found
<a href='http://www.hpl.hp.com/research/linux/libunwind/'>http://www.hpl.hp.com/research/linux/libunwind/</a>
and <a href='http://www.nongnu.org/libunwind/'>http://www.nongnu.org/libunwind/</a>
which have code to help more fully understand the spec.
<p>
It's a shame these key libs aren't a standard part of the distributions, and
that the kernel itself hasn't yet stumbled on to this, so I may as well try
for them. 
<p>
The problem being solved here is that ustack() is useless on apps compiled
without frame pointers, and many distros do exactly that. 
<p>
Anyway, .eh_frame_hdr is a mini table which maps a program counter to
a block of instructions in .eh_frame which describes, amongst other
things, what the stack looks like within a basic block of code.
So, as the CPU pushes/pops things off the stack, it provides a map
of where to find the return address of the function, and that is how
gdb works nicely on x86_64 architectures (and many others).
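<p>
The lookup itself can be sketched as a binary search over a sorted table of
(start address, FDE address) pairs, which is the shape I understand the
.eh_frame_hdr search table to have. This is Python for brevity, find_fde is
an invented name, and the table values are made up rather than taken from
a real binary.
<pre>
```python
import bisect


def find_fde(table, pc):
    """Return the FDE address for the last entry starting at or before pc.

    table: list of (initial_location, fde_address), sorted by location.
    Returns None if pc lies below the first entry. Hypothetical helper.
    """
    starts = [start for start, _ in table]
    index = bisect.bisect_right(starts, pc) - 1
    if index == -1:
        return None
    return table[index][1]


# Made-up table: three functions and the offsets of their FDEs.
table = [(0x1000, 0x40), (0x1080, 0x68), (0x1200, 0x90)]
print(find_fde(table, 0x10A4))   # 104, i.e. 0x68
print(find_fde(table, 0x0FFF))   # None
```
</pre>
The table only records where each block starts, so after the lookup the
walker still has to check that the PC falls inside the range given by the
FDE it found.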
<p>
Of course, those libraries are significantly complicated since they support
many CPU architectures and scenarios, whereas I am only currently caring
about x86 32 and 64 bit machines.


</div>
</content>


<entry>
<title type="html">Hiiiii! Hoooo! Its off to work we go. DWARF</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-07.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/07/index.html</id>
<published>2009-07-12T19:58:46+0100</published>
<updated>2009-07-12T19:58:46+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Whilst bumbling around in ELF file format, and after a prompt
from Nicolas at Sun, I found out how gdb does its stuff to find
stack frames for omit-frame-pointer code.
<p>
When code is compiled with GCC, it creates a data structure used
for exception handling. I thought this was only used for real
C++ apps, but it turns out this is there for non-C++ apps also, and
is hiding in the ELF sections, loaded into memory:
<pre>
$  objdump -h /usr/bin/perl
/usr/bin/perl:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
...
 15 .eh_frame_hdr 00000074  000000000040289c  000000000040289c  0000289c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 16 .eh_frame     0000020c  0000000000402910  0000000000402910  00002910  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
...
</pre>
<p>
So, I need to find these sections in the address space of the running application
to be able to walk the stack. Hopefully this gives us a workable
solution for ustack().
<p>
I have some way to go: not only locating the memory regions for the
current stack to find the ELF blocks, but also potential issues if user
space pages are paged out whilst we are walking the process's address space.
<p>
Probably at least a couple of weeks away from getting this working.

</div>
</content>


<entry>
<title type="html">Darnit...i must admit defeat and live my life...</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-07.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/07/index.html</id>
<published>2009-07-11T16:33:34+0100</published>
<updated>2009-07-11T16:33:34+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Mention omit-frame-pointer to people, and if they 'get it', they
will seethe at code compiled this way.
<p>
That's me after about 2 weeks of trying to improve ustack().
On the Ubuntu releases I am playing with everything is either compiled
without a frame pointer, or GCC has bastardised the stack like a drunk
who has thrown up in the toilet.
<p>
I have tried various heuristics to get something to work, but I need
to dig deeper. (gdb can do it, so I need to see how it's doing it).
<p>
Anyway, I got CentOS 5 - 2.6.18 installed to fix some issues people
had reported on the 2.6.18 kernel.
<p>
Someone in Sun has contacted me regarding getting dtrace to work on 2.6.18
for the <a href='http://www.lustre.org'>Lustre</a> project. 
I find it elating and funny that Sun have come
to me for dtrace on Linux, since they want it to help debugging.
There were three bugs which the person kindly reported on and he is
in business, so that's a good mutual deed. (Thanks Nicolas)
<p>
I have some other contributions to fix issues with pid/tid, and I am looking
at this to see what is wrong in dtrace and fix it. (Thanks Mauritz).
<p>
I need to do something in the ustack area - there's a few pent up
fixes/cleanups in my internal code, but I will look at gdb for some
hints and see if I can make some progress.
<p>


</div>
</content>


<entry>
<title type="html">Heat + Programming dont mix</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-07.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/07/index.html</id>
<published>2009-07-03T21:14:36+0100</published>
<updated>2009-07-03T21:14:36+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
We've been having a bit of a heat wave this week in the UK,
and it's partially muddled my brain - it's beginning to cool off,
so dtrace is looking more attractive.
<p>
I have spent the week playing with the symtab code so that ustack()
can display the user stack traces. I found various issues with the
hacks to get Linux process control to work without radically modifying
the existing code - still more to do, but at least I can concentrate
on the symtab.
<p>
I tripped over a bug in a couple of the ELF functions, where there is
a Solaris v. Linux incompatibility in the error return values.
<p>
I keep finding code where it tries to open /proc/pid/pstatus which doesn't
exist on Linux, and various issues in finding the DYNAMIC/PROCEDURE_LINKAGE_TABLE.
At the moment, it's displaying the function names fine, but the module (library)
names are garbage, probably because it's expecting to find
the shlib name but I haven't stored it anywhere and it's pointing to
free memory.
<p>
I just ran valgrind on dtrace and that's helped track a few uninitialised
variables, but valgrind doesn't understand the dtrace ioctl()s so any
return from an ioctl() taints the output, unless/until I teach valgrind
how to interpret these.
<p>
I spoke to Adam Leventhal about SDT probes to understand some more
of the internals. An interesting point he mentioned to me was
how SDT works in Solaris: as the kernel boots up, it scans
itself for the SDT probes and readies the breakpoints to be inserted.
So there is a mapping of probes, just like for a USDT application, which
makes perfect sense. 
<p>
I mentioned the trickiness of doing Linux SDT probes in the absence
of source code changes to the kernel, and I know it can be done, but
it may require case-by-case analysis to determine how best
to patch the kernel to get the probe points. When I have finished/improved
user space symbol and process handling then I can go back to that to play,
or, I could just use dtrace to analyse more of the kernel itself.
<p>
More, when there's more to write about.

</div>
</content>


<entry>
<title type="html">rtdb framework</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-06.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/06/index.html</id>
<published>2009-06-29T22:44:38+0100</published>
<updated>2009-06-29T22:44:38+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I'm busy at the moment trying to get the rtld/rtdb functions
to work. It's a difficult decision - do I drag in more and more
Sun/Solaris code, so that there is a one-to-one mapping of
functions and intent, or do I stop here, and start
writing my own code?
<p>
The rtdb functions are interfaces to the runtime linker (ld.so.1),
and, although very nice, rely on intimate behavior of the Solaris
linker. The corresponding functions don't exist on Linux.
So, copying the code into dtrace means copying more and more dependencies
(avlist, linked list, msg locales and other stuff), for little benefit.
<p>
dtrace uses these functions in a very specific way: get the symtab
of the target process we are tracing, along with the symtab for the
loaded shared libraries.
<p>
I am going to draw a line and see how much I can do without
dragging it in. (I dragged it in and have kicked it out again, as
I just spend more and more time porting Solaris to Linux, which
isn't the end goal).
<p>
The end goal is making the PID provider and user space stack
traces "as they should be".
<p>
This will likely take a while, so will update periodically if
I feel what I have is no worse than before.

</div>
</content>


<entry>
<title type="html">dtrace progress - symtabs</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-06.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/06/index.html</id>
<published>2009-06-28T12:10:35+0100</published>
<updated>2009-06-28T12:10:35+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I have put out a new release which is better at handling
stacks for 32+64b platforms and whether they are compiled
with/without frame pointers. It's not perfect - the later your
kernel, the more trustworthy the stack will be, since in the
worst case, we have to examine the stack, word-by-word, to find
likely looking return addresses (the same as the kernel does),
since GCC over-optimises frame pointers.
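<p>
A minimal sketch of that worst-case scan, in Python for brevity (the real
thing is C inside the driver). The function name, the stack words and the
text range are all invented for illustration; the only idea taken from the
text is "keep the words that look like code addresses".
<pre>
```python
def likely_return_addresses(stack_words, text_ranges):
    """Keep the stack words that point into a known executable range.

    stack_words: word values read from the stack, top of stack first.
    text_ranges: list of (start, end) address pairs, end exclusive.
    Hypothetical helper; a heuristic, not an exact unwinder.
    """
    hits = []
    for word in stack_words:
        # range() membership is a cheap bounds check for integers.
        if any(word in range(start, end) for start, end in text_ranges):
            hits.append(word)
    return hits


# Made-up data: one text segment and five words found on the stack.
text = [(0x400000, 0x401000)]
words = [0x7FFD0010, 0x400123, 0x0, 0x400FFF, 0x401000]
print([hex(w) for w in likely_return_addresses(words, text)])
# ['0x400123', '0x400fff']
```
</pre>
The weakness is obvious from the code: any stale value left on the stack
that happens to point into a text range is reported as a frame, which is
why the result is only "likely looking" and not trustworthy.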
<p>
I am currently looking at this:
<pre>
$ dtrace -n pidXXX::: -p XXX
</pre>
I tried this on my MacOS system, and was intrigued by the fact
that for a sample Perl app, tens of thousands of new probes sprang into
life. It looks to me as though you can DoS a kernel with these
privs, since if you do this on lots of processes, you can
eat the probe memory that dtrace will set aside, and either run out,
or affect performance of a system.
<p>
At the moment I am knee deep in more ELF/dynamic stuff, so that
we can get the symtab of a running process so that the PID provider
is more usable.


</div>
</content>


<entry>
<title type="html">SDT probes - what?</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-06.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/06/index.html</id>
<published>2009-06-25T23:22:40+0100</published>
<updated>2009-06-25T23:22:40+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
SDT - static probes are high level probes in the kernel, in the
sense that they add value compared to FBT. FBT probes can go on
any function - you know the function got entered or returned.
But finding key datastructures, such as the current "proc" or
"timer" or "packet" isn't easy without playing around
with stack arguments and type casts to a known type.
<p>
That's how I read the SDT: SDT can provide a probe like
"received_packet" and provide an argument which represents the
packet so you can dissect it.
<p>
But, the question is - are they useful ?!
<p>
I don't really understand the probes despite staring at the code
for a while. I understand lots of the technicalities, but not
the rationale. Is my first paragraph spot on?
Feel free to send me feedback about why they are a *must*.
<p>
Why?
<p>
Well, many of the probes in Solaris relate to Solaris internals.
The concepts of scheduling on Solaris don't match the Linux kernel.
Solaris has a process and a lwp (lightweight kernel thread). In Linux,
all threads are really processes.
<p>
So, if you have a D script written for Solaris, it won't work on Linux,
unless I provide as close an emulation as possible. I have found
the FBT is more than enough to keep me entertained, but I am
trying to find if we need SDT.
<p>
There are a lot of values exposed in /proc such as statistic counters.
And there is a lot of code in the kernel which increments those counters.
But the counters on their own are not directly interesting (you can put
an FBT on the functions that manipulate those counters). So, maybe I am
missing something, like, with dtrace/linux today, you cannot easily
inspect processes, io, vm, packets, etc.


</div>
</content>


<entry>
<title type="html">fixed the 32b problems?</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-06.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/06/index.html</id>
<published>2009-06-23T23:57:10+0100</published>
<updated>2009-06-23T23:57:10+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Just uploaded a new release -- which may fix the problem.
Found that if I disable the GPF interrupt hook, the
reliability problems disappear. I don't understand how/why - the
race conditions that could happen should be very small...
but seems to work.
<p>
I will have to analyse this more to see why that hook (which shouldn't
fire, and we do put it back on a rmmod) causes a problem.

</div>
</content>


<entry>
<title type="html">32b drat</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-06.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/06/index.html</id>
<published>2009-06-23T22:42:51+0100</published>
<updated>2009-06-23T22:42:51+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I have had a bug report that builds since 20090617 for 32b
kernels are failing to load. Strange, because it worked for me,
but I don't have every permutation of kernel and modules.
<p>
After trying a few experiments, it appears that reloading the
dtrace driver will panic/crash/reboot the 32b kernel. (After 3
times for my test machine, and in vmware, a reboot occurs, indicating
a likely triple-fault).
<p>
I suspect maybe on driver unload, something is not being undone
which happened on a load (maybe reset/unhooking the interrupt vectors).
<p>
I am investigating.
<p>
<b>SDT Progress</b>
<p>
I've done some research on how to get SDT into the kernel without
touching the kernel source. I was hoping for key
subsystems like the scheduler, VM, NFS, that we would find a structure
containing counters which are incremented at key parts of the driver, and
the ones exposed in /proc. If we did, we could modify the instruction
provider to look for these increments, and auto-create the probes.
<p>
What I have found so far in looking around, is that some/all
drivers have either a disconnected ad hoc collection of counters or
have per "instance" counters. (I found references to zones in the
MM code), so it won't be as easy as I hoped, but I am continuing
to look for a pattern.

</div>
</content>


<entry>
<title type="html">dtrace -p now works</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-06.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/06/index.html</id>
<published>2009-06-22T21:39:44+0100</published>
<updated>2009-06-22T21:39:44+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
We can now attach to a running process and run dtrace on it. I
hit the same kernel bug - namely, that if a process attaches to a
debuggee, and the process creates a thread, the thread cannot "see"
the child debuggee via ptrace(). Nuisance, but now I understand it, it's
totally fine - we just attach/detach in the parent and reattach in the child
thread.
<p>
It still concerns me that you can kill -9 the dtrace and the child
can be left stuck in an indeterminate state. Whilst thinking about
this, I have a possible solution, namely to let the dtrace driver
know what we are doing, and should the dtrace process die, we could
force a SIGCONT (PTRACE_CONT) on the debuggee, so all is not lost, and
we don't need to do what Solaris does in the /proc filesystem.
<p>
So, next up is either ustack() (and user space symbol tables), or
the SDT driver. I am still a little confused by SDT and the "transform"
keyword in a D script which provides struct-level access to kernel
and user space params, but I know what I am expecting to see/work, so
I just need to play.
<p>
SDT will be interesting - I have a plan to use the Instruction Provider
to disassemble the kernel and intercept ADD instructions which apply
to a global memory area corresponding to a struct of interest. I hope
this will work for some/most of the desired areas, and if so, we have
a way to intercept processes which trigger various kernel counters.
<p>
One thing to note with dtrace -c/-p - the way dtrace works is to
get the process going and then to kick off the kernel rules engine.
The kernel doesn't really know what's going on in user space - you can elect
to monitor probes for the process or any sibling (like truss -f or strace -f)
by virtue of your predicates on the probes you write. This really is very
powerful, since dtrace can (in theory) do everything strace and truss
can do, but via lower level primitives.
<p>
Dtrace emulating truss is available, as some scripts on the internet
show, but some aspects of the way this is done are a little "clunky".
I will experiment at a later date to see if we can more closely
emulate strace/truss so that dtrace can be a one-stop-shop for these
kinds of things.
<p>
New release available today whilst I go off and do some more real work.

</div>
</content>


<entry>
<title type="html">One step closer - dtrace -c works</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-06.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/06/index.html</id>
<published>2009-06-21T21:33:25+0100</published>
<updated>2009-06-21T21:33:25+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
dtrace -c should now work. It took a lot of energy to understand
the control flow and map the Solaris primitives to standard
Unix ptrace/wait semantics, but it appears to work. 
<p>
You can now do this:
<pre>
$ dtrace -n 'syscall::mu*:/pid==$target/{printf("%d",pid);}' -c df
dtrace: description 'syscall::mu*:' matched 6 probes
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1              5874396   5097744    478248  92% /
varrun                  255460        72    255388   1% /var/run
varlock                 255460         0    255460   0% /var/lock
udev                    255460        44    255416   1% /dev
devshm                  255460         0    255460   0% /dev/shm
dtrace: pid 7689 has exited
CPU     ID                    FUNCTION:NAME
  0  87377                     munmap:entry 7689
  0  87378                    munmap:return 7689
  0  87377                     munmap:entry 7689
  0  87378                    munmap:return 7689
  0  87377                     munmap:entry 7689
  0  87378                    munmap:return 7689
</pre>
You may see some debug printf's I have left in there, but next
thing is to tackle the symtab (stack/ustack) stuff, and consider
library probes.
<p>
The -c stuff (and -p, which I haven't tested yet) may have some issues.
There's a horrible sleep(1) in the child after a fork() to let
the parent catch up with the child. I found the Linux kernel
seemed to be broken in some areas (I believe threads which inherit
ptrace() children have problems).
<p>
The sleep can be solved easily with some form of shm mutex or maybe
even a futex, but I haven't tried.
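<p>
As a rough sketch of how the sleep could go away (this is not the dtrace code, and it swaps in a plain pipe for the shm mutex or futex mentioned above), the child can block on a read until the parent says it is ready. The function name launch_with_handshake is made up for illustration, and a comment marks where the real attach work would go.

```python
import os

def launch_with_handshake(argv):
    """Fork a child that waits for the parent's go-ahead before exec'ing argv."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: block until the parent writes one byte, then exec the target.
        os.close(write_fd)
        os.read(read_fd, 1)
        os.close(read_fd)
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if the exec failed
    # Parent: the attach/setup work would happen here (placeholder).
    os.close(read_fd)
    os.write(write_fd, b"x")  # release the child
    os.close(write_fd)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

Calling launch_with_handshake(["df"]) runs df only after the parent has written its one byte, however long the parent's setup takes.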
<p>
What is worrying is that in Solaris /proc fs, you can signal
a child process to continue on its own if you, the parent, die.
On Linux, this isn't there, so, consider:
<pre>
$ dtrace -n ... -p &lt;pid>
</pre>
If you kill -9 the dtrace process, then the target process may be
left in an indeterminate state. This is also true for strace. dtrace and strace
can work hard to intercept SIGINT/SIGHUP/SIGTERM/etc, but cannot
do anything about SIGKILL. I can think of a not-nice way to partially
solve this (or maybe we could put something into the kernel to handle
this), but that is a reason why /proc/pid/ctl wins on Solaris.


</div>
</content>


<entry>
<title type="html">Linux user threads - bug ?</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-06.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/06/index.html</id>
<published>2009-06-19T20:03:01+0100</published>
<updated>2009-06-19T20:03:01+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I have been working on userland dtrace - where you can launch
an app from dtrace itself so that you can trace just this app (or
attach to an existing app, like strace or truss).
<p>
I found something interesting, which had been confusing me, having
spent so long inside the kernel.
<p>
In Unix, we have the ptrace() system call - which is the basis of all
debuggers. You can attach to a process and do things like set breakpoints
or intercept events of interest, like signals.
<p>
The way this works is in one of two ways: if you are a debugger
(which dtrace, gdb, strace, etc. all are), then you fork yourself.
The child notifies the kernel it is happy to be traced (via
ptrace(PTRACE_TRACEME)), and then forks+exec's the target process.
<p>
The parent debugger attaches to the target pid (it knows the pid, because
we just forked). It does this via ptrace(PTRACE_ATTACH), and from then
on can peek/poke the target process, or continue after an event.
<p>
So, here is the bug. In order to ptrace a process you need to attach to it.
Two debuggers (eg gdb + strace) cannot attach to the same process at the
same time. 
<p>
Now, consider this. You are a process. You create a new thread.
This thread forks() + execs the target. The new thread tries to attach
to the process, but fails, because the master thread is considered the 
'parent' of the child, and the thread you spawned is considered to be
a distinct process - not a thread of the main process.
<p>
The issue here is that in Linux, threads are implemented as if
you had forked a new process, but the thread shares the address space
of the parent. This is not true of a proper multithreaded and POSIX
compliant system. E.g. in Solaris, a thread is really a separate
'slice' of a process, and it shares the process id of its parent.
<p>
Linux tries to pretend threads exist, but this funky emulation seems
to break how ptrace() works.
<p>
This is why I have had a hard time getting userland dtrace to work
properly in this area - as I have been trying to understand what dtrace
is doing and why the target process was stuck in the wrong state.
<p>
Now I understand, hopefully the "-c" and "-p" switches to dtrace
can be made to work, and this will be a significant feature addition
to Linux/dtrace.

</div>
</content>


<entry>
<title type="html">Next up...</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-06.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/06/index.html</id>
<published>2009-06-15T23:01:52+0100</published>
<updated>2009-06-15T23:01:52+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
$ dtrace -n 'syscall:::/pid==$target/{}' -c "sleep 100"
</pre>

This is how to trace the syscalls for a specific process we want to
launch - one of the last major features of Linux Dtrace which is missing.
<p>
Interestingly, I seem to be hitting an issue with pthreads vs
fork/waitpid semantics... Time to read more on who gets the signals
on Linux vs Solaris...

</div>
</content>


<entry>
<title type="html">Dependencies</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-06.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/06/index.html</id>
<published>2009-06-15T21:59:31+0100</published>
<updated>2009-06-15T21:59:31+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Can people who download dtrace and find it fails to build,
please read the README and figure out what they have missing
from their systems in order to build it.
<p>
I am not going to respond to emails for trivial support
issues.
<p>
Thank you

</div>
</content>


<entry>
<title type="html">dtrace and the CALL instruction .. fixed</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-06.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/06/index.html</id>
<published>2009-06-12T20:05:32+0100</published>
<updated>2009-06-12T20:05:32+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
After a lot of code and stack trace staring - the issue is now
fixed for 64b kernels. The issue was around a call instruction.
Any probe which started with a call instruction could crash the kernel.
<p>
Amazingly, I was staring at a solution in the Linux kernel, but
my brain has been hazy the last few days. I had implemented
the Instruction Provider which has been a great help to find
lots of samples of instructions I care about and try and get a feeling
for what is going on.
<p>
The issue I was seeing is that when we take the INT3 and INT1 handler -
for the initial breakpoint trap and then the single step trap,
we would expect the kernel RSP to have moved, because we
had just stepped a CALL instruction. But I wasn't seeing this. The
"regs" structure on the stack at the point of exception was the same.
This didn't make sense.
<p>
I hacked it for one 64b kernel, but the others hated my hack.
(My hack involved looking at the stack dumps and trying to 'find'
the magic values I wanted).
<p>
It worked fine on 32b kernels. Imagine an interrupt from
kernel space taking place. The cpu pushes RFLAGS, RCS, RIP, in that
order onto the existing stack. At this point, our code kicks in
and pushes the full register set on to the stack (giving us a "struct pt_regs"
structure we can point to and manipulate before returning from the interrupt).
<p>
Just above the flags should be the stack where we interrupted.
This *is* true on a 32b cpu but not on a 64b cpu. I *think* the
reason is that on 64b cpus, Linux sets up a TSS task switch
so that on an interrupt, we have a private kernel stack, and this
would hopefully avoid stack overflows if we interrupted a deeply
nested part of the kernel.
<p>
That is why the 'regs' structure is always at the same address, and
what we have in the r_rsp field is a POINTER to the original stack,
not the stack itself!
<p>
A quick experiment and I could run:
<pre>
$ dtrace instr::*call*:
</pre>
to trap every call instruction in the kernel and it worked. In addition
<pre>
$ dtrace fbt:::
</pre>
works flawlessly on all three key 64b kernels I was trying, and I hadnt
even broken the 32b kernel in fixing this.
<p>
There's still a bogus issue or two to track down. Ctrl-C-ing dtrace
can cause kernel problems - not sure why. If you Ctrl-C the
dtrace binary, it sends an ioctl to the kernel to ask it to pull
apart your probes rather than just exiting. I don't fully understand
why they do that but it may be for when you launch a binary from
dtrace and it needs to kill or detach.
<p>
So, if this is done, I can hopefully return to user space and get
userspace apps to be traced as well, and then we are done....
<p>
The Instruction Provider driver is hopefully going to be useful
to implement a proper set of probes for the things that avoid
patching kernel source.


</div>
</content>


<entry>
<title type="html">Instruction Provider now works</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-06.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/06/index.html</id>
<published>2009-06-08T23:35:12+0100</published>
<updated>2009-06-08T23:35:12+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Here's a short example:
<pre>
$ dtrace -n instr::*-nop:
  0  86851 _spin_lock-nop:0xffffffff8045c3f5
  0  86870 _spin_lock_irqsave-nop:0xffffffff8045c4b8
  0  86870 _spin_lock_irqsave-nop:0xffffffff8045c4b8
  0  86851 _spin_lock-nop:0xffffffff8045c3f5
  0  86870 _spin_lock_irqsave-nop:0xffffffff8045c4b8
  0  86870 _spin_lock_irqsave-nop:0xffffffff8045c4b8
  0  86870 _spin_lock_irqsave-nop:0xffffffff8045c4b8
  0  86870 _spin_lock_irqsave-nop:0xffffffff8045c4b8
  0  86851 _spin_lock-nop:0xffffffff8045c3f5
  0  86870 _spin_lock_irqsave-nop:0xffffffff8045c4b8
  0  86742 mutex_trylock-nop:0xffffffff8045b1b7
  0  86870 _spin_lock_irqsave-nop:0xffffffff8045c4b8
  0  86933 lock_kernel-nop:0xffffffff8045c6d6
  0  86870 _spin_lock_irqsave-nop:0xffffffff8045c4b8
  0  86870 _spin_lock_irqsave-nop:0xffffffff8045c4b8
  0  86851 _spin_lock-nop:0xffffffff8045c3f5
  0  86933 lock_kernel-nop:0xffffffff8045c6d6
  0  86870 _spin_lock_irqsave-nop:0xffffffff8045c4b8
  0  86870 _spin_lock_irqsave-nop:0xffffffff8045c4b8
  0  86851 _spin_lock-nop:0xffffffff8045c3f5
  0  86870 _spin_lock_irqsave-nop:0xffffffff8045c4b8
  0  86870 _spin_lock_irqsave-nop:0xffffffff8045c4b8
  0  86851 _spin_lock-nop:0xffffffff8045c3f5
  0  86870 _spin_lock_irqsave-nop:0xffffffff8045c4b8
  0  86870 _spin_lock_irqsave-nop:0xffffffff8045c4b8
  0  86870 _spin_lock_irqsave-nop:0xffffffff8045c4b8
  0  86870 _spin_lock_irqsave-nop:0xffffffff8045c4b8
  0  86851 _spin_lock-nop:0xffffffff8045c3f5
  0  86870 _spin_lock_irqsave-nop:0xffffffff8045c4b8
  0  86742 mutex_trylock-nop:0xffffffff8045b1b7
  0  86870 _spin_lock_irqsave-nop:0xffffffff8045c4b8
  0  86933 lock_kernel-nop:0xffffffff8045c6d6
  0  86870 _spin_lock_irqsave-nop:0xffffffff8045c4b8
  ...
</pre>
And another,
<pre>
$ dtrace -n instr::*-lock:
  0  86883 _spin_trylock-lock:0xffffffff8045c51c
  0  86883 _spin_trylock-lock:0xffffffff8045c51c
  0  86883 _spin_trylock-lock:0xffffffff8045c51c
  0  86883 _spin_trylock-lock:0xffffffff8045c51c
  0  86925 __reacquire_kernel_lock-lock:0xffffffff8045c660
  0  86925 __reacquire_kernel_lock-lock:0xffffffff8045c660
  0  86925 __reacquire_kernel_lock-lock:0xffffffff8045c660
  0  86925 __reacquire_kernel_lock-lock:0xffffffff8045c660
  0  86925 __reacquire_kernel_lock-lock:0xffffffff8045c660
  0  86925 __reacquire_kernel_lock-lock:0xffffffff8045c660
  0  86925 __reacquire_kernel_lock-lock:0xffffffff8045c660
  0  86925 __reacquire_kernel_lock-lock:0xffffffff8045c660
  0  86883 _spin_trylock-lock:0xffffffff8045c51c
  0  86883 _spin_trylock-lock:0xffffffff8045c51c
</pre>


</div>
</content>


<entry>
<title type="html">The Instruction Provider</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-06.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/06/index.html</id>
<published>2009-06-08T21:46:56+0100</published>
<updated>2009-06-08T21:46:56+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Today I played with the instruction provider - a dtrace probe provider
for tracing classes of instructions. Typically these are jump or call or sti/cli
instructions. This works like the FBT provider but creates probes based
on opcode values. So, for example you can trace every JNE instruction, or only
JNE inside a specific function.
<p>
It's hopefully useful and innovative, but my prime goal here was to provide
a way to debug targeted opcodes, which are not necessarily in the
first location of a function.
<p>
I ran a trial - I went from 25,000+ probes to 300,000+ probes. I quickly
crashed the kernel (hey, it was a first effort), but hope to debug quickly.
<p>
I will probably make it a load-time option to enable it as it really
can be destructive to the system under test with so many probes firing.
But, if it works, it will also be a good stress of dtrace on linux.
<p>
More later!

</div>
</content>


<entry>
<title type="html">E8 again...</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-06.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/06/index.html</id>
<published>2009-06-06T21:03:13+0100</published>
<updated>2009-06-06T21:03:13+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Dtrace is working beautifully apart from a 2.6.9 kernel (64-bit)
I am testing on. One fbt probe uses the E8 instruction (relative call).
<p>
This is what a relative call does:
<pre>
E8 nn nn nn nn  CALLR offset
</pre>
We have a 32-bit relative offset from the next instruction. As
a normal subroutine call, this is what should happen: decrement RSP,
move return address to (RSP).
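<p>
To make that concrete, here is a small model of what stepping the E8 call should do to RIP and RSP. It is only a sketch: the stack is a Python dict, the function name is invented, and the instruction bytes in the usage note are made up (only the addresses are borrowed from the register dump further down this entry).

```python
def step_call_rel32(addr, insn, rsp, stack):
    """Model an E8 rel32 CALL at addr; return the new (rip, rsp)."""
    if len(insn) != 5 or insn[0] != 0xE8:
        raise ValueError("not an E8 rel32 call")
    # The 4 displacement bytes are a little-endian signed 32-bit value.
    disp = int.from_bytes(insn[1:5], "little", signed=True)
    return_address = addr + 5            # address of the next instruction
    rsp -= 8                             # decrement RSP (64-bit return addr)
    stack[rsp] = return_address          # move return address to (RSP)
    rip = (return_address + disp) % 2**64
    return rip, rsp
```

With rsp = 0x10008eedf58 and a displacement of 0x20, the call leaves rsp at 0x10008eedf50 with the return address stored there, which is the decrement by 8 described above.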
<p>
Now this is very strange: on the 2.6.9 kernel, we single step the call.
The initial breakpoint hits and we look at the RSP. As we single step
over the call, we expect the RSP to have decremented by 8 (64-bit return
addr).
<p>
And it does.
<p>
But there is a gap between the RIP/CS/EFlags for the trap exception
and the return address of the stepped over instruction. Look at the following
debug output:
<p>
<pre>
INT3 PC:ffffffff80110a48 REGS:0000010008eedea8 CPU:0
BEFORE:
Regs @ 0000010008eedea8..0000010008eedf50 CPU:0
r15:000033526a5f59d7 r14:ffffffff804dc2a0 r13:0000010008165290 r12:ffffffff804dcc00
rbp:0000010008165290 rbx:0000010014f84d40 r11:ffffffff80110b5a r10:0000000000000038
r9:0000000001200011 r8:0000010008eec000 rax:0000010015e803b0 rcx:00000000c0000100
rdx:0000000000000000 rsi:0000010008165290 rdi:0000010015e803b0 orig_rax:0000010015e803b0
rip:ffffffff80110a49 cs:0000000000000010 eflags:0000000000000047
rsp:0000010008eedf58 ss:0000000000000018 00000000006f2840 00000000006f0a00
INT3 ffffffffa0236cff called CPU:0 good finish

int1 PC:ffffffffa025aff0 regs:0000010008eedea8 CPU:0
AFTER:
Regs @ 0000010008eedea8..0000010008eedf50 CPU:0
r15:000033526a5f59d7 r14:ffffffff804dc2a0 r13:0000010008165290 r12:ffffffff804dcc00
rbp:0000010008165290 rbx:0000010014f84d40 r11:ffffffff80110b5a r10:0000000000000038
r9:0000000001200011 r8:0000010008eec000 rax:0000010015e803b0 rcx:00000000c0000100
rdx:0000000000000000 rsi:0000010008165290 rdi:0000010015e803b0 orig_rax:0000010015e803b0
rip:ffffffffa025aff1 cs:0000000000000010 eflags:0000000000000047
rsp:0000010008eedf50 ss:0000000000000018 ffffffffa0236d05 00000000006f2840
</pre>
Here we get an INT3 trap and you can see RSP is set to
0000010008eedf58. The "Regs @" entry in the first case shows
the extent of the 'struct pt_regs'. Note that between the printed rsp
and the end of the regs area is a difference of 8 bytes. This shouldn't
be there.
<p>
After the INT3 breakpoint trap, we single step (int1), and look again
at the Regs@ and RSP field. The regs are at the same location - even though
we just executed a call instruction and pushed the return address on
the stack. In the INT1 register dump, RSP is correctly decremented by 8.
Here we have no gap, but for INT3 we do have a gap.
<p>
I have been reading and re-reading exception handling on the web
and Intel's docs and there is no reason for the gap.
<p>
What is puzzling is that it works on the other kernels, but INT3
is pushing two extra words on the stack - more than I expect.
<p>
Another interesting issue is that when I look at the kernels I have
and search for E8 call instructions at the first instruction of a probe,
only this one seems to have one. Later kernels (or GCC's) don't seem
to emit the instruction, so, if I don't understand what is going on, there
is a chance that you will hit one and panic your kernel.
<p>
Strange. I am going to put out a new release (at least this fixes
the compiler issues people have been complaining about, and hope no-one
has an E8 in their kernel).

</div>
</content>


<entry>
<title type="html">0xfa and 0xfb - CLI and STI</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-06.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/06/index.html</id>
<published>2009-06-05T22:55:21+0100</published>
<updated>2009-06-05T22:55:21+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Strange. In 64-bit mode, trying to single step these instructions
which enable and disable interrupts doesn't work. I'm sure it's me
being a little thick and there are a number of gotchas.
<p>
For instance, CLI, which clears the interrupt enable flag will
ignore interrupts over the following instruction (as will STI, or,
maybe only STI does).
<p>
What was happening is that process 1 -- init -- would die, and the kernel
would scream at me.
<p>
I have solved this by pure emulation - there is no point in single stepping
these instructions, so I just handle them without a single step - which is
better from a performance point of view.
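<p>
A minimal sketch of what emulating the two instructions amounts to, assuming the trap handler has the saved flags and instruction pointer to hand. The function name, and the convention that ip is the address of the instruction itself, are assumptions for illustration and not the driver's actual interface. 0xFA is CLI and 0xFB is STI; both are one byte long and only touch the IF bit (bit 9) of the flags.

```python
IF_FLAG = 0x200  # bit 9 of EFLAGS/RFLAGS: interrupt enable

def emulate_cli_sti(opcode, flags, ip):
    """Emulate CLI (0xFA) or STI (0xFB) on saved state; return (flags, ip)."""
    if opcode == 0xFA:
        # CLI: clear IF. Setting the bit and then toggling it always clears it.
        flags = (flags | IF_FLAG) ^ IF_FLAG
    elif opcode == 0xFB:
        # STI: set IF.
        flags = flags | IF_FLAG
    else:
        raise ValueError("not CLI or STI")
    return flags, ip + 1  # skip over the one-byte instruction
```

Nothing is single stepped: the handler rewrites the saved flags, bumps the saved instruction pointer and returns from the trap.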
<p>
I am running on 3 64-bit vmware kernels. 2.6.27.8 runs beautifully.
2.6.27-7-generic - an Ubuntu kernel - runs flawlessly but strangely
slowly when all probes are enabled. I would expect both to run
at the same speed, so either the first is running fast when it
shouldn't or maybe the latter is flawed. (I think the slowdown may be
due to calls to mcount which is doubling the overhead per function
in the kernel).
<p>
The other is 2.6.9 - AS4 kernel. Just shown that to hang, so I need
to debug that before making a release.
<p>
(32-bit kernels appear to work fine, and the compile issues are resolved).
<p>
I have added a special flag to FBT which is interesting/useful.
<p>
<pre>
$ load.pl -opcode
</pre>
<p>
will prefix each probe name with the first byte of the opcode at the
probe, so that it is easier to diagnose where the flaws are. Single
stepping the breakpoint for a probe works, but many instructions have to be
handled specially, such as jumps, calls and rets. So being able to 
find the offending instruction or scenario is helpful.
<p>
This relates back to a prior blog entry where I talked about how
nice it would be to have an instruction prober where we could probe
by instruction type, rather than function. E.g. imagine probing
by virtue of every LOCK instruction. Or REP or CLI. Get the picture?
<p>
How about JMP/JNE/JE instructions? That could be ideal
for low level kernel profiling -- how many times is a jump taken in
*this* function.
<p>
This is easy to do - just need a variation of the FBT disassembler
which doesn't try to instrument the entry/exit of a function, but the
body.
<p>
I may try and get this in on the release after this one, just to see
what it looks like. Stay tuned.
<p>
Hoping to release this weekend or tonight if I can resolve the
AS4 issue.

</div>
</content>


<entry>
<title type="html">E8 issue - now fixed</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-06.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/06/index.html</id>
<published>2009-06-03T23:49:43+0100</published>
<updated>2009-06-03T23:49:43+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I found where on the stack my "return address" was hiding, and
proved to myself what I had done wrong. I was being very silly.
<p>
Now...need to fix the compile time issues and a new release is
forthcoming.
<p>
You can cat /proc/dtrace/trace to get some internal trace debug - I need
to tone that down to avoid hitting performance too much. (It's not bad
as it is, but I can do better).

</div>
</content>


<entry>
<title type="html">E8 nnnnnnnn - CALL Relative</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-06.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/06/index.html</id>
<published>2009-06-03T23:29:48+0100</published>
<updated>2009-06-03T23:29:48+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I've been stuck on one instruction all week - the call relative
instruction. One function in the kernel has this in the opening
position of a function entry, and we copy the instruction to a temp
buffer and single step it.
<p>
It's not rocket science, but I have been struggling with a lot
of silliness on just a few lines of code.
<p>
This instruction has two issues - (1) we need to adjust the return
address since we want to return to the original instruction and not
the copied one, and (2) it's a relative jump.
<p>
In my work, I have managed to get one or both of these stupidly wrong.
(One issue looks to be not sign-extending a 32-bit displacement to
a 64-bit address).
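<p>
A sketch of the two fix-ups, assuming the call was single stepped from a copy sitting at a different address from the original. This is an illustration of the idea and not the code in the driver; the function names are invented and the stack is modelled as a Python dict.

```python
def rel32(insn):
    """Sign-extend the 4 displacement bytes of an E8 call to a Python int."""
    return int.from_bytes(insn[1:5], "little", signed=True)

def fixup_after_stepped_call(rip, stack, rsp, orig_addr, copy_addr):
    """Rebase RIP and the pushed return address from the copy to the original."""
    delta = orig_addr - copy_addr
    # The CPU pushed copy_addr + 5; we want to return after the original call.
    stack[rsp] = (stack[rsp] + delta) % 2**64
    # The target was computed relative to the copy; move it back likewise.
    return (rip + delta) % 2**64
```

The sign extension matters for backwards calls: e8 fb ff ff ff is a displacement of -5, whereas reading the same bytes as unsigned gives 0xfffffffb and a target nearly 4GB away.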
<p>
Hopefully I can get this fixed and move on.
<p>
Some people have raised issues about plain compile errors due to
&lt;string.h> and memcpy. I hope to fix this too - very annoying
that I did something to break what was working fine. (I replaced
calls to bzero with a call to memset, and somehow the #define's
conflict with glibc's string header.)
<p>
I noticed a new Solaris release has come out (2009/06) and the
most notable change for dtrace is the CPC profiler --
<a href="http://wikis.sun.com/display/DTrace/cpc+Provider">http://wikis.sun.com/display/DTrace/cpc+Provider</a>
<p>
This looks neat and I really want to get that ported, but I need to finish
the current workload before taking this on board.
<p>
More in a few days.

</div>
</content>


<entry>
<title type="html">I dont want to release just yet....</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-05.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/05/index.html</id>
<published>2009-05-31T19:29:24+0100</published>
<updated>2009-05-31T19:29:24+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Whilst enabling all valid opcodes for FBT tracing is great - it
does show up issues which I had been able to ignore before in terms
of stability.
<p>
There are certain paths of code which can cause probe traps from within the
probe handler, and I get differing results from my differing kernels.
The best kernels are those that crash on me - allowing me to see
an issue, rather than hiding these latent instabilities.
<p>
"fbt:::" works, but there's more to it than this, as many probes
never fire, or a probe may cause an issue when another probe fires.
(The SYSCALL instruction is a prime case - it causes the kernel to enter
at 'system_call', but at this time, the stack and state of the kernel
is not consistent, and we cannot (yet) probe on that, so I have added
it to the toxic list for now).
<p>
There are many entries on the toxic list which can be removed, but am
working thru the failure scenarios I can see at present.
<p>
I've put in an interrupt handler for interrupt #13 (GPF), since
when we do things wrong in dtrace, it's good to get a chance to shut
us up and avoid an infinite cascade of console messages, resulting
in a total panic. (If we get a GPF caused by us, we disable all
probes to try and give me a chance to debug what is going on; in theory
this should never happen, but at this time it can, whilst I iron out
some of the thorny issues).
<p>
Keep watching the ftp site - I will upload when I feel happy.

</div>
</content>


<entry>
<title type="html">dtrace progress </title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-05.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/05/index.html</id>
<published>2009-05-29T21:34:31+0100</published>
<updated>2009-05-29T21:34:31+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
The 32-bit dtrace in the kernel is looking good now - the function
disassembler now allows all instructions in the first slot
of a function, and correctly single steps them. (Not strictly true -
since many instructions don't or won't occur in this slot, not for
normal C code or even assembler, such as JNZ, for instance). But
for my kernel (one of my kernels) the number of probes has jumped
from around 23000+ to 24000+ (which illustrates how rare many
instructions are, such as a CALL or LOOP or REP instruction in the
first slot).
<p>
I need to validate the 64b code - I have added a new file (and
removed the older cpu_32bit.c cpu_64bit.c) to store this emulation.
<p>
Interestingly, in looking at kprobes in the kernel, for hints about
what I was doing wrong, I am seeing that I now do much more -- i.e. 
I believe kprobes cannot handle many functions or instructions
in the kernel, so they can borrow from this code if they like, or
use it to educate themselves on what's missing.
<p>
I am happy to share the code and ideas, because this way, dtrace
and its competitors can improve, and people who do not contribute
to the code, get something-for-nothing - better quality tools.
I don't look at dtrace as the competitor to annihilate all others.
I am pleased with the quality and thought process Sun put into the
construction, and the raw dtrace engine has given me almost zero
issues.
<p>
Sun had an easy starting point - tight kernel integration and a
dumb C compiler that doesn't show up the issues that inlined assembler
gives to us Linux people. Maybe they can learn something too.
<p>
Someone asked me the other day of the performance differential for
Linux dtrace vs Sun/Apple. My response then, as now, is that there is near
zero difference. Just because Linux dtrace is not a part of the
kernel, but an add-in module, doesn't deter dtrace from doing things.
(Ok, SDT is going to be a challenge but much less so, I hope, than
getting to where we are).
<p>
Also, I was asked about the 'alpha' nature of dtrace. I either
write optimistic blogs ("It works!"), or short cryptic ("No it doesn't!")
entries - depending on how positive I feel.
<p>
When dtrace works for more than a few minutes without crashing a kernel,
that's good. But we don't have a scheme to handle coverage (maybe I will
add that) - so we can tell, of the 25000+ probes, which ones fired
and which didn't. Certainly, 20-30+% of the kernel is executing all
the time, but the rest may depend, e.g. if a CD is in the drive, or if
a TCP packet is dropped, or a user space app core dumps, and so on.
And even if a probe fires, it may be handled incorrectly.
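<p>
One cheap way to approximate coverage, sketched here as userland post-processing rather than anything inside the driver: tally the probe ids seen in dtrace's output and compare against the total number of probes. The function names are invented, and the line format assumed is the default "CPU ID FUNCTION:NAME" layout.

```python
from collections import Counter

def probes_fired(dtrace_output):
    """Count firings per probe id from default-format dtrace output."""
    counts = Counter()
    for line in dtrace_output.splitlines():
        fields = line.split()
        # Data lines look like: CPU ID FUNCTION:NAME [args...]
        if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
            counts[int(fields[1])] += 1
    return counts

def coverage(counts, total_probes):
    """Fraction of the known probes that fired at least once."""
    return len(counts) / total_probes
```

This only says whether a probe fired, not whether it was handled correctly, so it answers the first half of the question at best.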
<p>
The truth is, if it stands up to scrutiny from people using it, and
I am not writing about "Oops!" moments, then we are making progress.
<p>
Next on my todo fix list is Ctrl-C to dtrace. When I run:
<pre>
$ dtrace -n fbt:::
</pre>
<p>
it works fine. But Ctrl-C-ing it causes a *long* delay - sometimes 5-10s
before the shell comes back. *Sometimes* a kernel GPF is raised, indicating
that as the probes are being removed, an interrupt or something is creeping
in, firing a probe about to be destroyed, and possibly hanging or causing
disruption. For small numbers of probes, the window of opportunity is
tiny. But for all probes, it's big enough to be a real problem when the
aim is to ensure you cannot panic a running production system.
<p>
Hopefully this will be easy to fix.

</div>
</content>


<entry>
<title type="html">Nuts - i am wrong</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-05.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/05/index.html</id>
<published>2009-05-27T21:01:52+0100</published>
<updated>2009-05-27T21:01:52+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I wrote in the last two articles that Sun's disassembler is wrong
in not handling prefix instructions properly, but that is rubbish,
on my part. It does handle them, and I confused myself because
of the changes to fbt_linux.c I am presently working on.
<p>
Apologies to Sun - and am glad that I can trust their code!
<p>
Now to find why a few F0 instructions aren't stepped properly...

</div>
</content>


<entry>
<title type="html">REPZ/REPNZ prefix</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-05.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/05/index.html</id>
<published>2009-05-27T20:34:58+0100</published>
<updated>2009-05-27T20:34:58+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I just wrote about the semantics of the LOCK instruction being a prefix
or not, and I now have proof that REPZ/REPNZ should be treated similarly.
<p>
Here's a small dump from the dmesg output after loading dtrace when it
can process the F0 series of opcodes (F0..FF).
<p>
<pre>
[52358.586721] fbt:F instr ptype_seq_stop:c02c5ae0 size=1 f3 c3 8d b4 26
[52358.589013] fbt:F instr neigh_stat_seq_stop:c02ccb60 size=1 f3 c3 8d b4 26
[52358.601019] fbt:F instr seq_stop:c02e3f50 size=1 f3 c3 8d b4 26
[52358.601484] fbt:F instr seq_stop:c02e41d0 size=1 f3 c3 8d b4 26
[52358.601760] fbt:F instr rt_cpu_seq_stop:c02e49f0 size=1 f3 c3 8d b4 26
[52358.601828] fbt:F instr ipv4_rt_blackhole_update_pmtu:c02e4ae0 size=1 f3 c3 8d b4 26
[52358.615720] fbt:F instr icmp_address:c030e3b0 size=1 f3 c3 8d b4 26
[52358.615743] fbt:F instr icmp_discard:c030e3c0 size=1 f3 c3 8d b4 26
[52358.623705] fbt:F instr xfrm_link_failure:c0322210 size=1 f3 c3 8d b4 26
[52358.639776] fbt:F instr __read_lock_failed:c0347a90 size=1 f0 ff 00 f3 90
[52358.641537] fbt:F instr kprobe_seq_stop:c034a9a0 size=1 f3 c3 8d b4 26
</pre>
Note
the size=1 which shows that Sun's disassembler has mistreated the f3 prefix
(REP/REPZ).

</div>
</content>


<entry>
<title type="html">LOCK: Prefix or instruction byte?</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-05.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/05/index.html</id>
<published>2009-05-27T20:29:37+0100</published>
<updated>2009-05-27T20:29:37+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I think I found a problem with Sun's instruction disassembler.
The disassembler is needed to work out how big an instruction is, and
is used by FBT, to work out entry and return instructions.
<p>
For opcode 0xF0 (LOCK prefix), the disassembler says we have an instruction
length of 1 byte, i.e. it treats this as standalone, and not as a prefix.
<p>
If we plant an FBT probe on this instruction (and there are a few
in the kernel), then when we single step - we will step the LOCK all
on its own and not have the following instruction, leading, more than
likely to a kernel crash or bad semantics.
<p>
I am amending the disassembler to detect this, and treat LOCK
properly as a prefix.
<p>
I found this whilst working thru all x86 instruction bytes, so we
can enable the entire lot, and not special case "known scenarios".
<p>
The other prefixes (such as REP/REPNZ, etc.) should likely be
treated similarly, but I haven't found an example in the first
instruction of a function where these prefixes are used.
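<p>
To illustrate what "treat as a prefix" means for the length calculation, here is a small sketch that peels prefix bytes off the front of an instruction before anything is sized or stepped. It is not Sun's disassembler; the names are invented and only the three prefixes discussed here are listed.

```python
# Only the prefixes discussed here; x86 has more (segment and size
# overrides, and REX on 64-bit).
PREFIXES = {0xF0: "lock", 0xF2: "repnz", 0xF3: "repz"}

def split_prefixes(code):
    """Return (prefix_names, remaining_bytes) for the start of an instruction."""
    n = 0
    while n != len(code) and code[n] in PREFIXES:
        n += 1
    return [PREFIXES[b] for b in code[:n]], code[n:]
```

For the bytes f0 ff 00 this gives one "lock" prefix followed by ff 00, so the instruction to copy and step is at least three bytes long, not one.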


</div>
</content>


<entry>
<title type="html">mcount and gcc -pg</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-05.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/05/index.html</id>
<published>2009-05-23T23:25:03+0100</published>
<updated>2009-05-23T23:25:03+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Found it. Some kernels are compiled with profiling turned on (-pg),
which means even our driver has this enabled. I can't find an easy
way to turn it off without interposing my own gcc wrapper, so the
easiest thing is to define our own 'mcount' subroutine which does nothing.
<p>
This means we won't call into the kernel, and we can now safely do:
<pre>
$ dtrace -n fbt::mcount:
</pre>
and see all the calls to it.
<p>
So, we are safe again - we can probe all functions.
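<p>
For illustration, a do-nothing mcount could look something like the sketch below. This is an assumption about its shape, not the driver's actual source. It relies on GCC's no_instrument_function attribute so that -pg does not insert a call to mcount inside mcount itself.

```c
/* Sketch only: a private mcount so that -pg instrumented driver code
 * lands here rather than in the kernel's own mcount.
 * Assumption: GCC, where no_instrument_function also suppresses the
 * -pg profiling call for this one function. */
__attribute__((no_instrument_function))
void mcount(void)
{
    /* Deliberately empty: touches no registers beyond the normal
     * frame setup, and returns straight to the caller. */
}
```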
<p>
I am seeing an oddity in Ubuntu 8.10/64, whereby if I probe
too many functions, I get a kernel trace in /var/log/messages like the
one below. It looks as if, as we enable all probes, we fire before we
are really ready, and subsequently don't fire any probes at all.
Reloading the driver fixes this, but I don't yet understand
how or why this happens well enough to diagnose it:
<pre>
[74745.164017] Call Trace:
[74745.164017]  [<ffffffff8024e9b4>] warn_on_slowpath+0x64/0x90
[74745.164017]  [<ffffffffa02fab75>] ? dtrace_int3_handler+0x1f5/0x2f0 [dtracedrv]
[74745.164017]  [<ffffffffa030280e>] ? dtrace_int3+0x47/0x53 [dtracedrv]
[74745.164017]  [<ffffffff8024e950>] ? warn_on_slowpath+0x0/0x90
[74745.164017]  [<ffffffff802785fc>] smp_call_function_mask+0x22c/0x240
[74745.164017]  [<ffffffff80225bd0>] ? do_flush_tlb_all+0x0/0x70
[74745.164017]  [<ffffffff80225bd0>] ? do_flush_tlb_all+0x0/0x70
[74745.164017]  [<ffffffffa030280e>] ? dtrace_int3+0x47/0x53 [dtracedrv]
[74745.164017]  [<ffffffff80225bd0>] ? do_flush_tlb_all+0x0/0x70
[74745.164017]  [<ffffffff80225bd0>] ? do_flush_tlb_all+0x0/0x70
[74745.164017]  [<ffffffff80225bd0>] ? do_flush_tlb_all+0x0/0x70
[74745.164017]  [<ffffffff80225bd0>] ? do_flush_tlb_all+0x0/0x70
[74745.164017]  [<ffffffff80278630>] smp_call_function+0x20/0x30
[74745.164017]  [<ffffffff80254a34>] on_each_cpu+0x24/0x50
[74745.164017]  [<ffffffff80225aec>] flush_tlb_all+0x1c/0x20
[74745.164017]  [<ffffffff802cc37d>] unmap_kernel_range+0x2cd/0x2e0
[74745.164017]  [<ffffffff802cc414>] remove_vm_area+0x84/0xa0
[74745.164017]  [<ffffffff802cc390>] ? remove_vm_area+0x0/0xa0
[74745.164017]  [<ffffffff802cc535>] __vunmap+0x55/0x120
[74745.164017]  [<ffffffff802cc4e0>] ? __vunmap+0x0/0x120
[74745.164017]  [<ffffffff802cc6ea>] vfree+0x2a/0x30
[74745.164017]  [<ffffffffa02f93e0>] kmem_free+0x50/0x70 [dtracedrv]
[74745.164017]  [<ffffffffa02e09fd>] dtrace_ecb_create_enable+0x16d/0x20b0 [dtracedrv]
[74745.164017]  [<ffffffffa02d8fd9>] ? dtrace_match_nul+0x9/0x10 [dtracedrv]
[74745.164017]  [<ffffffffa02d8c52>] ? dtrace_match_probe+0xa2/0x100 [dtracedrv]
[74745.164017]  [<ffffffffa02dc5a5>] dtrace_match+0x1f5/0x2e0 [dtracedrv]
[74745.164017]  [<ffffffffa02e0890>] ? dtrace_ecb_create_enable+0x0/0x20b0 [dtracedrv]
[74745.164017]  [<ffffffff8050298a>] ? error_exit+0x0/0x70
[74745.164017]  [<ffffffff802e2191>] ? kfree+0x21/0x100
[74745.164017]  [<ffffffffa02e4297>] dtrace_probe_enable+0xb7/0x190 [dtracedrv]
[74745.164017]  [<ffffffffa02d8f80>] ? dtrace_match_string+0x0/0x50 [dtracedrv]
[74745.164017]  [<ffffffffa02d8fd0>] ? dtrace_match_nul+0x0/0x10 [dtracedrv]
[74745.164017]  [<ffffffffa02d8fd0>] ? dtrace_match_nul+0x0/0x10 [dtracedrv]
[74745.164017]  [<ffffffffa02d8fd0>] ? dtrace_match_nul+0x0/0x10 [dtracedrv]
[74745.164017]  [<ffffffffa02e440e>] dtrace_enabling_match+0x9e/0x200 [dtracedrv]
[74745.164017]  [<ffffffffa02f6efe>] dtrace_ioctl+0x214e/0x23f0 [dtracedrv]
[74745.164017]  [<ffffffff802bbc49>] ? __mod_zone_page_state+0x9/0x70
[74745.164017]  [<ffffffff802b0c6c>] ? __rmqueue_smallest+0x11c/0x1b0
[74745.164017]  [<ffffffffa017a581>] ? ext3_get_branch+0x21/0x140 [ext3]
[74745.164017]  [<ffffffff802b7710>] ? put_page+0x20/0x110
[74745.164017]  [<ffffffff802b1e23>] ? prep_new_page+0x103/0x180
[74745.164017]  [<ffffffff802b2052>] ? buffered_rmqueue+0x1b2/0x2a0
[74745.164017]  [<ffffffff802b2756>] ? get_page_from_freelist+0x2a6/0x380
[74745.164017]  [<ffffffff802ac173>] ? find_get_page+0x23/0xb0
[74745.164017]  [<ffffffff802ac567>] ? find_lock_page+0x37/0x80
[74745.164017]  [<ffffffff802b786e>] ? mark_page_accessed+0xe/0x70
[74745.164017]  [<ffffffff802adba3>] ? filemap_fault+0x1a3/0x430
[74745.164017]  [<ffffffff80266fdd>] ? __wake_up_bit+0xd/0x40
[74745.164017]  [<ffffffff802ab1da>] ? page_waitqueue+0xa/0x90
[74745.164017]  [<ffffffff802ac402>] ? unlock_page+0x32/0x40
[74745.164017]  [<ffffffff802c2244>] ? __do_fault+0x134/0x440
[74745.164017]  [<ffffffff802bbe0a>] ? __inc_zone_page_state+0x2a/0x30
[74745.164017]  [<ffffffff802c310e>] ? handle_mm_fault+0x1ee/0x470
[74745.164017]  [<ffffffff803a7def>] ? __up_read+0x8f/0xb0
[74745.164017]  [<ffffffff8026b51e>] ? up_read+0xe/0x10
[74745.164017]  [<ffffffff80505302>] ? do_page_fault+0x372/0x750
[74745.164017]  [<ffffffffa02f873d>] dtracedrv_ioctl+0x2d/0x50 [dtracedrv]
[74745.164017]  [<ffffffff802f85d5>] vfs_ioctl+0x85/0xb0
[74745.164017]  [<ffffffff802f8883>] do_vfs_ioctl+0x283/0x2f0
[74745.164017]  [<ffffffff802f8991>] sys_ioctl+0xa1/0xb0
[74745.164017]  [<ffffffff8021285a>] system_call_fastpath+0x16/0x1b
[74745.164017]
</pre>



</div>
</content>


<entry>
<title type="html">working again</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-05.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/05/index.html</id>
<published>2009-05-23T22:28:52+0100</published>
<updated>2009-05-23T22:28:52+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I put out a new release earlier which seems to work across
the various platforms and kernels.
<p>
I had disabled many opcodes, and have been adding them back in.
RIP relative addressing (x86-64) is used in the kernel, and I have
been getting that to work (again!).
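<p>
As a rough sketch of what RIP-relative handling involves: in 64-bit mode a ModRM byte with mod == 00 and r/m == 101 means disp32(%rip). If the instruction is single stepped from a copy at a different address (an assumption about the stepping strategy, not a description of the driver), the 32-bit displacement has to be adjusted so it still reaches the same target. The function names are hypothetical.

```c
#include <stdint.h>

/* In 64-bit mode, mod == 00 and r/m == 101 selects disp32(%rip).
 * Masking with 0xC7 keeps the mod and r/m fields, dropping reg. */
int modrm_is_rip_relative(uint8_t modrm)
{
    return (modrm & 0xC7) == 0x05;
}

/* Recompute disp32 for an instruction moved from orig_addr to
 * copy_addr. The target is (address of next instruction) + disp, and
 * the instruction length does not change, so:
 *     new_disp = disp + (orig_addr - copy_addr)
 * Returns 0 on success, -1 if the copy is too far away for the new
 * displacement to fit in 32 bits. */
int rip_fixup_disp32(uint64_t orig_addr, uint64_t copy_addr,
                     int32_t disp, int32_t *out)
{
    int64_t delta = (int64_t)(orig_addr - copy_addr);
    int64_t fixed;

    /* Reject wildly distant copies before adding, to avoid overflow. */
    if (delta > (int64_t)INT32_MAX * 2 || delta < (int64_t)INT32_MIN * 2)
        return -1;
    fixed = (int64_t)disp + delta;
    if (fixed < INT32_MIN || fixed > INT32_MAX)
        return -1;
    *out = (int32_t)fixed;
    return 0;
}
```

The range check matters in practice: a scratch buffer more than 2GB away from the original code cannot be reached with a 32-bit displacement at all.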
<p>
I've hit a hopefully minor issue with 'mcount' - which is in
the 2.6.27+ kernels (ftrace facility). This starts with a RIP
relative instruction, e.g.
<pre>
mcount:
       cmpq $ftrace_stub, ftrace_graph_return
       jnz ftrace_graph_caller
       cmpq $ftrace_graph_entry_stub, ftrace_graph_entry
       jnz ftrace_graph_caller       
</pre>
but I don't believe it's single stepping over that initial CMPQ
which is causing the issue; more likely it is whoever is calling it,
e.g. the interrupt handlers themselves.
Hopefully I will get to fix this shortly, as that would open up
the possibility of enabling any instruction at the start of a function
in fbt_linux.c.
<p>
I've also fixed a couple of crisp bugs/issues today, and I have one
more before I put out an update for that. 

</div>
</content>


<entry>
<title type="html">some progress</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-05.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/05/index.html</id>
<published>2009-05-23T10:02:48+0100</published>
<updated>2009-05-23T10:02:48+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<p>
I *think* I have just put out a stable release -- works on 32b+64b
kernels. Some silly re-entrancy issues were not being handled properly.
So, I need to test it more, but full-fbt tracing seems to be working.
<p>
What does this mean? Well, if this stands the test of running
on my various kernels and my real non-VMware hardware, I need
to start moving along. 
<p>
Next up is maybe to look at the userland tracing or look at kernel
stack trace operations, since that's a mess, due to the fact that
kernels may be compiled with or without frame pointers, and if you don't
have frame pointers, then a stack trace can only ever be a guess of where
you are. (The kernel uses '?' to indicate a stack trace entry is
not necessarily valid - it walks the stack, word by word, to see
if anything looks like a kernel text address).
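<p>
The word-by-word walk can be sketched roughly as below. This is a simplified user-space illustration, not the kernel's code: the text range is passed in as text_start/text_end (hypothetical parameters), and every word that falls inside it is reported as a possible return address.

```c
#include <stddef.h>
#include <stdint.h>

/* Scan nwords stack words and collect those that fall inside
 * [text_start, text_end). Each hit is only a guess: a stale or
 * unrelated value on the stack can look like a text address too,
 * which is why such entries get marked with '?'.
 * Returns the number of candidates stored in out (at most max_out). */
size_t scan_stack_for_text(const uintptr_t *stack, size_t nwords,
                           uintptr_t text_start, uintptr_t text_end,
                           uintptr_t *out, size_t max_out)
{
    size_t found = 0;
    size_t i;

    for (i = 0; i < nwords && found < max_out; i++) {
        if (stack[i] >= text_start && stack[i] < text_end)
            out[found++] = stack[i];
    }
    return found;
}
```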
<p>
I really need to get some CRiSP fixes done this weekend, along
with a new driver to provide low overhead TCP port to PID enumeration.
I may go on to describe that in more detail later.

</div>
</content>


<entry>
<title type="html">some progress</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-05.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/05/index.html</id>
<published>2009-05-20T22:25:45+0100</published>
<updated>2009-05-20T22:25:45+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I think my recent instability is being caused by a sillyism
in code which I have not released yet (to do with a nested
interrupt trap).
<p>
I rewrote the trap handlers to clean them up and put in a more
powerful state machine, and it's dtracing beautifully on the 64b
kernel (I need to revalidate 32b and more kernels).
<p>
I am moving the debug output in /dev/dtrace to /proc/dtrace:
<pre>
/home/fox/src/dtrace@vmubuntu: ls /proc/dtrace
total 0
0 ./  0 ../  0 debug  0 security  0 stats  0 trace
</pre>
The key one is /proc/dtrace/trace - I am trying to move away
from printk for kernel debugging, towards an internal
printf-to-a-buffer mechanism (like FreeBSD), because debugging
the trap handlers is painful if, by virtue of invoking printk, we
invoke a recursive fault.
<p>
So, /proc/dtrace/trace is like /proc/kmsg: a private internal
memory buffer to log trace info.
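<p>
A printf-to-a-buffer mechanism of this kind might be sketched as follows. This is a user-space illustration under assumed names (trace_printf, trace_read, TRACE_BUF_SIZE), not the driver's code: messages are formatted into a fixed circular buffer, so logging never calls out to anything that could fault, and the oldest data is overwritten when the buffer wraps.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#define TRACE_BUF_SIZE 4096     /* hypothetical buffer size */

static char trace_buf[TRACE_BUF_SIZE];
static size_t trace_head;       /* total bytes ever written */

/* Format a message and append it to the circular buffer. */
void trace_printf(const char *fmt, ...)
{
    char line[256];
    va_list ap;
    int n, i;

    va_start(ap, fmt);
    n = vsnprintf(line, sizeof line, fmt, ap);
    va_end(ap);
    if (n < 0)
        return;
    if ((size_t)n >= sizeof line)
        n = (int)sizeof line - 1;       /* message was truncated */
    for (i = 0; i < n; i++)
        trace_buf[trace_head++ % TRACE_BUF_SIZE] = line[i];
}

/* Copy the buffered text, oldest first, into dst as a NUL-terminated
 * string. If dst is too small, the newest bytes are kept.
 * Returns the number of bytes copied (excluding the NUL). */
size_t trace_read(char *dst, size_t dst_size)
{
    size_t len = trace_head < TRACE_BUF_SIZE ? trace_head : TRACE_BUF_SIZE;
    size_t start = trace_head - len;
    size_t i;

    if (dst_size == 0)
        return 0;
    if (len > dst_size - 1) {
        start += len - (dst_size - 1);
        len = dst_size - 1;
    }
    for (i = 0; i < len; i++)
        dst[i] = trace_buf[(start + i) % TRACE_BUF_SIZE];
    dst[len] = '\0';
    return len;
}
```

A read of /proc/dtrace/trace would then correspond to something like trace_read() copying the buffer out to the reader.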
<p>
More in a while when I feel happier.

</div>
</content>


<entry>
<title type="html">reliability issues</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-05.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/05/index.html</id>
<published>2009-05-19T22:20:48+0100</published>
<updated>2009-05-19T22:20:48+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
It's strange. For most of the last year the 64-bit kernel has 'just worked',
even where it shouldn't have, and the nice interrupt handling of the
kernel shielded me from issues.
<p>
Fixing the 32-bit kernel issues, and redoing the INT3 handling
via raw interrupt patching, has caused the 64-bit kernel to be unreliable.
(Unreliable means within a few seconds of fbt::: probing, we crash the
kernel).
<p>
I *think* at this point it's due to a page fault firing whilst handling
the breakpoint and single step trap.
<p>
I am therefore revamping this core code to allow a nested page
fault, and tidying up the code which had started to become a bit
untidy.
<p>
Hopefully have an answer in the next day or so...

</div>
</content>


<entry>
<title type="html">64-bit issues</title>
<author>
<name>Paul Fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2009-05.html"/>

<id>http://www.crisp.demon.co.uk/blog/archives/2009/05/index.html</id>
<published>2009-05-17T21:09:46+0100</published>
<updated>2009-05-17T21:09:46+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
At present, dtrace cannot trace around the irq_return (IRETQ instruction)
in the kernel. I am attempting to fix this, so for now, fbt::: will
hang or panic the kernel.
<p>
The IRETQ (return from interrupt) can be returning to user mode
or kernel mode, but the interrupt handler in my code doesn't/didn't handle
this.
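<p>
One way to tell the two cases apart, sketched here as an assumption rather than as the eventual fix, is to look at the CS value saved in the interrupt frame: the low two bits are the privilege level, 3 for user mode and 0 for kernel mode. The struct and function names are hypothetical; the layout is the frame the CPU pushes in 64-bit mode for a vector without an error code.

```c
#include <stdint.h>

/* Frame pushed by the CPU on a 64-bit interrupt, lowest address
 * first (no error code). IRETQ pops these in the same order. */
struct iret_frame64 {
    uint64_t rip;
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
};

/* The low two bits of the saved CS selector hold the privilege
 * level being returned to: 3 means user mode, 0 means kernel. */
int iret_returns_to_user(const struct iret_frame64 *f)
{
    return (f->cs & 3) == 3;
}
```

On Linux x86-64 the kernel code selector is 0x10 and the 64-bit user code selector is 0x33, so the check gives 0 and 1 respectively.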
<p>
More news in a while when I have a fix. (And I need to reverify 32-bit
as well).

</div>
</content>


</feed>
