<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
<?xml-stylesheet type="text/css" href="http://www.crisp.demon.co.uk/blog/feed.css"?>


<title type="html">CRiSP Weblog</title>
<subtitle type="html">technical projects, CRiSP, dtrace and other stuff</subtitle>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog"/>
<link rel="self" type="application/atom+xml" href="http://www.crisp.demon.co.uk/blog/atom.xml"/>
<updated>2013-03-27T22:07:48+0000</updated>
<author>
<name>Paul Fox</name>
<uri>http://www.crisp.demon.co.uk/blog</uri>
</author>
<id>http://www.crisp.demon.co.uk/blog</id>
<generator uri="http://www.crisp.demon.co.uk/blog" version="1.0">
/home/fox/bin/blog.pl
</generator>

<entry>
<title type="html">DTrace/ARM</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2013-03.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2013/03/index.html</id>
<published>2013-03-27T22:07:34+0000</published>
<updated>2013-03-19T21:34:37+0000</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Progress on the ARM port of DTrace continues. At the moment,
the driver loads, and reloads, into the kernel. (I was crashing on a
reload, which is annoying - having to keep rebooting as I fine-tune
the code). It's still a skeletal ARM port, but I can now do:
<p>
<pre>
$ dtrace -n tick-1s
</pre>
<p>
and the expected thing happens. That's a huge relief - the core of DTrace
just works.
<p>
Next were some small fixes for /proc/dtrace/fbt - to see the state
of the available FBT probes (and specifically the instructions
and patch values). A key difference of ARM vs x86 is that ARM is
a 32-bit CPU where (in ARM mode) all instructions are 32 bits wide, and
not variable like on x86 (where an instruction can be one byte or more).
Because instructions are 4 bytes wide, some changes to the key data
structure (intr_t) are needed to handle the 4-byte wide value and apply
that to, e.g. the FBT driver.
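<p>
As a rough illustration of why the fixed width matters, here is a minimal
Python model (not the driver code) of patching one instruction slot. It
assumes a little-endian ARM-mode text segment held in a bytearray; the names
patch_arm and ARM_BKPT are made up for this sketch, and BKPT #0 (0xE1200070)
is just a convenient stand-in, not necessarily the patch value the FBT driver
uses.
<p>
<pre>
```python
ARM_INSN_SIZE = 4        # ARM-mode instructions are always 4 bytes
ARM_BKPT = 0xE1200070    # BKPT #0, used here only as an example patch value


def patch_arm(text, offset, patch=ARM_BKPT):
    """Overwrite the 4-byte instruction at offset and return the original.

    text is a bytearray standing in for kernel text (an assumption of
    this sketch); the saved value is what an unpatch would write back.
    """
    if offset % ARM_INSN_SIZE != 0:
        raise ValueError("ARM-mode instructions are 4-byte aligned")
    end = offset + ARM_INSN_SIZE
    saved = int.from_bytes(text[offset:end], "little")
    text[offset:end] = patch.to_bytes(ARM_INSN_SIZE, "little")
    return saved


# Two instructions: mov r0, r0 (0xE1A00000) and bx lr (0xE12FFF1E).
text = bytearray.fromhex("0000a0e1" "1eff2fe1")
saved = patch_arm(text, 0)
print(hex(saved), text.hex())
```
</pre>
<p>
On x86 a one-byte patch such as int3 (0xCC) is enough, which is why the
saved and patch values need a wider type on ARM.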
<p>
ARM presents many challenges: the first is that there are so
many variants. I am targeting the ARMv6 architecture available on
the RaspberryPi, using a hand-built kernel running in a qemu VM.
<p>
At present I don't have a /proc/kcore, which is a nuisance - it's useful
for examining the inside of the kernel, e.g. disassembling or checking
that a probe is what I expect it to be. (I can work around that easily).
<p>
Additionally, this is a single-CPU kernel - no SMP. (I believe
I found a bug in the Solaris DTrace concerning the maximum number of CPUs
the kernel supports; Solaris supports 32 i386 CPUs or 256 amd64
CPUs - if the maximum number of CPUs is configured, there's some off-by-one
maths going on in the buffer snap code, which causes dtrace to abort early
in its work. On ARM I hit this because NCPU==1, and the code tries to look
at cpu#0 and cpu#1).
<p>
Now, I am ready to start handling FBT. I can get DTrace to plant probes
on entry, but the return probes are broken, because it's using the x86
instruction disassembler and not an ARM-specific one (easy to fix, when
I am ready).
<p>
But before I allow an FBT probe, I need to intercept the ARM single
step and breakpoint interrupts. I found this page a useful hint/starter
to get me to the right place to locate what I am after...
<p>
<a href="http://pankaj-techstuff.blogspot.co.uk/2007/11/story-interrupt-handling-in-linux-2611.html">http://pankaj-techstuff.blogspot.co.uk/2007/11/story-interrupt-handling-in-linux-2611.html</a>
<p>
First step is to put a "no-op" interrupt handler in place to prove
I am doing the right thing. I am not an ARM assembler expert, and am
using gcc -S and ARM instruction references on the net, to get me
closer to one.
<p>
Once this is done, then the syscall mess (in my code) can be looked at -
the systrace.c code is full of hacks and quirks for i386/amd64 and
little or none of this is relevant for ARM.
<p>
After that, we would be mostly done. (USDT will require some work,
but USDT is more leisurely than core driver work).
<p>
The dtrace available at my site or on github doesn't have all the ARM
updates, so hold off for a while before expecting this to work. I will
update the blog when I feel it's more ready for primetime.
<p>

</div>
</content>
</entry>


<entry>
<title type="html">Divide by 10</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2013-03.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2013/03/index.html</id>
<published>2013-03-03T19:59:08+0000</published>
<updated>2013-03-03T12:18:58+0000</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I've gotten my RaspberryPi VM to run from a kernel I have built,
which means I can now load dtrace into the ARM kernel and continue
debugging.
<p>
One silly issue I have hit is modulo/divide arithmetic for 64-bit
numbers. ARM, being a classic RISC chip, doesn't have a divide instruction
but the kernel nicely hides this for you. In most cases, divide-by-a-constant
is handled by the compiler via various optimisations.
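<p>
One such optimisation, sketched here in Python rather than the C the
compiler actually works on, replaces a divide by the constant 10 with a
multiply and a shift. The magic number 0xCCCCCCCD and the shift of 35 are the
well-known values for unsigned 32-bit operands; the function name div10 is
mine.
<p>
<pre>
```python
MAGIC = 0xCCCCCCCD   # ceil(2**35 / 10)
SHIFT = 35


def div10(n):
    """Return n // 10 for an unsigned 32-bit n without using a divide."""
    if n not in range(2 ** 32):
        raise ValueError("n must fit in an unsigned 32 bits")
    return (n * MAGIC) >> SHIFT
```
</pre>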
<p>
At the moment, there are a few pieces of code where divide/modulo are
not by a constant the compiler can see, which results in the compiler
generating calls to maths helper functions. On the i386/x64 architectures,
this is handled by dtrace mapping to the appropriate mechanisms to
call the do_div() function.
<p>
On the ARM, it's different - and confusing enough (given the differing
ARM architectures) to not be working yet; so when dtrace is tracing
to its own internal log buffer (which I need to debug the GPFs I am triggering)
it's a nuisance, because decimal numbers come out wrong.
<p>
It's interesting poking around the kernel code at the do_div64() macros
and the printf type code, as the kernel makes good attempts to leverage
native CPU features, or emulate, in the case of the ARM/RISC instructions,
to get a reasonable performance.
<p>
There are lots of ways to do divide, without actually doing a divide
(many algorithms in the kernel). What is a nuisance is debugging the
mechanism I am using to call these, as it's mostly crashing the kernel,
and difficult to do in user space (in user space, it can work, because
the compiler "just works", which doesn't help prove what I am doing
is correct in the kernel).
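<p>
One of the simplest of these is shift-and-subtract (restoring) long
division, which needs only shifts, compares and subtracts. The Python below
is a sketch of the idea for unsigned 64-bit values, not the kernel's code;
udivmod64 is a made-up name.
<p>
<pre>
```python
def udivmod64(n, d):
    """Return (n // d, n % d) for unsigned 64-bit n using shift and subtract."""
    if d == 0:
        raise ZeroDivisionError("division by zero")
    if n not in range(2 ** 64):
        raise ValueError("n must fit in an unsigned 64 bits")
    q = 0
    r = 0
    for bit in range(63, -1, -1):
        # Bring down the next bit of the dividend, most significant first.
        r = r * 2 + (n >> bit) % 2
        q = q * 2
        if r >= d:
            r -= d
            q += 1
    return q, r
```
</pre>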
<p>
Still a way to go, but at least I can load the driver.

</div>
</content>
</entry>


<entry>
<title type="html">Things I hate....</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2013-02.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2013/02/index.html</id>
<published>2013-03-01T21:49:17+0000</published>
<updated>2013-02-23T16:37:24+0000</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Waited for the new JellyBean Galaxy Note I download. Finally available.
<p>
Let's download it. 538MB later....
<p>
<pre>
FAILED: I am not going to tell you why, but sorry. I am really stupid.
</pre>
<p>
Hm. OK, maybe it's me that is stupid. Let's try again. 538MB later...
<p>
<pre>
FAILED: I am not going to tell you why, but sorry. I am really stupid.
</pre>
<p>
This time, *you* are really stupid. Why did you download a whopper of an
upgrade and discard it, only to download again? Why not do a validation
*before* downloading a 538MB file?
<p>
Why not actually tell me what is wrong? Is that too much to ask?
<p>
So, off to the ROM/rooting sites, and see if I can get it another way.
<p>
(My Note is rooted, but why can't the Samsung firmware updater tell
you what it didn't like?)
<p>
Oh well.

</div>
</content>
</entry>


<entry>
<title type="html">Why is the C preprocessor so bad? I mean...really bad</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2013-02.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2013/02/index.html</id>
<published>2013-03-01T21:49:17+0000</published>
<updated>2013-02-18T22:27:19+0000</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
The C programming language is 40-odd years old. In the last
two decades, the number of new programming languages has raced
ahead - each with its own look and feel.
<p>
C is a low level language - as CPU speed increases have stagnated,
people have looked at C for more gains in performance. C++ to a large
extent improves on C, and negates much of the need for the preprocessor.
<p>
But one has to laugh at how much attention is given to compilers,
optimisations, libraries and standards. The preprocessor is treated like
a diseased lovechild.
<p>
In all these decades, the preprocessor has barely gotten any feature
enhancements - except for varargs and the # and ## operators.
<p>
What is it missing? Well, for one, *intelligence*. Anything. Please.
<p>
<pre>
#include "filename"
</pre>
<p>
Such a great function, but so badly designed. What is the one
thing every serious application does? Uses some form of autoconf,
so you can do:
<p>
<pre>
#ifdef SOMETHING
#  include "filename"
#endif
</pre>
<p>
Why can't that be in the language?
<p>
<pre>
#include_only_if_exists "filename"
</pre>
<p>
There. That wasn't hard, was it?
<p>
OK, so let's try one more. Assume a header file defines a value, let's
call it REG_R0. Another header file defines an enum for REG_R0. Consider
the following:
<p>
<pre>
#define REG_R0 0
enum { REG_R0 = 0 };
</pre>
<p>
Of course, the above is a syntax error. We can do:
<p>
<pre>
#define REG_R0 0
#if !defined(REG_R0)
enum { REG_R0 = 0 };
#endif
</pre>
<p>
If that enum appears in a header file, we have no way to put the #if statement
in there. We might do something like:
<p>
<pre>
#define REG_R0 0
#include_but_hide REG_R0, REG_R1, ... "filename"
</pre>
<p>
This might mean: pretend the specified values are not defined for
the duration of including filename, so we can hide them and avoid the
syntax error that would otherwise result.
<p>
Why do I care? Well, DTrace is fighting issues with ARM register names
with the "ucontext.h" file which seemingly wants to define register names
via an enum. I haven't worked out how to resolve this, portably
and at the point of issue (rather than modify every source file to
change the way #includes are done).
<p>
Really, every language implements string and file parsing as a
day-1 feature, and then adds bells and whistles for the rest of its
life to handle this.
<p>
C, being a "portable assembler" chooses to ignore the compilation
environment, and expects every developer to build inconsistent tools
because of the shortcomings of the preprocessor and environment
detection.
<p>
Why can't we do something like in Perl?
<p>
<pre>
#if $ENV{COMPILER_FLAGS} =~ /-O3/
...
#endif
</pre>
<p>
Oh well.

</div>
</content>
</entry>


<entry>
<title type="html">DTrace/ARM</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2013-02.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2013/02/index.html</id>
<published>2013-03-01T21:49:17+0000</published>
<updated>2013-02-16T23:18:19+0000</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I've put out a new release of DTrace - this one compiles on my
ARM virtual machine (armel architecture - seems to be ARMv5).
<p>
This is a major achievement, although the code is ugly and broken.
Many areas where we have __i386 or __amd64 are changed to handle
__arm__ specific code or are simply left blank.
<p>
The interrupts are not plumbed in; SMP isn't handled (my VM is a single-CPU
VM).
<p>
There are also issues with the pieces of code which are heavy in x86
assembler (e.g. syscalls.c).
<p>
Don't expect to use this for anything useful, other than a starting point.
Having got a complete driver, next is to debug these blank holding areas
piece by piece.
<p>

</div>
</content>
</entry>


<entry>
<title type="html">DTrace/ARM</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2013-02.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2013/02/index.html</id>
<published>2013-02-16T13:21:50+0000</published>
<updated>2013-02-10T17:54:40+0000</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
One of my little projects was to get DTrace ported to ARM. People
using Android, RaspberryPi or other ARM related tools could get
the benefit of DTrace.
<p>
Now this is not a simple project - but I have started some of the
work to prove this.
<p>
I had originally intended to use the RaspberryPi to do a lot of the
heavy lifting, but was not happy with the RPI as a reliable
hardware device and had issues with some of the kernels.
<p>
Last week, I took a look at qemu/arm - and can actually run an
ARM based virtual machine on my x86 Linux machine. Performance isn't
brilliant, but it's palatable.
<p>
After spending ages getting the network to work - so I could get
the requisite packages, I have been updating the scripts and
headers/source code to fill in the gaps for ARM. At this point, we
are about half way through the user land compilation. Some of the
things are annoyances due to the Linux distro (debian 2.6.32 kernel).
E.g. the handling of ucontext.h seems to be different.
<p>
Once I have user land compiling, I can move onto the kernel - obviously
there is a fair amount of i386/x64 code to rewrite for ARM, but I am hoping
that it's viable - especially given that the i386/x64 code is on top
of the original SPARC architecture - i.e. we know roughly the bits
needing attention.
<p>
If I am lucky, at the end of this exercise, it will work on my
ARM based VM; it almost certainly won't work on other ARM CPUs or
other kernels (i.e. Android), but at least the heavy lifting will have been
done, and it makes it more palatable to target the other platforms
(or people may do this for themselves).
<p>
I'll put out a new release, which includes some Linux 3.7/3.8 fixes
(thanks to those that contributed the fixes or highlighted some of the
broken build things in my code).

</div>
</content>
</entry>


<entry>
<title type="html">Writing Software. Forget about it.</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2013-01.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2013/01/index.html</id>
<published>2013-02-16T13:21:50+0000</published>
<updated>2013-01-25T21:53:27+0000</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I haven't updated the blog in a while, and I do get a trickle of
dtrace issues/questions, along with CRiSP support ones.
<p>
Firstly, a reminder to everyone, if I can: avoid the crisp.demon.co.uk
email addresses and use CrispEditor@gmail.com. I have yet to get
my systems set up properly to send via demon, after they moved email
to IMAP.
<p>
Although things are quiet, it's mainly because I have been slowly
creating a ribbon bar for CRiSP. Piecing together how a ribbon bar
works opened my eyes to a number of things. I tend to spend a couple
of months on one or other of CRiSP vs DTrace.
<p>
Sometimes, you get so close to the software you are writing that things
slow to a crawl - you lose "instinct" - and at other times, you
need to go away for a bit (could be an afternoon, a day, week, or month),
and forget the lines of code. You come back to the code as an "outsider"
and immediately see the deficiencies which you could not see before.
<p>
With DTrace, the biggest problem is addressing and ensuring all kernels
build - and this is done piecemeal, not by some divine CI (continuous
integration) system. There are hundreds if not thousands of Linux
kernels out there (2.6.1, 2.6.2, 2.6.3, ...) and the number of permutations
of compile options means it's infeasible to test all variants.
(Some people have built projects to try all permutations of compiler
flags - just for one release, and even that is a huge computational task).
<p>
Anyway, back to the ribbon. I'm fairly proud of CRiSP's GUI controls -
written from scratch, a long time back, and the use of "constraints" to
allow widgets to stick together. This mentality dates back to the early
origins of X windows and allows apps to resize whilst looking reasonable.
Most Windows apps, on the other hand, use pixel coordinates - this is
great for GUI Designer tools, but is why most dialogs are not resizable.
Even Windows' Open File dialog is "slightly" resizable - all the spare
real-estate is given to the filenames (as it should be) but it has
caused source code portability issues in each Visual Studio release.
It wasn't a great source code design.
<p>
As a side note, most mobile apps work in a similar vein - as Apple
and Google released new devices which had various screen sizes and
pixel densities, there has been a rush of app upgrades to cater for
these screens.
<p>
Another side note: Apple uses floating point pixel co-ordinates and not
integer ones. When Apple (and Postscript before them) did this, at
the start, they were simply insane - huge amounts of CPU work just to
draw things - back in the days when floating point units were very
slow (or nonexistent). However, fractional coordinates are brilliant
with today's ever-increasing screen sizes. (I do find it strange that
even Apple has had problems on the iPhone, allowing non-multiples of
the base 320x480 screen size).
<p>
Anyway, back to the ribbon bar. The ribbon is technically very interesting.
If you look at a GUI control like a "listbox" or "tree control" - things
we are all very familiar with - the layout options are easy to
comprehend, and most programmers can figure out how to build
one. (Building one to scale to millions of items is a challenge).
<p>
The ribbon is complex. Very complex and clever. I started by
creating a control with "panels". Bunches of bitmaps and controls go into
the panels, and give the MS Word-like appearance, quite easily. But,
if you play with MS Word or Outlook and watch what happens as you shrink
or expand the window, it's not obvious what is going on. If you do it enough,
you can see what's happening, and it feels "natural".
<p>
Now, consider how to implement this - with each icon and column
of the ribbon bar, as you grow/shrink, you effectively have to try
all permutations of "restacking" to get the desired effect. I started
doing this, but did a sanity check. What happens if you have 50 controls
inside the ribbon and want to compute the optimum layout as the window
shrinks, and they won't all fit? By my calculation, this involves
around 3^50 permutation checks. I was astounded. 3^50 is not acceptable.
Maybe 1,000,000 attempts are. What we have is akin to a travelling salesman
problem (where we try to compute the optimal route in order to visit
a series of destinations).
<p>
I had to read the Microsoft Ribbon control docs to finally "get it".
A ribbon is a collection of panels. Each panel is a collection of groups. Each group
is a collection of controls. Rather than deal with, say 50 individual
controls, you might be dealing with 10 groups. Now, each group can be
resized and laid out, almost independently (not quite). 3^10 is 59049,
which is a much more reasonable amount of compute power to use.
<p>
Where does the "3" come from? Each control on a ribbon bar has 3
visible modes (big bitmap, small bitmap + label, or small bitmap).
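<p>
To make the arithmetic concrete, here is a small brute-force sketch in
Python. The three modes follow the description above, but the pixel widths,
the "widest layout that fits" rule and the function name best_layout are
assumptions made for illustration; CRiSP's real layout code is not shown
here.
<p>
<pre>
```python
from itertools import product

# Hypothetical widths in pixels for the three visible modes.
MODES = {"big": 90, "small+label": 60, "small": 30}


def best_layout(n_groups, available):
    """Try every mode assignment and keep the widest one that fits.

    Returns ((width, modes), tried); the layout is None if nothing fits.
    """
    best = None
    tried = 0
    for combo in product(MODES, repeat=n_groups):
        tried += 1
        width = sum(MODES[m] for m in combo)
        if width > available:
            continue
        if best is None or width > best[0]:
            best = (width, combo)
    return best, tried


layout, tried = best_layout(10, 500)
print(tried, layout)
```
</pre>
<p>
With 10 groups the loop runs 3**10 times; the same loop over 50 independent
controls would need 3**50 (roughly 7.2e23) iterations, which is why the
grouping matters.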
<p>
Do people like or want a ribbon bar? I don't know. Those of us who use
Word or Outlook, get used to it. Despite losing the drop down menus,
and taking ages to figure out where everything sits, it looks "modern"
and is "enjoyable" to play with. (Of course the ribbon is now quite
a few years old, and Microsoft is going for the "tiled" effect).
<p>
So, that gives you a little perspective of "ribbon controls". 
<p>
Interestingly, when the ribbon first appeared in MS Office, it wasn't
customisable. Now, it is. They must have gone through a lot of effort
to implement it, and then reimplement it. If you look at the ribbon
in the many MS products, they are all subtly different, and differently
"quirky".
<p>

</div>
</content>
</entry>


<entry>
<title type="html">Where did you come from? Part 2</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2013-01.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2013/01/index.html</id>
<published>2013-02-16T13:21:50+0000</published>
<updated>2013-01-12T22:16:48+0000</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I spent nearly a week trying to figure out where the X11 message had
come from. I knew where, but not why it was different from the other
parts of the CRiSP code.
<p>
Of course, 30 minutes after posting that, I resolved the problem.
<p>
Definitely a case of being short-sighted and not remembering how
my own code worked.
<p>
When a tooltip pops up, the keyboard is grabbed, and we track the
mouse - looking to see if it moves outside of the original area, so we
can pop down the tooltip.
<p>
I assumed some magic was being used to drop the tooltip, but it was
more basic than that. I had forgotten to look for the mouse moving
outside the bounds of the image which gave rise to the tooltip.
(I had nearly spotted the problem a few days back, but got sidetracked
as I locked the X server up, and hadn't spotted the "missing code").
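<p>
The missing check amounts to something like the following Python sketch.
The Rect type and the function name are invented for illustration; CRiSP
itself is not written like this.
<p>
<pre>
```python
from collections import namedtuple

# Hypothetical rectangle for the image that raised the tooltip.
Rect = namedtuple("Rect", "x y width height")


def pointer_left_area(area, px, py):
    """Return True once the pointer (px, py) is outside area."""
    inside_x = px in range(area.x, area.x + area.width)
    inside_y = py in range(area.y, area.y + area.height)
    return not (inside_x and inside_y)
```
</pre>
<p>
On each pointer motion event while the tooltip is up, a True result would
be the cue to pop it down.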
<p>
The ribbon most likely won't make it into the next release of CRiSP,
as it's not ready. A ribbon implementation is interesting, because
of the behavior of the control, but also due to the various things that
happen and can be embedded into a ribbon. (Popping up tooltips, complex
tip-like helper windows, complete with embedded images, and a menu
selector).
<p>
CRiSP does most of this now. (I have also been cleaning up the
freetype font display for some of the controls, because getting the
ribbon to work was showing up the ugly default fonts).
<p>
Now..off to see if this code even builds on Windows, as I have a couple
of customer bugs to examine/fix.

</div>
</content>
</entry>


<entry>
<title type="html">crisp.demon.co.uk -> CrispEditor-at-gmail.com</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2013-01.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2013/01/index.html</id>
<published>2013-02-16T13:21:50+0000</published>
<updated>2013-01-12T20:55:00+0000</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I wanted to publicise this. It has always been possible to contact
me and get support for CRiSP via the crisp.demon.co.uk email address.
Demon - a UK based internet provider - were literally the first in the
UK, starting around 1991. Using a 9600 baud modem, one could finally
ftp to sites around the world - predating HTML and web browsers.
<p>
The price was very cheap - in fact the price hasn't moved in the
last 20 years or so.
<p>
Demon got bought up by other companies, went into the broadband business
and have always been non-competitive in terms of pricing or technology.
<p>
In 2012, they moved from a POP3 mail service to IMAP. The POP3
mechanism and the design of the domains meant I could use as many
mail addresses for any purpose I liked, and collecting email was very
simple since all the mail sat in the same inbox.
<p>
With IMAP, this is no longer true. I can't even remember what
mailnames I have used in the past, and having to try each one to poll
for new mail is pointless.
<p>
Quite a few years back, I set up CrispEditor-at-gmail.com and many
people use that. The nature of the IMAP service is so bad I may finally
stop paying for the Demon account - the mail will bounce and the
ftp hosting location will disappear. (I will find some alternate location
or update the links, so people can still find CRiSP).
<p>
The Demon IMAP interface is pretty - on a par with hotmail, but
it is so badly thought out, and the migration to IMAP was so badly
publicised or managed, that Demon has fallen out of my good books.

</div>
</content>
</entry>


<entry>
<title type="html">Where did you come from?</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2013-01.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2013/01/index.html</id>
<published>2013-02-16T13:21:50+0000</published>
<updated>2013-01-12T20:49:10+0000</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I've been spending the Xmas period working on CRiSP. One feature I had
been pondering for a long while was the addition of a ribbon bar, like
the one in Microsoft tools. The ribbon can be effective, and aesthetically
pleasing in its design, and quite complex in terms of implementation semantics.
<p>
I have a prototype working - still more work to do.
<p>
I do most of my live development on Linux - it's easier and more
convenient, using the tools. (Windows 7 build speeds are pretty impressive,
and I use MinGW when I need to debug, using gdb, rather than Visual Studio).
<p>
Over the years, one thing has always annoyed me: programming paradigms
where messages are received to effect an action or event, but where
you have no idea what or why they are delivered, or where from.
<p>
The Xlib protocol is simple and of high performance, but due to the
multiprocess nature of an X-Windows system, determining what is going on
can be problematic. CRiSP has built into it an X11 message tracing
facility (set the env var XDEBUG=1 if you are curious). It's very
helpful to see what is happening.
<p>
The particular area causing me problems at the moment relates to tooltips.
CRiSP has had tooltips for an awfully long time - floating text
labels next to items in the GUI (tabbed windows, icons, etc). In the
ribbon and image implementation, the popup of the tooltip is immediately
dismissed - the act of the popup causes a LeaveNotify event to be
delivered, but I don't know where from. (Most likely the window manager,
but finding out "why" is difficult). Bear in mind it's the same
subroutines which implement this - this effect doesn't happen elsewhere.
<p>
As with all difficult programming problems, most likely, I am looking
in the wrong place. Hopefully I won't have to waste too much time on
this.
<p>

</div>
</content>
</entry>


<entry>
<title type="html">Why is Engadget's new web site so awful?</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-12.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/12/index.html</id>
<published>2012-12-09T21:50:03+0000</published>
<updated>2012-12-07T18:33:15+0000</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I enjoy Engadget - great site with the latest news. But I don't understand the
latest revamp.
<p>
Gizmodo did this a while back and it too, is awful. 
<p>
The thing wrong with Engadget (viewing this on a 1920x1080 Linux/Firefox
combo), is that the fonts are huge - really huge (I'm guessing the titles
for items are about 100 pixels high). That's a waste of space.
<p>
Then, the images for each news item are huge - I can see only one
news item (text) and image per screen display. And the images are not
clickable.
<p>
Then there's all the other window dressing - bad combinations of font,
color and contrast. It looks like it was put together by either a very
short-sighted person, or a 5 year old.
<p>
The m.engadget.com mobile web site by contrast is great (when it works;
unfortunately on Opera Mini, it keeps getting stuck at one
of the "ad" web sites, so I have to switch from Opera
to Firefox to read things on it).
<p>
On Gizmodo - the layout is equally awful - no real indication if I
am reading the news from today or the news from 5 years ago, such is the
diversity and lack of depth of the stories.
<p>
By contrast: Slashdot - the style of news delivery has
never changed; the recent web optimisations and revamp are "nice" - still
not brilliant (reading threads on a mobile is an exercise in
futility, and I still don't understand why most comments are not
shown by default).
<p>
Why do IT web sites put about 100 bytes of information on a nearly
1MB web page? (Slashdot is good in signal/noise, as is theregister.co.uk).
<p>
I need to find an alternative to engadget/gizmodo, as they are too annoying
and waste bandwidth and CPU.
<p>

</div>
</content>
</entry>


<entry>
<title type="html">iTunes 11</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-12.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/12/index.html</id>
<published>2012-12-09T21:50:03+0000</published>
<updated>2012-12-04T21:45:54+0000</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
And in a change to our advertised program...
<p>
What is it with iTunes 11? Did they not test it? What was wrong with the
old iTunes?
<p>
Organising TV Shows and Movies (ones you have personally recorded,
not downloaded ones)....So to keep track of what I watch, I would have
iTunes refer to the files in my video folder, and then delete the files
from disk and remove them from iTunes.
<p>
Well on iTunes 11, and I haven't worked this out yet...you cannot delete
a Movie (which may/may-not be deleted from the hard drive). It quite
happily puts the entry back again.
<p>
Yesterday, nothing worked and I had to force kill/restart iTunes - maybe
it thought it had a dialog box to display, but I couldn't find it.
<p>
Today, there's no dialog, but it keeps putting them back.
<p>
And if I view the "Films" section - my films are not to be seen (so I
can delete them), although the TV shows episodes/playlists are.
<p>
So - did they actually *test* this?
<p>
I don't believe they did.
<p>

</div>
</content>
</entry>


<entry>
<title type="html">The Case for a Safe XCall</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-11.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/11/index.html</id>
<published>2012-12-09T21:50:03+0000</published>
<updated>2012-11-27T21:23:54+0000</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I've written about this before, and it's time to write again -
dtrace_xcall(). This is the function DTrace uses on multiprocessor
systems to sync the CPUs so that the CPUs can agree, e.g. on
which buffer to use when logging the traced probes.
<p>
dtrace_xcall() is the function in the driver to do this. On Linux,
this maps to smp_call_function() and friends. A CPU cross-call is
an interesting concept - an ability for one CPU to make a function
call on another CPU. It is rarely needed, and if it
is done while breaking the calling protocol, you can lock up one
or more CPUs or crash the system.
<p>
On Linux, the cross-call (or IPI, inter-processor interrupt), can
be seen by examining /proc/interrupts and looking for a line
like:
<p>
<pre>
$ cat /proc/interrupts
....
CAL:      52770      36972      Function call interrupts
...
</pre>
<p>
My system has had many hours of uptime, yet the calls are rare (the above
is showing the calls for each CPU).
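<p>
If you want to watch these counters from a script, a minimal Python sketch
follows. It assumes the CAL line has the shape shown above (per-CPU counts,
then a description); the function name cal_counts is made up.
<p>
<pre>
```python
def cal_counts(interrupts_text):
    """Return the per-CPU function call interrupt counts, or None."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "CAL:":
            continue
        counts = []
        for field in fields[1:]:
            if not field.isdigit():
                break          # reached the description text
            counts.append(int(field))
        return counts
    return None


sample = "CAL:      52770      36972      Function call interrupts"
print(cal_counts(sample))
```
</pre>
<p>
On a live system the text would come from reading /proc/interrupts instead
of the sample string.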
<p>
When dtrace is called on to do a heavy action, like:
<p>
<pre>
$ dtrace -n fbt:::
</pre>
<p>
tens or hundreds of thousands of probes may be collected per second.
DTrace has two internal buffers to log these probes, and the buffers
will fill up quickly, and the calls to the IPI xcall code will happen
a lot.
<p>
I've been pondering how this actually works - both on Linux and Solaris.
<p>
Let's take a thought experiment:
<p>
Imagine a dual-CPU system. One processor is sitting inside a lock region,
with interrupts disabled. The other CPU is trying to access the same lock/region.
Now this other CPU is blocked until the first CPU exits the lock.
<p>
Now, imagine this again. This time, the first CPU holds the lock for a
very long time. This would block the other CPU indefinitely.
Normally this is rare - the kernel arranges to never hold locks for
long periods of time.
<p>
Now, let's modify this scenario. One CPU is holding on to a lock, interrupts
disabled, and we do an IPI cross-call. The other CPU is holding on to a
different lock and has interrupts disabled. The first CPU cannot interrupt
the other CPU, so it spins waiting for a reply that never arrives, and we
deadlock. In normal scenarios, this mutual lock-up cannot happen (barring
bugs in the kernel or drivers).
<p>
IPI interrupts are just like normal interrupts - they can be ignored
when interrupts are disabled, and processed when interrupts are reenabled.
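<p>
To make the thought experiment concrete, here is a toy model in Python
(not kernel code). Cpu, send_xcall(), deliver() and run() are hypothetical
names invented for this sketch, and it assumes that a CPU spinning on a
cross-call never re-enables interrupts by itself.
<p>
<pre>
```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    name: str
    irqs_enabled: bool = True
    pending_ipis: list = field(default_factory=list)
    waiting_on: object = None      # the Cpu whose acknowledgement we spin for


def send_xcall(sender, target):
    """Queue an IPI on target; the sender then spins until it is acknowledged."""
    target.pending_ipis.append(sender.name)
    sender.waiting_on = target


def deliver(cpu, cpus):
    """Handle pending IPIs, but only if this CPU has interrupts enabled."""
    if not cpu.irqs_enabled:
        return False               # the IPI stays pending
    progressed = bool(cpu.pending_ipis)
    for sender_name in cpu.pending_ipis:
        cpus[sender_name].waiting_on = None   # acknowledge the sender
    cpu.pending_ipis.clear()
    return progressed


def run(cpus):
    """Step until nothing changes; return the names of CPUs still spinning."""
    while any([deliver(c, cpus) for c in cpus.values()]):
        pass
    return sorted(c.name for c in cpus.values() if c.waiting_on is not None)


if __name__ == "__main__":
    cpus = {n: Cpu(n, irqs_enabled=False) for n in ("cpu0", "cpu1")}
    send_xcall(cpus["cpu0"], cpus["cpu1"])
    send_xcall(cpus["cpu1"], cpus["cpu0"])
    print(run(cpus))               # ['cpu0', 'cpu1']: both stuck
```
</pre>
<p>
With both CPUs spinning with interrupts disabled, run() reports both as
stuck; if the target has interrupts enabled, the call is acknowledged and
nobody is left waiting.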
<p>
The kernel smp_call_function() call has a contract: it must not be
called with interrupts disabled. Doing so generates a kernel
log/BUG warning, and indicates the kernel could deadlock.
<p>
When we use DTrace, we can place a probe on any function in the kernel,
including functions which run with interrupts disabled. This means
we break the contract.
<p>
(I note Oracle UEL Linux DTrace simply calls smp_call_function() and
suffers the bug, unless they have fixed it). In my DTrace, I take
steps to avoid calling the Linux smp_call_function() and implement my
own. It *seems* to work.
<p>
Whilst examining Xen, I had great difficulty finding a way
to avoid smp_call_function(), so someone invoking fbt probes
with interrupts disabled can cause deadlocks or long livelocks. (DTrace
will detect a mutual deadlock and break the lock, but this is horrible
and can panic the kernel in some extreme circumstances).
<p>
DTrace is supposed to be reliable and the above behavior is horrible.
Simply *HORRIBLE*.
<p>
I have a (new) workaround...see below.
<p>
But why doesn't Solaris have this problem? Well, the Solaris xcall code
is intimate with the kernel interrupt code, and a CPU waiting in a xcall
runs with interrupts enabled. (The whole Solaris/BSD kernel uses interrupt
priorities to allow much of the kernel to run with interrupts enabled
and even when interrupts are disabled, deadlocks cannot occur).
<p>
I wish I understood the above paragraph more - but experiments and
user success stories demonstrate that Solaris has no deadlock issue.
<p>
Ok, so the solution for Linux.
<p>
If you followed the above carefully, you will note the problem
is caused if we try to do a cross-call whilst interrupts are disabled.
And interrupts are disabled either when probing a function in
interrupt handler code, or inside a locked region.
<p>
So, let's disable probes whilst interrupts are disabled. If we did
this, then the kernel should be safe for fbt::*: probes and never
deadlock. Most interesting scenarios are in the non-locked kernel regions.
<p>
But that's bizarre! How could we do this? That defeats one of the
deep probing aspects of DTrace.
<p>
Well, to resolve this conflict of interest, we can have DTrace run
in 'safe' mode by default, and when the user wants to remove this safety
barrier, they can do so by sending a message to the driver.
<p>
And this is what I am going to do for the next release.
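<p>
A minimal sketch of that policy, as a Python model rather than driver code.
ProbeGate, its safe_mode flag and its counters are hypothetical names; how
the driver actually tests the interrupt state or receives the "remove the
barrier" message is not shown here.
<p>
<pre>
```python
class ProbeGate:
    """Decide whether a probe may fire, given the CPU's interrupt state."""

    def __init__(self, safe_mode=True):
        self.safe_mode = safe_mode   # default: do not fire with interrupts off
        self.fired = 0
        self.skipped = 0

    def on_probe(self, irqs_disabled):
        if self.safe_mode and irqs_disabled:
            self.skipped += 1        # drop the probe rather than risk a cross-call
            return False
        self.fired += 1
        return True


if __name__ == "__main__":
    gate = ProbeGate()
    print(gate.on_probe(irqs_disabled=True))    # False: skipped in safe mode
    gate.safe_mode = False                      # the user removes the safety barrier
    print(gate.on_probe(irqs_disabled=True))    # True
```
</pre>
<p>
Counting the skipped probes would at least let the user see how much of
the kernel is being left untraced in safe mode.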
<p>

</div>
</content>


<entry>
<title type="html">Xen progress</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-11.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/11/index.html</id>
<published>2012-12-09T21:50:03+0000</published>
<updated>2012-11-18T20:48:23+0000</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I think I have finally fixed the Xen issues. After slowly
wading through the basics of getting the syscall provider, the
fbt provider, and GPF's in kernel space resolved, it appears to work.
<p>
Life has been difficult because the key issues revolve around
paravirtualising certain instructions, and also around the way interrupts
are handled. The normal interrupt routine sequence doesn't work for
INT1, INT3 and PageFault interrupts.
<p>
What Linux does is provide two distinct interrupt routines - one for
normal hardware, and one for Xen. At a low level in the IDT handler,
it decides which one to use. (This is buried in the Xen handler
for write_idt_entry()). 
<p>
Part of the work is to have a similar mechanism - autodetect if we
are on a Xen host, and use the correct interrupt handlers. Fortunately,
the code in intr_x86-64.S is amenable to parameterisation via the
macro assembler, so the code for all the interrupts is one macro,
with some conditional assembly.
<p>
Another problem area is that when the system dies hard, Xen is very
unforgiving: it reports an issue but offers no easy way to diagnose, in simple
terms, what happened. (Before I modified the page fault handler to
use the correct Xen calling sequence, Xen would kill the guest, citing
issues in the page table; this appears to be bogus, and not
the true cause of the issue - that the interrupt stack wasn't
correct).
<p>
I now need to merge the Xen changes back to the mainline code, and
check it still compiles/works on the older kernels.

</div>
</content>


<entry>
<title type="html">Xen progress</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-11.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/11/index.html</id>
<published>2012-12-09T21:50:03+0000</published>
<updated>2012-11-10T22:58:54+0000</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I think I have the FBT provider working now on a Xen guest. As usual,
one has to do "everything right", and "nearly-right" is not good enough.
<p>
Unfortunately, knowing what "right" is, is difficult!
<p>
BTW, I want to complain about google.com. Here is a fascinating
in-depth article, written by authors of VMware, which gently leads
you through how VMware works and VM monitors in general.
<p>
<a href="http://web.mit.edu/6.033/www/papers/agesen.pdf">web.mit.edu/6.033/www/papers/agesen.pdf</a>
<p>
My complaint is that google wraps all links, so you can't just take
the URL of a page you jumped to, but need to copy the link from
the results page (which is not in a form that can be embedded as an http link).
<p>
The article above hinted at a problem I was seeing in getting FBT
to work *reliably*. FBT uses the INT3 and INT1 interrupt traps.
I have had to rework the interrupt handlers to be CONFIG_PARAVIRT
compliant (and need to rework them again so that the code works on a
non-CONFIG_PARAVIRT kernel). Anyhow, a dtrace invocation like:
<p>
<pre>
$ dtrace -n fbt::sys_*:
</pre>
<p>
would run for about 30,000-60,000 syscalls, and then a problem would
arise. When doing FBT, we replace the instruction with a breakpoint
instruction (INT3), single step a copy of the replaced instruction, and then
resume execution.
<p>
What appears to happen is that occasionally the single-step trap would
not fire. Execution would then carry on past the copied instruction
we were single stepping, resulting in a
kernel page_fault. Now this is strange, because in the
copy-buffer, we have:
<p>
<pre>
original-instruction, nop
</pre>
<p>
The nop should never be executed because of the single-step mode;
placing a NOP after the valid instruction seems like good practice
(rather than random junk), because otherwise the CPU may fetch ahead
and try to decode a junk instruction, even if it is not executed.
<p>
I tried the following:
<p>
<pre>
original-instruction, nop, nop
</pre>
<p>
Two NOPs after the instruction, and it ran much better; the first
time, it got to nearly 1,000,000 traces; I then killed it, removed
some of my debug, and ran again. Whilst I was writing this article, it
got to about 750,000 traces, but the same thing happened. Here's
the dump from the kernel:
<p>
<pre>
[26881.316402] Call Trace:
[26881.316412]  [&lt;ffffffff81664a82>] ? system_call_fastpath+0x16/0x1b
[26881.316417] Code: ff ff ff ff 00 00 00 00 00 10 00 81 ff ff ff ff 
00 10 00 81 ff ff ff ff 55 6e 10 00 03 8e 03 a0 ff ff ff ff 00 00 
00 00 <b>55 90 90 &lt;00></b> 00 ...
</pre>
<p>
Opcode 55 is "PUSH %RBP"; NOP is 0x90. The instruction after the second
0x90, is an instruction which causes a kernel page fault. So, the
cpu went marching on ahead and fell over...despite the trap
flag being set.
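<p>
The bytes at the end of that dump can be checked with a few lines of
Python. This is only a sketch: build_copy_buffer() and byte_after() are
hypothetical helpers, and the hex string is just the tail of the Code:
line above.
<p>
<pre>
```python
PUSH_RBP = 0x55   # one-byte "push %rbp", the original instruction in this trace
NOP = 0x90


def build_copy_buffer(original, nops=2):
    """Return the original instruction bytes followed by NOP padding."""
    return bytes(original) + bytes([NOP] * nops)


def byte_after(code, buf):
    """Return the byte following the first occurrence of buf in code, or None."""
    idx = code.find(buf)
    end = idx + len(buf)
    if idx == -1 or end >= len(code):
        return None
    return code[end]


if __name__ == "__main__":
    tail = bytes.fromhex("00 00 00 00 55 90 90 00 00")   # end of the dump
    buf = build_copy_buffer([PUSH_RBP])
    print(buf.hex())               # 559090
    print(byte_after(tail, buf))   # 0
```
</pre>
<p>
The byte marked &lt;00> in the dump is the one right after the two NOPs.
Two zero bytes decode on x86-64 as "add %al,(%rax)", a write through
whatever %rax happens to hold, which would fit the page fault.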
<p>
At the moment, this is weird, and it looks like Xen is not honoring
an IRET (or the hypervisor call equivalent) every time, and is ignoring
the single step mode.
<p>
Now off to do more research ... maybe it's a known bug in Xen,
or maybe I have not honored one of the rules (but the "rules" aren't
written down anywhere :-) ).
<p>
<p>
<p>

</div>
</content>


<entry>
<title type="html">Xen on VirtualBox...</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-11.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/11/index.html</id>
<published>2012-12-09T21:50:03+0000</published>
<updated>2012-11-05T22:26:33+0000</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Running Xen->Ubuntu 12.04 has the undesired side effect of stopping
VirtualBox from working, which means side by side debugging is a pain,
as I need to reboot, and swsusp stops working on my main machine.
<p>
So, now I have Ubuntu 12.04 running inside VirtualBox, and inside
that, I am now creating a Xen guest running Ubuntu 12.04 (yes,
that's 12.04 inside 12.04 inside 12.04). It's kind of mind
boggling but hopefully I can try and debug the dtrace issues.
<p>
<p>

</div>
</content>


<entry>
<title type="html">Xen blog ... strangeness</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-11.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/11/index.html</id>
<published>2012-12-09T21:50:03+0000</published>
<updated>2012-11-04T21:09:02+0000</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I am finding that running DTrace in a Xen guest is a painful thing
to debug. I haven't managed to get a decent debugger to help
diagnose the issue I am currently investigating, but thought it
worth writing up. This might help jog my own memory.
<p>
I have DTrace working with the various key interrupts (INT1, INT3), but
in trying to get the page_fault handler to work, I keep breaking the
guest. We want the page_fault handler so that DTrace can intercept
certain locations within itself, where a user D script might
dereference memory incorrectly. Consider:
<p>
<pre>
$ dtrace -n 'syscall::: { printf("%s", stringof(arg0)); }'
</pre>
<p>
when the arg0 to a syscall is not a string pointer, we will get a
warning from DTrace about a bad memory reference. (Technically, the
kernel generates a GPF but we save ourselves from panicking the kernel).
<p>
What is special about the page_fault handler compared to, say, INT1
(single step interrupt)? I don't know.
<p>
Looking at the kernel code and google searching is not helpful at all.
Let's ignore Xen and just visit some basics of assembler.
<p>
In assembler, we have subroutines - a CALL instruction jumps to the
target subroutine, and the return address is on the stack. The simplest
subroutine is:
<p>
<pre>
func:
	ret // for an interrupt routine, this is an IRET instruction
</pre>
<p>
An interrupt handler has to be careful to preserve all registers as
it does its stuff. (In user land we have to be careful too, but we have
some registers we can use without having to save them, such as the
incoming arg list).
<p>
So let's modify the above function, and do something that amounts to a no-op:
<p>
<pre>
// Example 1
func:
	push %rax
	pop %rax
	ret
</pre>
<p>
This will crash the Xen guest. The following will not:
<p>
<pre>
// Example 2
silly:
	ret

func:
	call silly
	call silly
	ret
</pre>
<p>
What's the difference between example 1 and 2? I don't know. If I look
at example 1, I might hazard a guess that we have an invalid stack,
or a non-writable stack. But example 2 seems to work - we write to the
stack to call function silly and return.
<p>
In the actual Linux page fault handler, it does something slightly
weird, along the lines of:
<p>
<pre>
page_fault:
	call *xen_handler // see below

	sub $0x78,%rsp
	call save_regs
	...

save_regs:
	cld
	mov    %rdi,0x78(%rsp)
	mov    %rsi,0x70(%rsp)
	mov    %rdx,0x68(%rsp)
	mov    %rcx,0x60(%rsp)
	...
	ret
</pre>
<p>
It's a strange sequence - the initial "sub $0x78,%rsp" decrements the stack
pointer, leaving room on the stack for the registers, and calls a subroutine
to populate the saved area, rather than a sequence of "push/push/push.."
instructions. The kernel is like this with or without Xen, and possibly
this is a good thing to do for various reasons.
<p>
Now "xen_handler" is a very interesting function; firstly, it's not a
function but a pointer to a function. I think it's like this because
the same kernel can be a Xen guest or running native, so the target
function is either a no-op or some actual code. Inside a Xen guest, the
eventual function is:
<p>
<pre>
   0xffffffff8100aae0:  mov    0x8(%rsp),%rcx
   0xffffffff8100aae5:  mov    0x10(%rsp),%r11
   0xffffffff8100aaea:  retq   $0x10
</pre>
<p>
That is a very weird function. Examination of the entry_64.S file in the
kernel, shows that registers %RCX and %R11 need to be extracted - the
Xen hypervisor is pushing these registers on the stack in addition to the
normal semantics of a page fault. The "retq $0x10" is returning
from the subroutine, and also *removing* the two extra registers.
<p>
Let's rewrite the code:
<p>
<pre>
page_fault:
	call xen_pop
	sub $0x78,%rsp
	...

xen_pop:
	mov 0x8(%rsp),%rcx
	mov 0x10(%rsp),%r11
	retq $0x10
</pre>
<p>
By simplification, this becomes:
<p>
<pre>
page_fault:
	pop %rcx
	pop %r11
	sub $0x78,%rsp
	...
</pre>
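<p>
The stack arithmetic behind that simplification can be checked with a toy
model. This is a sketch in Python, not an emulator: Machine, via_call() and
via_pops() are hypothetical names, and the frame layout (%rcx on top, then
%r11, then the usual fault frame) is assumed from the disassembly above.
<p>
<pre>
```python
class Machine:
    """Toy stack machine; the end of the list is the top of the stack."""

    def __init__(self, frame):
        self.stack = list(frame)
        self.regs = {}

    def push(self, value):
        self.stack.append(value)

    def pop(self):
        return self.stack.pop()

    def peek(self, offset):
        """Read the 8-byte slot at offset(%rsp)."""
        return self.stack[-1 - offset // 8]

    def ret(self, imm=0):
        """ret $imm: pop the return address, then drop imm more bytes."""
        address = self.pop()
        del self.stack[len(self.stack) - imm // 8:]
        return address


def via_call(frame):
    m = Machine(frame)
    m.push("return-address")        # call xen_pop
    m.regs["rcx"] = m.peek(0x8)     # mov 0x8(%rsp),%rcx
    m.regs["r11"] = m.peek(0x10)    # mov 0x10(%rsp),%r11
    m.ret(0x10)                     # retq $0x10
    return m.regs, m.stack


def via_pops(frame):
    m = Machine(frame)
    m.regs["rcx"] = m.pop()         # pop %rcx
    m.regs["r11"] = m.pop()         # pop %r11
    return m.regs, m.stack


if __name__ == "__main__":
    # Bottom of the stack first; Xen's two extra slots sit on top.
    frame = ["rip", "error-code", "saved-r11", "saved-rcx"]
    print(via_call(frame) == via_pops(frame))   # True
```
</pre>
<p>
Both paths leave %rcx, %r11 and the stack in the same state, so whatever
upsets Xen here is not visible at this level of modelling.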
<p>
But this appears not to work. It looks like the Xen hypervisor knows something
about the code in a page fault handler, and unless the code obeys what
it is expecting, we get a guest reboot.
<p>
Debugging here is very difficult - when things are wrong, the guest
reboots, with very few console messages, if any. Various web references
point to debug tools which aren't available in the Ubuntu apt cache.
<p>

</div>
</content>


<entry>
<title type="html">DTrace and Xen...continued</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-11.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/11/index.html</id>
<published>2012-12-09T21:50:03+0000</published>
<updated>2012-11-01T23:23:44+0000</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Work continues on dtrace for the Xen guest Linux kernel. Progress is "middling".
<p>
As I recounted in a prior blog entry, there are a number of steps
to getting this to work, and it mostly works, but the quality is not
what I want in a usable driver...although I may release it sub-par.
<p>
Firstly, the syscall provider works. This took some work to get
the page tables to be writable - using the correct page table APIs,
which in turn map down to the Xen hypervisor calls. A Xen guest
is significantly different from a genuine CPU.
<p>
A Xen hypervisor call is like a system call, using a special
gateway to the hypervisor, and allows the hypervisor and guest
to make RPC-like calls. Things like page table modifications, APIC
and privileged instruction emulation go through this layer.
<p>
This in turn presents a couple of issues. Firstly, the fbt provider
is having difficulty doing "fbt:::", where we trap every function
in the kernel - the paravirt/hypercall functions must not be intercepted
since they are (possibly) needed to take trap calls. In theory
this can be worked around by either excluding them from being probe
points (which would be a shame), or by detecting the recursion
and auto-disabling them (which would allow some hypercalls to be
monitored).
<p>
The other problem area is multiple CPUs. When we have multiple CPUs,
dtrace invokes the APIC inter-cpu calls to do RPCs to synchronise
the cpus. There is no APIC in a Xen guest, or rather there is a very
fake one. My DTrace code implements IPI calls in parallel to the kernel's,
rather than relying on the kernel support, so that we don't
deadlock and so that we can trace the kernel's use of these calls.
<p>
With IPI calls in a Xen guest, there is a lot of reliance on function
calls to handle the hypervisor communication. The IPI calls in a standard
kernel are the lowest level of operation of the kernel and CPU, implemented
using the NMI interrupt.
<p>
The standard smp_call_function() family of functions can be used
in the Xen dtrace, but it possibly exposes a race condition (I have
yet to torture test, but it seems to be easily exposed without
torture testing).
<p>
So, it's a bit like porting to a totally different CPU architecture,
and I need to understand these pieces a little more.
<p>
Once the above issues are resolved, then I need to validate it
isn't broken on older/pre-Xen kernels.
<p>
But the end result is being able to use DTrace in the cloud (e.g.
Amazon EC2), so it's definitely an interesting project.

</div>
</content>


<entry>
<title type="html">DTrace and Xen: part #2</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-10.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/10/index.html</id>
<published>2012-12-09T21:50:03+0000</published>
<updated>2012-10-28T21:22:22+0000</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I wrote a few days ago about DTrace and Amazon EC2 not playing
well together. As I have dug into the problem, it has become clear
what is wrong.
<p>
Xen uses paravirtualisation - meaning that privileged instructions need
to go through the correct API, else the VM may crash abruptly. It's
quite distracting that Xen does this - there is not even the ability to fall back
to more normal semantics at a performance cost.
<p>
Over the many iterations of the Linux kernel, these privileged areas
have been mapped via #define macros, so that if CONFIG_PARAVIRT is
set, then a function call is made, which invokes the hypervisor; if
CONFIG_PARAVIRT is not set, then the direct instructions are executed.
<p>
This is mostly straightforward; e.g. the IDT instructions (SIDT and LIDT)
don't do the right things in a Xen guest; Xen keeps a copy of the IDT
outside of the address space the kernel can see. So, any changes need
to be channelled through the correct API. In addition, direct memory
tampering with IDT entries is not allowed - you must tell the Xen
hypervisor that you just changed an entry.
<p>
After making these changes to DTrace, it now loads and unloads,
and, with some corrections to the page-table mapping code, syscall
tracing works.
<p>
I have two or three areas to work on to complete this work.
<p>
Firstly, the interrupt handlers (for INT1 and INT3) are broken - I
haven't played by the Xen API rules, so I need to fix that. (I also need
to make sure that the correct paravirt detection is done; a kernel
which is set up as a Xen guest can be run on physical hardware or inside
Xen; the API functions hide this detail).
<p>
Second, multi-cpu operation needs to be corrected. The code in
xcall.c which invokes the NMI based IPI cross-cpu calls is too low level
and doesn't play well with Xen. If I restrict my guest to a single CPU,
things work well; on a multi-cpu guest, the system locks up because the
cross-cpu calls are not delivered or processed properly.
<p>
Lastly, having made these changes, I need to handle old kernels
which lack the Xen API calls, so we don't break compatibility.
<p>
I now have a Xen guest on my main machine - a nuisance, because
VirtualBox won't run on a Xen kernel. So I either need to migrate my
VMs to Xen, or migrate to VMware or KVM.

</div>
</content>


<entry>
<title type="html">DTrace and the Art of Xen</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-10.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/10/index.html</id>
<published>2012-12-09T21:50:03+0000</published>
<updated>2012-10-24T23:39:57+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Someone reported that DTrace was failing - on an Amazon EC2 instance.
By all accounts, this should work - it's an Ubuntu 12.04 kernel, after all.
<p>
Isn't short-sightedness great?! Of course it works - I test on Ubuntu - all
the many Ubuntu kernels, as well as Fedora. How dare they report this
doesn't work!
<p>
Of course, as you slowly unravel the detective story, you realise
how right they are (facts don't lie) and how my world is shaped
in some imaginary Universe....
<p>
So, the issue is the Xen virtualisation. I know a little..very little..
about Xen - it's paravirtualisation; and it's in the kernel.
<p>
But what does that *mean* ? You can read Wikipedia and many web articles
and rarely does the whole picture fit together. And this is where
it gets interesting.
<p>
DTrace runs in kernel space. Running inside the Linux kernel is like
running inside MSDOS - you can execute any and every instruction,
and the good thing about every cpu since the 80286 is that the
segmentation and MMU support means that bugs can be trapped when
attempting to access out of bounds areas (GPF, page faults, or core dumps).
<p>
DTrace, in many respects, is simple, and kernel agnostic (it could
be ported to Windows, for instance. A rainy day project maybe).
DTrace needs to understand the interrupt descriptor table and some
aspects of page tables, and occasionally needs to disable interrupts.
The bulk of DTrace is implementing the virtual machine for
when traps occur.
<p>
This applies whether you are on real hardware or inside a VM,
such as VMWare or VirtualBox (and, I believe, KVM/QEMU).
<p>
But Xen is different. Xen runs the kernel VM almost as if the VM runs
in user space, and traps the instructions which require privilege.
It's an illusion. Where VMWare and VirtualBox trap privileged instructions,
like STI/CLI and SIDT/LIDT, Xen can do this, but provides an escape hatch
through which the VM guest has to communicate, asking the hypervisor
to do things for it. There's complexity over things like page
table management - in VMWare/VBox, you can modify page table entries
and 'the right thing happens'. In Xen, you cannot.
<p>
All communication with Xen takes place via a special "portal" - the
SYSCALL instruction, sitting in a special page. The Linux kernel
wraps the key instructions and operations via an API. On real iron, those
instructions execute directly; in a Xen guest, the functions translate
to the API calls.
<p>
If you attempt to run DTrace (or a guest O/S) without these API wrappers,
the wrong things happen. And that's what happens to DTrace - GPFs where
none are expected.
<p>
I am working through the issues, experimenting to find the right thing to do,
and will issue an update to DTrace for Xen when I have concluded this
avenue of research.
<p>
For anyone who is interested, here is a link which describes in
some detail, aspects of page table management in Xen - which helps 
reinforce that there is a "right way" for Xen. 
<p>
<a href="http://www-sop.inria.fr/everest/personnel/Andres.Krapf/docs/xen-mm.pdf">xen-mm.pdf</a>
<p>

</div>
</content>


<entry>
<title type="html">DTrace update</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-10.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/10/index.html</id>
<published>2012-12-09T21:50:03+0000</published>
<updated>2012-10-15T23:02:21+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Haven't really touched dtrace in a while - bar some minor bug
reports.
<p>
Claudio K. sent me an interesting mail and was questioning why this
didn't work:
<p>
<pre>
# dtrace -n 'syscall:::entry {
	self->start = timestamp; 
	self->file = fds[arg0].fi_pathname;
	} 
syscall:::return/(timestamp - self->start) > 1073741824/
{
	printf("%d ns on %s", timestamp - self->start, self->file); 
	self->start = 0;
}
' -p 26544
</pre>
<p>
<p>
I briefly glanced at the command, noted the use of "-p" and assumed this
was the problem. Claudio highlighted the 3rd line,
referring to the fds[] array,
and I was scratching my head wondering what was going on here. The
code makes sense, but I was trying to figure out how this actually worked.
<p>
Research showed this is handled by the following, in etc/io.d:
<p>
<pre>
inline fileinfo_t fds[int fd] = xlate &lt;fileinfo_t> (
    fd >= 0 &amp;&amp; fd &lt; curthread->t_procp->p_user.u_finfo.fi_nfiles ?
    curthread->t_procp->p_user.u_finfo.fi_list[fd].uf_file : NULL);
</pre>
<p>
Now this was commented out in etc/io.d a long time back, by me,
because I was never ready for it. Those structures are Solaris
structures.
<p>
An excellent posting by the author is available here on translators:
<p>
<a href="https://blogs.oracle.com/mws/entry/dtrace_inlines_translators_and_file">https://blogs.oracle.com/mws/entry/dtrace_inlines_translators_and_file</a>
<p>
So far, so good. So I started converting this to Linux structures and
apart from a minor issue in the libdtrace code (to do with ctf access
to the kernel structures), it's not far off:
<p>
<pre>
$ dtrace -n syscall::open:'{printf("%p", fds[arg0].fi_offset);}'
...
</pre>
<p>
So, for open, arg0 is a string, and we want the filename. The
nice thing about these translators is that they are a recipe for
accessing struct-like members without having to precreate the
return value for all elements of the structure. (I had fallen foul
of this in existing driver code - and can start to discard the horrible
code!)
<p>
But the *filename*. Well, Solaris has access to this. Linux has
access to the filename (but - I need to do some homework, because Linux
has access via a function, and dtrace won't let us do that at the moment).
But on *MacOS*, it's interesting because they simply did not bother.
<p>
<pre>
translator fileinfo_t &lt; struct fileglob *F > {
    fi_name = (F == NULL) ? "&lt;none>" :
        F->fg_type == DTYPE_VNODE ?
                ((struct vnode *)F->fg_data)->v_name == NULL ? "&lt;unknown>" :
        F->fg_type == DTYPE_SOCKET ? "&lt;socket>" :
       ...
</pre>
<p>
Example from MacOS:
<p>
<pre>
dtrace -n syscall::open:entry'{printf("%s", fds[arg0].fi_name);}'

dtrace: description 'syscall::open:entry' matched 1 probe

CPU     ID                    FUNCTION:NAME
  1  18659                       open:entry &lt;none>
  1  18659                       open:entry &lt;none>
  0  18659                       open:entry &lt;none>
  ...
</pre>
<p>
If you try to access the filename of a file descriptor on MacOS, you
get an "unknown" output (or "socket", etc). You cannot gain access to the
filename - they just haven't implemented that facility. Which is a shame,
as it is mightily powerful.
<p>
On Linux, despite a function being available in the kernel to
convert a file to a name, it is a mutex/blocking function, so we
cannot call it directly, and may need a private implementation
without blocking semantics (occasionally, this could lead to output
corruption, but that should be rare for the scenarios we are normally
interested in).
<p>
I'll spend some time seeing if I can get something to work in this
area, and I will have usefully learnt something whilst adding a new
valuable feature to DTrace (or rather, got a facility working as it should
do).
<p>

</div>
</content>


<entry>
<title type="html">Process Groups and fork speed</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-10.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/10/index.html</id>
<published>2012-12-09T21:50:03+0000</published>
<updated>2012-10-11T23:10:45+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Was just trying out an experiment. Am surprised that my i7 laptop CPU
(2.0GHz) can only achieve 200 fork/sec on Ubuntu 12.04. I would expect
it to do much better.
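<p>
For reference, a figure like this can be measured with a short loop. This
is a sketch, not the script used for the number above: forks_per_second()
is a hypothetical helper, and it times fork plus exit plus wait of an empty
child, so a process with a large address space, or a child that does real
work, will score lower.
<p>
<pre>
```python
import os
import time


def forks_per_second(count=200):
    """Fork count children one at a time and return the achieved rate."""
    start = time.perf_counter()
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            os._exit(0)        # child: leave at once, skipping Python cleanup
        os.waitpid(pid, 0)     # parent: reap the child before the next fork
    return count / (time.perf_counter() - start)


if __name__ == "__main__":
    print("%.0f forks/sec" % forks_per_second())
```
</pre>
<p>
Reaping each child before the next fork keeps the process table clean, at
the cost of including the wait in the measurement.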
<p>
Why do I care? Well, have been experimenting with process ids and
process groups - a part of Unix for decades, yet rarely understood,
except by those writing shells or other job control types of activities.
<p>
Run the following command:
<p>
<pre>
$ ps -j
  PID  PGID   SID TTY          TIME CMD
  347   345  3179 pts/4    00:00:00 launch.pl
 1374  1371  3179 pts/4    00:00:00 launch.pl
 3179  3179  3179 pts/4    00:00:00 bash
</pre>
<p>
This shows three processes - one is my shell. Note the PGID column.
What is it?
<p>
Well the process group mechanism is the thing which ensures when you
hit Ctrl-C, you kill all the child processes, but not the shell
itself.
<p>
The shell invokes the system call setpgrp() and the child and all
its children sit in a group.
<p>
The wonderful thing about process groups is they provide a means to
allow killing them all, without having to do the equivalent of 
"ps -aef" to find all the procs in the system. (Imagine you want
to kill all the children and grandchildren, even if these children
are fork-bombing you; in a fork-bomb type scenario, by the time
you have done a "ps" to find the PID, it will have already forked
a copy of itself and the PID may no longer be valid).
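<p>
As a sketch of that idea in Python (spawn_group() and kill_group() are
hypothetical helper names; the os and signal calls are the standard
wrappers around setpgid(), getpgid() and killpg()): a leader starts its own
group, forks some workers, and the parent then kills the whole group with
one call, without ever learning the workers' PIDs.
<p>
<pre>
```python
import os
import signal
import time


def spawn_group(workers=3):
    """Fork a leader that starts a new process group and forks sleeping workers."""
    ready_r, ready_w = os.pipe()
    leader = os.fork()
    if leader == 0:
        try:
            os.close(ready_r)
            os.setpgid(0, 0)               # new group: PGID becomes our own PID
            for _ in range(workers):
                if os.fork() == 0:
                    os.close(ready_w)
                    time.sleep(60)         # worker: idle until killed
                    os._exit(0)
            os.write(ready_w, b"x")        # tell the parent the group is populated
            time.sleep(60)
        finally:
            os._exit(0)
    os.close(ready_w)
    os.read(ready_r, 1)                    # wait for the leader's signal
    os.close(ready_r)
    return leader


def kill_group(leader):
    """Kill every member of the leader's group with a single call."""
    os.killpg(os.getpgid(leader), signal.SIGKILL)
    _, status = os.waitpid(leader, 0)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None


if __name__ == "__main__":
    leader = spawn_group()
    print(os.getpgid(leader) == leader)    # the leader's PID names the group
    print(kill_group(leader))              # signal number that ended the leader
```
</pre>
<p>
The pipe is only there so the parent does not race the leader's setpgid()
call; the workers are reaped by init once their parent is gone.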
<p>
The PGID is interesting; normally it's set to the PID of the
process group leader (root of the tree of processes). You can change
it when you like, but you can only change it to the PID of yourself.
<p>
If you do this, and then fork, and have the parent pid terminate,
you can end up with a situation (such as the launch.pl procs above)
where the PID != PGID.
<p>
Now the PGID has an important property. Whilst a PGID of value nnn
exists, you cannot fork a new process to have the same PID. Doing
so would mean you are joining an existing process group. (And this
would be a security issue). (I wrote a script to keep forking
until we hit a specific PID, but it never happened, and debugging showed
this scenario - PGIDs and PIDs exist in the same name space).
<p>
So, you could create 10,000 pids, each with distinct PGIDs, and
steal 20,000 of the pid address space. (Many Linux systems limit you to
10,000 pids per user id).
<p>
I stumbled across this whilst trying to prove a theorem about
process killing - and it's good, because it means the real problem
I am trying to solve is not susceptible to a race condition or attack.
<p>
There is a converse issue: the setpgrp() system call *CAN* fail.
We can set a PGID *only if* the session-id (SID,
3rd column in the ps listing) is the same. If we are sitting in the
same xterm, we can do this; if we are in a different xterm, we cannot.
<p>
SID and PGID are confusing ideas, but effectively the SID is acting
as a kind of policeman over the PGID address space. And this stops
a disparate group of processes merging into the same PGID as another.
Although setpgrp() can be used to set a specific PGID, there is no
syscall to set a specific session-id. The setsid() syscall takes no
arguments.
<p>
This potentially leads into trouble, because one could use 10,000
session ids, and then grab 10,000 process-group ids, and sit on 10,000
pids, and the system would (nearly) grind to a halt - Linux actually
allows about 33,000 unique pids (32768 by default) before reusing them. But two userids can collude
to eat all the available pids.
<p>
Another note on setsid() - it will fail if you are a process group
leader (PID == PGID); typically, a child will do the setsid, in which
case the SID is set to the PID of the calling process. (So my prior
paragraph doesn't hold true - SIDs are a function of a PID;
if the proc which does a setsid() forks+exits, then you can have
a situation where no PID exists with the same value as a SID, e.g.
a launcher process terminates). But in any case, you are
not going to join someone else's process group whilst you have a distinct
SID. This is important - if you are writing forking daemons,
setsid() must be called, else you can interfere with the daemon in
some way if you carry on launching processes from the same xterm session
(technically, the same SID).
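<p>
The setsid() rule is easy to check from Python. This is a small sketch:
try_setsid() is a hypothetical helper that forks a child, optionally makes
it a process group leader first, calls setsid() and reports the outcome
back over a pipe.
<p>
<pre>
```python
import os


def try_setsid(become_group_leader):
    """Call setsid() in a forked child; return 'ok', 'EPERM' or 'error'."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        result = b"error"
        try:
            os.close(r)
            if become_group_leader:
                os.setpgid(0, 0)          # now PID == PGID in the child
            os.setsid()                   # refused for a process group leader
            if os.getsid(0) == os.getpid():
                result = b"ok"            # new session: SID == caller's PID
        except PermissionError:
            result = b"EPERM"
        finally:
            os.write(w, result)
            os._exit(0)
    os.close(w)
    report = os.read(r, 16).decode()
    os.close(r)
    os.waitpid(pid, 0)
    return report


if __name__ == "__main__":
    print(try_setsid(become_group_leader=True))    # EPERM
    print(try_setsid(become_group_leader=False))   # ok
```
</pre>
<p>
This is why the usual daemon recipe forks first and calls setsid() in the
child: a fresh child is never a process group leader, so the call cannot be
refused for that reason.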
<p>

</div>
</content>


<entry>
<title type="html">DTrace update</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-08.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/08/index.html</id>
<published>2012-12-09T21:50:03+0000</published>
<updated>2012-08-25T22:15:51+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Not much to report this week on DTrace - the pid provider seems pretty
stable, and a number of Linux versioning issues were resolved. (I believe
there's one more to fix, reported on Centos).
<p>
Using the pid provider is interesting - there are some fixes needed for some
platforms, because some rtld (runtime linker) symbols cause an error
to be generated - that just needs a little debugging; but it's annoying
that of the thousands of symbols being examined, only one is needed to cause
dtrace to give up.
<p>
I also encountered a few 'too many probes in use' errors, e.g.
when trying to instrument all functions in all shared libraries - it's
easy to blast past the 250,000 max probe limit. I could probably
do with looking at the fasttrap tracepoint data structure to see if it's
possible to shave a few bytes off and increase the limit. (Solaris allows
the limit to be changed at start time, and Linux does too, via a module
parameter: "modprobe dtracedrv fasttrap_max=nnnn").
<p>
It's probably time to revisit the static probes and continue
working through the Solaris providers, to insert the missing probes.
<p>
But a lot of people are using dtrace and, I hope, learning about the
kernel or optimising programs which, one day, I may be using.

</div>
</content>


<entry>
<title type="html">More pid provider issues</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-08.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/08/index.html</id>
<published>2012-08-24T08:14:35+0100</published>
<updated>2012-08-11T22:35:05+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I was tracing crisp earlier today (it's running inside gdb), and noticed
a strange SIGTRAP inside gdb. Given gdb had no breakpoints, this
shouldn't happen, and it's not clear where the SIGTRAP came from.
The PID provider uses SIGTRAP (breakpoint instruction) as part of its implementation
but, again, this shouldn't happen.
<p>
I was pondering what might cause this - and was wondering about
signal delivery. The DTrace fasttrap provider has some awareness
of signals, but I am not sure I fully follow what happens, and in any
case, the code is aligned to the Solaris kernel.
<p>
Under "normal" circumstances, the PID provider puts INT3 instructions
at the probe points, and intercepts these before anything else
in the kernel sees them. There are two scenarios for
a probe - in-kernel emulation of the CPU instruction (for instructions
which modify the PC), or, trampolining in user space whereby
a copy of the original instruction is executed from a temporary buffer.
<p>
Consider the following simple program:
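<pre>
#include &lt;stdio.h>
#include &lt;signal.h>
#include &lt;unistd.h>             /* alarm() */

volatile unsigned cnt;          /* loop iterations since the last alarm */
volatile int    tick;           /* bumped by the handler, polled by main */

/* SIGALRM handler: print the count (printf() here is not signal-safe). */
void alarm_handler(int sig)
{
        (void) sig;
        printf("tick: %2d %u\n", tick++, cnt);
}

int main(int argc, char **argv)
{
        while (1) {
                int     old_tick = tick;
                signal(SIGALRM, alarm_handler);
                alarm(1);
                /* Spin until the handler has run. */
                for (cnt = 0; ; cnt++) {
                        if (tick != old_tick)
                                break;
                }
        }
}
</pre>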
<p>
It sits, counting for a duration of 1s, and then prints the count.
We use the alarm clock signal to do the printing. Not a strictly
ISO-C compliant program, but good enough.
<p>
If I use the pid provider to trace this, all is good until the alarm_handler
returns from the signal. I need to debug what happens here.
<p>
Out of curiosity, I tried this on MacOS, and, surprisingly, the application
terminates erratically. There's no core dump. gdb tells me the exit code
is 0170 (presumably a signal was delivered). The tail end of the dtrace is
<p>
<pre>
  0  37727                 alarm_handler:37
  0  37728                 alarm_handler:38
  0  37711             alarm_handler:return
  0  37728                 alarm_handler:38
  0  37711             alarm_handler:return
  0  37708                         start:34
  0  37709                         start:36
  0  37756               stub helpers:entry
</pre>
<p>
So, I wonder how thoroughly the PID provider is tested and actually
*used*.

</div>
</content>


<entry>
<title type="html">New release of dtrace</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-08.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/08/index.html</id>
<published>2012-08-24T08:14:35+0100</published>
<updated>2012-08-11T10:13:19+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I wrote in my previous blog about some issues with the
fasttrap provider and some instruction emulation. After testing
more thoroughly, I found my original cure wasn't strong enough - there
were still some instructions being misinterpreted, leading to the
traced application dumping core.
<p>
<pre>
#if linux
        /*********************************************/
        /*   Handle:                                 */
        /*   41 ff 14 c4 callq *(%r12,%rax,8)        */
        /*   41 ff 24 f4 jmpq *(%r12,%rsi,8)         */
        /*********************************************/
        sz = base == 5 ? (mod == 1 ? 1 : 4) : 0;
#else
        sz = mod == 1 ? 1 : 4;
#endif
</pre>
<p>
In case anyone is interested, the above works properly for SIB indexed
instructions where there is no offset.
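<p>
For reference, the general rule for how many displacement bytes follow a SIB byte can be sketched as below. This is the rule as documented in the Intel instruction set manuals, not the dtrace4linux code, and the helper names are invented for illustration. For the two encodings listed in the comment above (mod == 0, base == 4) it gives 0, the same answer as the fix.

```c
#include <assert.h>
#include <stdint.h>

/* Field extractors for the ModRM and SIB bytes. */
static unsigned modrm_mod(uint8_t modrm) { return modrm >> 6; }
static unsigned modrm_rm(uint8_t modrm)  { return modrm & 7; }
static unsigned sib_base(uint8_t sib)    { return sib & 7; }

/*
 * Size in bytes of the displacement that follows a SIB byte.
 * Only meaningful when the ModRM byte actually selects a SIB
 * (mod != 3 and rm == 4).
 */
int sib_disp_size(uint8_t modrm, uint8_t sib)
{
    unsigned mod = modrm_mod(modrm);

    assert(mod != 3 && modrm_rm(modrm) == 4);

    switch (mod) {
    case 0:
        /* No displacement, except base == 5 means "disp32, no base". */
        return sib_base(sib) == 5 ? 4 : 0;
    case 1:
        return 1;   /* disp8 */
    default:
        return 4;   /* mod == 2: disp32 */
    }
}
```

For 41 ff 14 c4 the ModRM byte is 0x14 and the SIB byte is 0xc4, so sib_disp_size(0x14, 0xc4) is 0: the instruction ends right after the SIB byte.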
<p>
I can now profile crisp and watch it run a few hundred times slower
(if I instrument every instruction in the application). The beauty
of dtrace is that I can cherry pick functions, shared libraries, and
entry/return points of functions, so it's possible, for example, to put
a trace on a specific function and see if it's called, or easily
count how many instructions are executed, or even profile
the sequence of instructions (e.g. looking for abnormally long runtimes).
<p>
At the moment, there's a slight slowdown due to some debug printk()s
(which you can see in /proc/dtrace/trace); I've removed most of the ones
I was using to do my debugging, but a handful of TODOs remain (I need
to fix up the rw_enter/rw_exit functions).
<p>
This leaves one thing to fix:
<p>
<pre>
$ dtrace -c cmd ... -n ....
</pre>
<p>
If dtrace is launching the application, it currently doesn't work
properly, because we need to stop the child process as soon as it
launches. Solaris/MacOS have the code to 'walk' the process to the
starting line, but this doesn't work on Linux, and I need to debug/replace
that code.
<p>
<p>

</div>
</content>


<entry>
<title type="html">callq  *(%r12,%rax,8)</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-08.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/08/index.html</id>
<published>2012-08-24T08:14:35+0100</published>
<updated>2012-08-09T23:00:31+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
This is an amd64 assembler instruction. I have just fixed a bug in
the PID provider where this instruction was not
handled properly.
<p>
The Solaris DTrace pid provider (fasttrap) is very cool, but in porting
to Linux, I uncovered some instructions not being processed properly,
leading to core dumps of traced apps. (It's a relief to be dealing
with core dumps and not kernel panics or kernel lock ups!)
<p>
I sent a mail to the solaris-dtrace mailing list - I don't know if I did
it right or if it will be accepted, but I thought I would highlight
this issue, since it affects DTrace on Intel (i.e. Solaris and Apple, and
quite likely FreeBSD).
<p>
I have more instruction mishandlings to investigate now.
<p>
I'll update the dtrace release over the weekend with this
and any other fixes I have in my holding area.
<p>
(Why this instruction? Because it's an indirect subroutine call
*without* any offset, instruction encoding 41 ff 14 c4; DTrace handles
offset-based register indirection, but not non-offset based).

</div>
</content>


<entry>
<title type="html">DTrace update</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-08.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/08/index.html</id>
<published>2012-08-24T08:14:35+0100</published>
<updated>2012-08-06T21:43:26+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I put out a new release today. This addresses a problem with
syscall::rt_sigsuspend: on Centos/RedHat kernels. It's strange
that even doing a blatant:
<p>
<pre>
$ dtrace -n syscall:::
</pre>
<p>
didn't pick up this issue before, and no matter how hard I tried,
I couldn't reproduce the error on Ubuntu (all the way back to Ubuntu 8).
<p>
What is interesting about the Centos 5 series of kernels is how
bastardised they are. They are based on the 2.6.18 kernel, but with many
upstream patches. This means the normal kernel version conditional
compilation won't work without also taking into account the RedHat
major version numbers, and even then, the sheer number of kernels
is problematic to support. Anyway, the issue with rt_sigsuspend related
to something I had forgotten to do with 32-bit binaries on 64-bit kernels.
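<p>
To illustrate why the version triplet alone is not enough, here is a small sketch. Real driver code would make this decision in the preprocessor (LINUX_VERSION_CODE against KERNEL_VERSION(), plus whatever release macros the vendor kernel headers provide); the struct and function names below, and any thresholds fed to them, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Same packing the kernel's KERNEL_VERSION() macro traditionally uses. */
#define KVER(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical description of one API change we need to detect. */
struct api_change {
    int vanilla_since;   /* first upstream kernel with the new API */
    int rhel_backport;   /* first RedHat major that backported it, 0 if none */
};

/*
 * Decide whether the new API is present.  version is the packed kernel
 * version; rhel_major is 0 on a non-RedHat kernel.
 */
int has_new_api(const struct api_change *chg, int version, int rhel_major)
{
    assert(chg != NULL);

    if (version >= chg->vanilla_since)
        return 1;
    /* The triplet says "old", but the vendor may have backported it. */
    return chg->rhel_backport != 0 && rhel_major >= chg->rhel_backport;
}
```

The same 2.6.18 triplet can therefore answer differently depending on who built the kernel, which is the trap a plain version comparison falls into.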
<p>
That *appears* to be resolved; I did have some form of difficult to
narrow down regression which may persist in the RedHat/Centos kernels
(occasional CPU lock ups - I can't get the info out of the locked kernel
to determine what it is, and need to get kdump or kgdb to work properly).
<p>
I am back to playing with the PID provider. Tracing every instruction
in the CRiSP executable works .. for a while, and then we jump off to location
0 for some reason. Difficult to track down (despite lots of debug
in /proc/dtrace/trace), since everything looks right, but we just
decided location zero was a good place to go.
<p>
I wish CPUs had some form of trace buffer to see where we had
jumped from. (Please? Pretty please? Maybe it exists).
<p>
The fasttrap instruction emulation is very clever stuff. There's a
performance cost for dtrace on Linux since, for some kernels, the NX
bit is turned on for stacks (no-execute), which means we have to fudge
the page table entry to ensure the trampoline instruction works ok.
This potentially involves a TLB flush, which is not nice for performance.
(There's still quite a lot of printk() debug in the fasttrap code,
so the TLB misses don't hurt as much as the extra debugging code).
<p>
Here's an example of the debug output:
<p>
<pre>
$ cat /proc/dtrace/trace
....
1468.820705312 #0 2343-ffff81001789be28: 4c 89 64 24 e0 ff 25 00 00 00 00 56 bd 4e 00 00
1468.820705312 #0 2343-ffff81001789be38: 00 00 00 4c 89 64 24 e0 cd 7f
1468.820705312 #0 2343-COMMON: 00000000004ebd56
1468.820705312 #0 2343-ffff81001789be28: 49 89 f5
1468.823704856 #0 2343-fasttrap_isa: 1710: pc=00000000004ebd59
1468.823704856 #0 2343-ffff81001789be28: 49 89 f5 ff 25 00 00 00 00 59 bd 4e 00 00 00 00
1468.823704856 #0 2343-ffff81001789be38: 00 49 89 f5 cd 7f
1468.823704856 #0 2343-copyout: line 1723 ffff81001789be28 00007fff775df638 c=22 0
1468.823704856 #0 2343-dtrace_linux.c:rw_exit:1925: TODO:please fill me in
</pre>
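<p>
Reading the hex dumps above, the scratch area appears to hold the original instruction, a jmp *0(%rip) (ff 25 00 00 00 00) with the 8-byte return address behind it, and then a second copy of the instruction followed by int $0x7f (cd 7f). The sketch below rebuilds that byte layout. It is inferred from the trace only, not taken from the fasttrap source, and build_scratch() is an invented name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_INSN 15   /* longest legal x86 instruction */

/*
 * Lay out a scratch area in the shape seen in the trace:
 *   [insn] [ff 25 00 00 00 00] [8-byte return address]
 *   [insn] [cd 7f]
 * buf must have room for 2 * len + 16 bytes.  Returns bytes written.
 */
size_t build_scratch(uint8_t *buf, const uint8_t *insn, size_t len,
                     uint64_t ret_addr)
{
    static const uint8_t jmp_rip[] = { 0xff, 0x25, 0x00, 0x00, 0x00, 0x00 };
    size_t n = 0;

    assert(len >= 1 && len <= MAX_INSN);

    memcpy(buf + n, insn, len);                   /* first copy */
    n += len;
    memcpy(buf + n, jmp_rip, sizeof jmp_rip);     /* jmp *0(%rip) */
    n += sizeof jmp_rip;
    for (int i = 0; i < 8; i++)                   /* little-endian target */
        buf[n++] = (uint8_t)(ret_addr >> (8 * i));

    memcpy(buf + n, insn, len);                   /* second copy */
    n += len;
    buf[n++] = 0xcd;                              /* int $0x7f */
    buf[n++] = 0x7f;
    return n;
}
```

Fed the 3-byte instruction 49 89 f5 and a return address of 0x4ebd59, it produces the 22 bytes shown in the second dump, which matches the c=22 in the copyout line.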
<p>
Until this is resolved, please take care if doing instruction
level tracing.
<p>
I also have a report of a compile issue with Arch linux, but I have
not been able to take the distro release (for i386) and get it to survive
more than a minute or two in a VM, so I cannot easily debug the issue.
<p>
<p>

</div>
</content>


<entry>
<title type="html">Quote of the day</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-08.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/08/index.html</id>
<published>2012-08-24T08:14:35+0100</published>
<updated>2012-08-02T21:36:00+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I have had a hectic time, eradicating bugs in dtrace. There are
still some gotchas in the PID provider (certain instruction emulation
issues). As fast as I can fix them, people are raising new issues.
<p>
Supporting older kernels is very labor intensive - I make slow
forward progress, but each forward step raises issues with the
legacy kernels. The recent taskq work (which uses Linux workqueues) is
a good case in point.
<p>
The taskq code is very simple - a mechanism for running background
tasks away from a user process - to avoid interrupt deadlock.
But those few lines of code rely on a mechanism which has many radical
changes in the kernel.
<p>
The workqueue API is a mixture of #defines and real functions.
Many of the functions are GPL functions. So, it works on kernel N, but
not kernel N-1. It's quite time consuming ensuring that the 'fix' I
put in for kernel N doesn't affect kernels I, J, K, L, M, etc.
<p>
Then people report issues on kernels I don't have. I just had a report
of issues on Centos 6.3. In fact in recent days, I have got
Centos 5.2, 5.3, 5.5, 5.6 and 6.3 running in VMs (getting quite adept
at configuring from scratch). The Centos/RedHat kernels have
confusing kernel version numbering because one cannot rely on the
triplet versioning (2.6.32, for instance) to determine which kernel
we are compiling under.
<p>
Anyway, the quote of the day is attributed to Jeffrey:
<p>
<pre>
You are brought this tool to the Common Man and not the guy who has a
huge yacht.
</pre>
<p>
Thank you Jeffrey for that. Despite him having some strange issues on
Centos 6.3, it feels worthwhile just for that, alone.
<p>
Now, off to strip the PID provider bare. If I win, I'll report back.
If not, I got sucked into a VM, and couldn't find the exit.

</div>
</content>


<entry>
<title type="html">Driver unloading .. follow up</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-07.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/07/index.html</id>
<published>2012-08-24T08:14:35+0100</published>
<updated>2012-07-28T22:34:17+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
After researching the issue of a driver which is unloaded
whilst there are still references to it, and trying various
things, I uncovered a bug in many of the drivers in DTrace.
<p>
The following line
<p>
<pre>
  .owner = THIS_MODULE,
</pre>
<p>
is the magic which ensures the ref count on the module is
incremented for all opens, and you cannot unload the module whilst
in use. 
<p>
For various reasons, many drivers didn't have this - probably an issue
with out of date documentation on the older kernels, which predate
this feature.
<p>
This link:
<p>
http://stackoverflow.com/questions/1741415/linux-kernel-modules-when-to-use-try-module-get-module-put
<p>
showed me the error of my ways.
<p>
This should stop me accidentally unloading the driver when some
user space process still has a reference to the driver(s).

</div>
</content>


<entry>
<title type="html">Driver unloading</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-07.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/07/index.html</id>
<published>2012-08-24T08:14:35+0100</published>
<updated>2012-07-28T19:25:30+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Sometimes, assumptions hurt. I know how Linux works: you can load
a driver (insmod/modprobe), and remove the driver (rmmod).
<p>
Pretty simple. "lsmod" can show drivers loaded (usually a lot), and
many are unused.
<p>
It makes sense you cannot unload a device driver if it's in use, especially
if the device driver implements a filesystem.
<p>
Sometimes, the physical world is cruel. No amount of clever coding
in the kernel can prevent you physically/forcibly removing a floppy
disk or memory card. And the modern kernel can handle such horrible
brute force actions.
<p>
Let's switch to dtrace. We don't want to unload dtrace whilst it's in use.
All the code is there (thank you Sun). The Linux port makes various
checks before allowing the driver to be unloaded.
<p>
[Most people don't care about this, but as a developer, I want to load
and unload the driver frequently, without having to reboot the kernel].
<p>
So, there are two types of "in-use" actions in dtrace: you are running
dtrace waiting for probes, and, you have done a PID/USDT provider
probe.
<p>
In the first case, I can make sure I am not running dtrace when reloading.
<p>
In the second, well, things can go awry. If you use:
<p>
<pre>
$ dtrace -c cmd -n ....
</pre>
<p>
Then there are two parts of dtrace which are active: the dtrace process
itself, and the cmd or process being traced. (Remember, user level
probes will trap to the kernel).
<p>
If we somehow kill -9 the dtrace, then dtrace will leave the probes
intact until the process exits. If the process hits the probe, then
the probes will be redundant.
<p>
In reality, we can unload dtrace whilst probes are active in a user
process - it will terminate with SIGTRAP when the first probe is hit.
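<p>
The SIGTRAP outcome is easy to reproduce without dtrace: a breakpoint instruction with no debugger or driver to claim it kills the process. A minimal sketch, assuming an x86 machine (other architectures fall back to raise()); the function names are made up for illustration.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Execute a breakpoint instruction with nobody around to intercept it. */
static void hit_orphaned_probe(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile("int3");
#else
    raise(SIGTRAP);          /* stand-in on non-x86 machines */
#endif
}

/*
 * Returns the signal that killed a child which hit the breakpoint,
 * 0 if the child survived, or -1 if fork/wait failed.
 */
int orphaned_probe_signal(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(SIGTRAP, SIG_DFL);   /* make sure nothing catches it */
        hit_orphaned_probe();
        _exit(0);                   /* only reached if the trap was ignored */
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On a stock system the function is expected to return SIGTRAP, which is the same fate as a traced process whose probes outlive the driver.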
<p>
What I *didn't realise* (because I am definitely stupid), is that the
module unload code in a driver is a "void" function. It cannot prevent
itself being unloaded. Once the kernel wants to unload you, it will
happen. And if you don't clean up properly, your kernel is likely
to have a problem (GPF or panic).
<p>
Dtrace will crash if the device driver is open/in-use, because although
it tries to prevent unloading of the driver, nobody is listening. Duh !
<p>
Ok, so we can probably just let dtrace unload and stop worrying.
<p>
Or we could prevent unloading whilst active probes exist. After some
investigation, the kernel function try_module_get() is the function
which implements the driver's in-use count (as seen by lsmod). Interestingly,
it is rarely called. It is *not* called simply because you opened the
device, e.g.
<p>
<pre>
$ sleep 100 &lt;/dev/dtrace &amp;
</pre>
<p>
It's typically incremented for executables coming from the filesystem. I don't
think it's called because a file is open. (Maybe we can panic the system
if we hold on to devices in the system which are unloaded?)
<p>
(It might be possible to modify the module reference count on open + close,
but this is almost certainly impossible to get right; consider what happens
on a fork or dup system call - file descriptors can be cloned, but the
underlying driver will never know that).
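<p>
The fork/dup point can be seen from user space with an ordinary file. In this sketch (shared_offset_demo() is an invented name, and a temporary file stands in for the device), the file is opened exactly once, yet three descriptors in two processes end up driving the same open file and sharing its offset.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Open one file, then reach it through a dup()ed descriptor and a
 * fork()ed child.  Returns the file offset seen through the original
 * descriptor afterwards, or -1 on error.
 */
long shared_offset_demo(void)
{
    char path[] = "/tmp/dupdemoXXXXXX";
    int status;
    int fd = mkstemp(path);          /* the only open the file ever sees */

    if (fd < 0)
        return -1;
    unlink(path);

    int copy = dup(fd);              /* second descriptor, same open file */
    if (copy < 0 || write(copy, "abc", 3) != 3) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();              /* child inherits both descriptors */
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(write(fd, "de", 2) == 2 ? 0 : 1);

    if (waitpid(pid, &status, 0) != pid ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    /* 3 bytes via the dup()ed descriptor + 2 bytes via the child. */
    long off = (long) lseek(fd, 0, SEEK_CUR);
    close(copy);
    close(fd);
    return off;
}
```

The function returns 5: three bytes written through the dup()ed descriptor plus two written by the child, all against a file that was opened once.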
<p>
[And, why do I care? Because as I play with the PID provider and add
new probes, I keep crashing the kernel if dtrace is running or hung. Normal
users shouldn't care about reloading dtrace].

</div>
</content>


<entry>
<title type="html">PID provider...update</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-07.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/07/index.html</id>
<published>2012-08-24T08:14:35+0100</published>
<updated>2012-07-27T19:44:59+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Some good news...I wrote the other day that the PID provider now
"works" (for various definitions of 'works'). Having done that,
of course reports came in on the features I had missed.
<p>
I've spent a fair bit of time ironing out portability issues
introduced in the latest releases - affecting older kernels, and
lots of GPL-only issues, which are nearly all worked around.
<p>
The GPL issues are interesting; I try to stay away from the politics.
The GPL and Linux have their own world to look after, and what with
the legal implications of license pollution, they need to be careful
about defending themselves. DTrace comes under the CDDL. From my perspective,
it's a bunch of source code, and people are welcome to it. But
the CDDL can give rise to closed source derivatives, which is a shame,
but understandable.
<p>
Anyhow, one of the key issues was related to "dtrace -c cmd" not working.
I hadn't tested that.
<p>
I also found that the user land dtrace would hang when trying to
attach to a suspended process (eg a backgrounded app asking for terminal
input).
<p>
I decided I had to move past the ptrace() interface - it's too
limiting to allow dtrace to do what it wants.
<p>
A while back I had created the /dev/dtrace_ctl driver - it was designed
to emulate the Solaris /proc interface, but I shied away from doing this,
because it would have been a distraction - attempting to emulate lots
of corner cases from Solaris is made difficult because the Linux
kernel is not the Solaris kernel. This was a good move. (And someone
suggested I not do this, so it wasn't my idea).
<p>
What I did do was resurrect the code in driver/ctl.c, and make it into
a read/write interface for processes - doing what ptrace() does, but
without the semantics of ptrace. Switching dtrace to use this interface
instead of ptrace() immediately fixed the stopped-process problem and
allowed "dtrace -c cmd" to work.
<p>
I have some bugs to fix and another kernel/GPL issue to resolve, but
hope to release later today or over the weekend.
<p>
I had put off doing the PID provider, because I knew it was going
to be difficult - but I was lucky. The original Solaris code works
a treat, and the only problem was me reading too much into the complex
code.
<p>
More later....

</div>
</content>


<entry>
<title type="html">Hey VirtualBox...what ya' doin' ?!</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-07.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/07/index.html</id>
<published>2012-07-27T14:27:33+0100</published>
<updated>2012-07-26T21:21:57+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
When I push out new releases of dtrace, I cannot apply as high
a level of quality control as I would like. I had set myself up a nice
environment to cross-compile from a single system, against lots of
kernels, but for various reasons, that didn't work.
<p>
I have a nice collection of VMs for lots of versions of Linux - going
back to very old versions. Originally, under VMware, but over the last few
years, VirtualBox. I like VirtualBox, but I also have problems with it.
<p>
VMware (vmplayer) is nice, but very limited. I did get bored of
vmware not being installable in new kernels, and requiring hacky
patches to the source code to make it work.
<p>
But I am somewhat annoyed that many of my VMs seem to have "broken" or
"gone off". E.g. Centos 4.7, 5.5, 5.6 -- they no longer boot under
VirtualBox. I have tried various things - they used to work, but
no longer do. So, my supply of VMs is limited. (Each VM has my
set of customisations, to make them comfortable to log in to). It's
a nuisance having to set them up again, or try and guess why an old
kernel no longer works.
<p>
I may have to go back to vmware or try out kvm, to see which works
best for me.
<p>
This really is a big problem - maybe not recognised by the industry - but
a VM which stops being usable, due to a host upgrade or VM software
upgrade, really undermines the value of having VMs in the first place.
<p>
It might be that I could downgrade my VirtualBox to restore the older
VMs, but this is turning into a job-creation scheme, rather than
a productivity boost.
<p>
I really dislike VirtualBox's nested-snapshot mechanism - despite its
power, it's confusing -- very confusing, and you can end up reverting
a snapshot and losing a lot of data. VMware's snapshot/restore was
much simpler to get along with.
<p>
<p>

</div>
</content>


<entry>
<title type="html">PID Provider: Did you call? #4</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-07.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/07/index.html</id>
<published>2012-07-27T14:27:33+0100</published>
<updated>2012-07-22T15:18:53+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<p>
Creating a kernel thread in Linux is easy. But I immediately
slammed into some issues.
<p>
Much of the "workqueue" API for doing this is GPL protected.
DTrace is a CDDL driver, and attempts to compile or link against
these GPL protected functions caused errors. I found a workaround,
similar to the dynamic symbol lookup already in dtrace. The
implementation of this is slightly ugly due to the functions I wanted being
embedded in #define macros. I didn't want to replicate the macros directly
to modify them, as this makes the code fragile and subject to breakage
in future kernels.
<p>
Additionally, the calling sequence of one of the functions has
changed in recent kernels (3.2 .. 3.4). This means I have to be
really careful. I worked around this with a tiny piece of assembler.
<p>
But from what I read, this workqueue API is only a relatively recent
addition to the kernel. It appeared in the 2.5 kernels, and changed
substantively in 2.6.20. So it's possible that the code I have,
which compiles for later kernels, will fail abysmally for older kernels.
The community will need to give feedback, or we will have to disable the PID provider
for older kernels.
<p>
So, we are done! PID provider works.
<p>
Well, I say it works .. it works for a sample app of mine. It needs
a lot more testing, and I daresay reported breakage will be difficult
to debug. The good thing is that I made almost zero changes to the
Solaris code - only fixing some glue code, and making some changes in
libdtrace.
<p>
If you have read all of these blog excerpts, and understood them, good
for you. I learnt a lot debugging this, and I feel more confident in
how dtrace works architecturally, and in the code I have written.
<p>
There's still a long road ahead to torture test the PID
provider.
<p>
And I need to rewrite the libdtrace/ process read/write, to avoid
the ptrace() issue or avoid leaving a process in the stopped state.
<p>
I plan to release the code - once I have done a little cleanup,
later today (20120722).
<p>

</div>
</content>


<entry>
<title type="html">PID Provider: Did you call? #3</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-07.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/07/index.html</id>
<published>2012-07-27T14:27:33+0100</published>
<updated>2012-07-22T15:10:17+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<p>
When I ^C'ed the dtrace process, I would often panic the kernel .. badly.
A slurry of messages scrolled on the screen telling me an atomic
condition was broken.
<p>
Huh?
<p>
The act of tearing down the potentially many probes in a process
is long enough that various windows of vulnerability exist.
If you kill -9 the dtrace process, it will tear down the probes,
and it will do so, with various mutexes set. If fasttrap tries
to dismantle its version of the probes, a deadlock can exist.
<p>
So, fasttrap code, during the teardown, uses an optimistic timer
to take out the probes (tracepoints). The mechanism is a classic
kernel function - timeout(). Up until 2 weeks ago, timeout() in
dtrace4linux was a stub implementation.
<p>
I had to implement timeout() and quickly knocked up some code,
based on the hrtimer mechanisms in the kernel.
<p>
This caused me no end of issues, and took me ages to understand
what was going on.
<p>
As dtrace was closing the /dev/dtrace device, tearing down the
probes it had set, the timer would fire, and interrupt the
closing dtrace. The timeout function would dismantle the fasttrap
probes, and try to take a mutex held by the terminating dtrace process.
Classic deadlock. It also showed up a potential problem
in my code (driver/mutex.c), which attempts to call the scheduler if
a mutex appears stuck (which led to the kernel issues, since
calling the scheduler from a timeout is not the correct thing to do).
<p>
I checked the Solaris code, to remind myself how timeout() works.
What I found was interesting. A timer interrupt in Solaris doesn't
just fire, interrupting the current process. It's fired from
a special context, effectively interrupting a dummy process. This
resolves the deadlock - a timer can never interrupt a mutex protected
block of code - it interrupts in the context of another process. So
the original process can make progress, release the lock and allow
the timeout to make progress.
<p>
We are almost done. 
<p>
There is a piece of code in driver/dtrace.c, which I never understood,
and had commented out:
<p>
<pre>
dtrace_taskq = taskq_create("dtrace_taskq", 1, maxclsyspri,
            1, INT_MAX, 0);
</pre>
<p>
It hasn't harmed dtrace4linux having that commented out. The reason
I was looking was that, in looking at /proc/dtrace/fasttrap, I could
see what a PID probe looked like. When the target process and dtrace
terminated, these entries were not being cleaned up. fasttrap.c does
garbage collection, but it wasn't clear how this happens when locks prevent
progress. Function dtrace_unregister() calls this function to
actually remove one of these fasttrap probes:
<p>
<pre>
(void) taskq_dispatch(dtrace_taskq,
                    (task_func_t *)dtrace_enabling_reap, NULL, TQ_SLEEP);
</pre>
<p>
What does that mean? I didn't know. I searched on google, and was none the wiser.
But it slowly dawned on me.
<p>
Ever done a ps on Linux, and seen entries like this:
<p>
<pre>
$ ps ax
  PID TTY      STAT   TIME COMMAND
    1 ?        Ss     0:01 /sbin/init
    2 ?        S      0:00 [kthreadd]
    3 ?        S      0:02 [ksoftirqd/0]
    6 ?        S      0:00 [migration/0]
    7 ?        S      0:00 [watchdog/0]
</pre>
<p>
Those processes in square brackets are kernel processes. This is what
taskq_create() is doing - creating a kernel process. This is a tiny
part of dtrace, but a very VERY important part! To avoid timers from
deadlocking with user processes due to mutex contention, we need the 
timers to fire from a process which cannot possibly be running dtrace. 
So, taskq_create() creates a kernel process, and when the kernel
cannot free a probe (because it looks like it is in use), a timer is fired
to retry the cancellation of the probe.
<p>
So, I now needed to implement taskq_create() on Linux. A quick google search
and I found what I wanted - "workqueue"s. This is the mechanism to create
a kernel process and asynchronously handle the callbacks. A quick
piece of coding, and it was looking good.
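<p>
The deferral idea can be modelled in a few lines of user-space C. This is only a single-threaded sketch with hypothetical names (tq_dispatch(), tq_run(), reap_probe()); it is not the Solaris taskq or the Linux workqueue API, where a separate kernel thread would drain the queue. The point it shows is that the caller only queues the work, and a task that finds its probe still locked queues itself again instead of waiting.

```c
#include <stddef.h>

#define TQ_MAX 16

typedef void (*task_fn)(void *);

struct taskq {
    struct { task_fn fn; void *arg; } slot[TQ_MAX];
    size_t head, count;
};

/* Queue fn(arg) for later; returns 0 on success, -1 if the queue is full. */
int tq_dispatch(struct taskq *tq, task_fn fn, void *arg)
{
    if (tq->count == TQ_MAX)
        return -1;
    size_t tail = (tq->head + tq->count) % TQ_MAX;
    tq->slot[tail].fn = fn;
    tq->slot[tail].arg = arg;
    tq->count++;
    return 0;
}

/* Worker side: run what was pending on entry; returns how many ran. */
size_t tq_run(struct taskq *tq)
{
    size_t pending = tq->count, ran = 0;

    while (ran < pending) {
        task_fn fn = tq->slot[tq->head].fn;
        void *arg = tq->slot[tq->head].arg;
        tq->head = (tq->head + 1) % TQ_MAX;
        tq->count--;
        fn(arg);            /* may re-dispatch itself to retry later */
        ran++;
    }
    return ran;
}

/* Example task: free a "probe" only once its lock has been released. */
struct probe { int locked; int freed; struct taskq *tq; };

void reap_probe(void *arg)
{
    struct probe *p = arg;

    if (p->locked)
        tq_dispatch(p->tq, reap_probe, p);   /* still busy: try again */
    else
        p->freed = 1;
}
```

Dispatching reap_probe() for a locked probe and calling tq_run() leaves the probe in place and the task re-queued; once the lock is dropped, the next tq_run() frees it.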
<p>
Continued in part 4....

</div>
</content>


<entry>
<title type="html">PID Provider: Did you call? #2</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-07.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/07/index.html</id>
<published>2012-07-27T14:27:33+0100</published>
<updated>2012-07-22T14:56:11+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
So, we can place a PID specific probe in an arbitrary process.
Doing so causes modifications to the target address. When the
address is executed, a breakpoint trap is executed.
<p>
Fasttrap (PID provider) looks in its data structures and
maps this to a probe, and logs a record for the originating dtrace
user.
<p>
But we have to get past the breakpoint. What the code in fasttrap
does is similar to the in-kernel code. We single step the instruction
which was patched. This is quite complex, because, remember, the
original instruction has a breakpoint placed on top of it. So, fasttrap
arranges to single step the instruction in a scratch buffer.
<p>
It took me ages to "get this". What scratch buffer? When doing in-kernel
probes, dtrace4linux has a per-cpu scratch buffer for this purpose.
But we cannot use this, for two reasons. Reason 1: it's not visible to the
process in user space. Reason 2: processes may be preempted, so we
cannot guarantee that the scratch buffer would remain unscathed to
complete the action for a process, before another process steps on the same
thing.
<p>
I spent a long time looking at this, trying to figure out how
Apple/FreeBSD/Solaris do this. On Solaris, each thread in the system
has a scratch buffer in the in-kernel lwp_t structure. Ah! We cannot
force this on Linux without rebuilding the kernel, but we just need
a private area to dump the scratch instruction into. I was looking
at the idea of jamming a 4K page into the address space of the process,
or leveraging the VDSO system call page, but both require some thought
because of the need to garbage collect, or to avoid problems with other
users of the area. In the end I decided that the current thread stack
is a good place to do this. Most threads have 1-10MB of stack, and rarely
use more than a small fraction. In fact, nobody uses the bottom area
of the stack, since doing so might expose the application to random
segmentation violations as it runs out of stack. So, stack space is
allocated to be much larger than any part of a process needs.
<p>
So, we can just use the area below the stack. This isn't ideal, but
it's simple. It's not ideal because it means a process which does not
obey the normal stack frame rules might be perturbed by what we are
doing. Tough. :-)
<p>
Another long-running problem I had was figuring out what actually happens
during a PID probe. When you use dtrace to plant a probe, the PID provider
dynamically constructs a probe for you. (You can see the probes, e.g.
pidNNN:::). These probes disappear when either the target disappears or
the probing dtrace disappears.
<p>
But how? I spent ages looking at the code. I added debug to /proc/dtrace/fasttrap.
Whilst PID provider is in action, you can see three tables which fasttrap
keeps:
<p>
<ul>
  <li>1. A table of the probes themselves (tracepoints). Used to map a trapping probe
  back to the owner process</li>
  <li>2. A table of processes being probe-provided. When the target
  process terminates, the owning tracepoints are dismantled.</li>
  <li>3. A "provider" table. When you attach to a process to probe it,
  probes are created, but the probes belong to a provider (eg "fbt",
  "syscall", or "pidNNN"). Each process you attach to is effectively a brand
  new provider.</li>
</ul>
<p>
Now I found some other interesting things out. If you probe a process, and
that process dies, your dtrace does not terminate (unless you make provision
for this in your D script). The dtrace hangs, and it's up to you to ^C it.
<p>
Here's an example of the trace tables in fasttrap:
<p>
<pre>
cat /proc/dtrace/fasttrap
tpoints=1024 procs=256 provs=256 total=9
# PID VirtAddr
TRCP 5748 000000000040087d
TRCP 5748 000000000040087e
TRCP 5748 0000000000400881
TRCP 5748 0000000000400886
TRCP 5748 000000000040088b
TRCP 5748 0000000000400890
TRCP 5748 0000000000400891
PROV 5748 pid 0 0 9 0 0
PROC 5748 1 1
</pre>
<p>
"5748" is the target PID I was tracing. The TRCP entries show the virtual
addresses in the target process where probes lie in wait (one per instruction
of the "do_nothing2" function I attached to). The other fields in the tables
are not really interesting (look at the source code to see what they are;
I may fix the output to make it more self-describing). The output from
/proc/dtrace/fasttrap is three table dumps (the header line above does
not reflect that).
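<p>
As a rough illustration of reading that dump as three separate tables,
here is a minimal Python sketch. It only relies on the TRCP/PROV/PROC tags
and the PID and address columns visible in the sample above; the function
names are made up for this example, and the remaining PROV and PROC columns
are carried along as opaque strings because their meaning is not described
here.
<p>
<pre>
```python
from collections import defaultdict

# The sample dump from above, copied verbatim.
SAMPLE = """\
tpoints=1024 procs=256 provs=256 total=9
# PID VirtAddr
TRCP 5748 000000000040087d
TRCP 5748 000000000040087e
TRCP 5748 0000000000400881
TRCP 5748 0000000000400886
TRCP 5748 000000000040088b
TRCP 5748 0000000000400890
TRCP 5748 0000000000400891
PROV 5748 pid 0 0 9 0 0
PROC 5748 1 1
"""


def split_tables(text):
    """Group the dump into its three tables, keyed by the leading tag."""
    tables = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Header and comment lines do not start with a known tag.
        if fields and fields[0] in ("TRCP", "PROV", "PROC"):
            tables[fields[0]].append(fields[1:])
    return tables


def tracepoints_by_pid(tables):
    """Map each PID to the sorted list of probed virtual addresses."""
    result = defaultdict(list)
    for pid, addr in tables["TRCP"]:
        result[int(pid)].append(int(addr, 16))
    return {pid: sorted(addrs) for pid, addrs in result.items()}


tables = split_tables(SAMPLE)
for pid, addrs in tracepoints_by_pid(tables).items():
    base = addrs[0]
    # Print each tracepoint as an offset from the lowest probed address.
    print(pid, [hex(addr - base) for addr in addrs])
```
</pre>
<p>
For the sample this prints the offsets 0x0, 0x1, 0x4, 0x9, 0xe, 0x13 and
0x14 for PID 5748, which are the same offsets that appear in the
do_nothing2 probe names.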
<p>
Once I had this "view" of what the provider was doing, I could immediately
go fix another issue.
<p>
I had a lot of trouble with killing the dtrace and the kernel panicking.
<p>
Continued in the next blog entry...

</div>
</content>


<entry>
<title type="html">PID Provider: Did you call?</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-07.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/07/index.html</id>
<published>2012-07-27T14:27:33+0100</published>
<updated>2012-07-22T14:40:23+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<p>
<pre>
dtrace: description 'pid5748::do_nothing2:' matched 9 probes
CPU     ID                    FUNCTION:NAME
  1 250496                do_nothing2:entry
  1 250497                    do_nothing2:0
  1 250498                    do_nothing2:1
  1 250499                    do_nothing2:4
  1 250500                    do_nothing2:9
  1 250501                    do_nothing2:e
  1 250494                   do_nothing2:13
  1 250502                   do_nothing2:14
  1 250495               do_nothing2:return
</pre>
<p>
Spent a couple of weeks getting this going, and it's looking positive now.
Let me recount the issues.
<p>
First, what is the PID provider? What is USDT? What is a normal probe?
<p>
Hopefully, we all understand a normal probe - it's effectively
a breakpoint placed in the kernel, e.g. via the FBT provider.
(Syscall tracing doesn't rely on breakpoints, but that doesn't really matter).
When the breakpoint is hit, dtrace maps that to a user space caller
who is waiting for the event.
<p>
USDT probes are similar to those of the normal in-kernel providers, but
they live in user space. They exist because someone put the probe in their code
(e.g. in the interpreter loop in Perl or Python, or in malloc() in libc).
Because dtrace for Linux isn't widespread, few apps, if any, have user
space probes.
<p>
The PID provider (also known as "fasttrap", for historical reasons, and
the name of the source code file for it) is very similar to USDT.
But instead of manually littering source code with probe points,
a user can drop a probe into a running process, e.g. 
<p>
<pre>
$ dtrace -n pid1234::malloc:entry
</pre>
<p>
Along the way to getting the PID provider working, I found some interesting
things. 
<p>
First, although there is a lot of code in libdtrace to handle supporting
the PID provider, it is actually a lot simpler than I thought. The act
of placing a probe requires finding the address in the target process.
Once located, a breakpoint instruction is placed at the probe address, or
addresses. (PID provider lets you instrument the entry, return, or any/all
instructions of the target function; in fact, it's very similar to the
INSTR provider).
<p>
dtrace itself doesn't need to ptrace(PTRACE_ATTACH) to the process except
to gain write-access to the target process. On Solaris, the
/proc subsystem is used, and ptrace() is not. (Solaris allows two or more
processes to debug a process at the same time, or at least read/write
memory and control the process; ptrace() does not). Although libdtrace
in my release uses ptrace(), this is a limitation which I plan to remove.
The reason is that you cannot use dtrace to probe a process running under
a debugger; Solaris lets you do this. It's a silly limitation of DTrace/Linux.
<p>
Another thing I found out which is very interesting is that you 
*CANNOT* do the following: put a probe on, for example, malloc, for
every process in the system, including those which have yet to be created.
If one examines:
<p>
<pre>
$ dtrace -n fbt::function:entry
</pre>
<p>
for probing the kernel, you get a hit no matter which process or interrupt
causes the function to be called. But there is no syntax to support something
like:
<p>
<pre>
$ dtrace -n pid*::malloc:entry
</pre>
<p>
When using the pid provider, we specify an actual PID, not all PIDs.
Architecturally, dtrace cannot support this. When you put a PID probe
in place, dtrace creates a new probe out of thin air - it's automatically
registered in the kernel. These "auto" probes are removed when the
target process or dtrace terminates.
<p>
I will continue this entry in the next blog piece...

</div>
</content>


<entry>
<title type="html">DTrace PID Provider</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-07.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/07/index.html</id>
<published>2012-07-27T14:27:33+0100</published>
<updated>2012-07-15T22:29:40+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I've recently been working on the PID provider, because I had
been putting it off, and people were asking about it.
Most of the code is there, courtesy of the original DTrace code, but
it doesn't work.
<p>
As I dive in, and it's a deep and scary place to investigate, it's slowly
becoming clearer to me how it works.
<p>
The PID provider essentially does for processes, what normal dtrace
probes do for the kernel (very similar to the FBT provider).
<p>
There are a number of ways to come at the PID provider. Firstly, you could
be launching an executable from inside dtrace, or you could be targeting
a specific running process, or you could be looking for any process which
hits, for example, a libc call.
<p>
This is all very expensive. Instead of dealing with the kernel's symbol
table (which, although large, is generally smaller than most executables,
and relatively unchanging), we need the symbol table of a running
process. Getting it right involves examining /proc/pid/maps to find the
mapped libraries, and then examining the process memory to find the
symbols of interest.
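<p>
The first of those steps, finding the mapped libraries, can be sketched
in a few lines of Python. This is only an illustration of the
/proc/pid/maps text format (address range, permissions, offset, device,
inode, optional pathname), not the code libdtrace uses; the function name
and the sample mappings are invented for the example.
<p>
<pre>
```python
# Hypothetical sample in the /proc/PID/maps layout; the addresses and
# paths are made up for illustration.
SAMPLE_MAPS = """\
00400000-00401000 r-xp 00000000 08:01 131090 /tmp/target
00600000-00601000 rw-p 00000000 08:01 131090 /tmp/target
7f0000000000-7f00001b5000 r-xp 00000000 08:01 262212 /lib/x86_64-linux-gnu/libc-2.15.so
7f00003b5000-7f00003b9000 rw-p 00000000 00:00 0
7ffff0000000-7ffff0021000 rw-p 00000000 00:00 0 [stack]
"""


def executable_mappings(maps_text):
    """Return (start, end, path) for each executable, file-backed mapping."""
    found = []
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) != 6:
            continue  # anonymous mapping: there is no pathname column
        addr_range, perms, _offset, _dev, _inode, path = fields
        if "x" not in perms or not path.startswith("/"):
            continue  # skip data segments and pseudo-entries like [stack]
        start, end = (int(part, 16) for part in addr_range.split("-"))
        found.append((start, end, path))
    return found


for start, end, path in executable_mappings(SAMPLE_MAPS):
    print(hex(start), hex(end), path)
```
</pre>
<p>
On a live Linux system the same function could be fed the contents of
/proc/self/maps, or of /proc/PID/maps for a target process. Finding the
symbol inside each library is a separate ELF-parsing step not shown here.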
<p>
Let's take an example:
<p>
<pre>
$ dtrace -n pid1234::malloc:entry
</pre>
<p>
We locate the process (pid 1234), find the library where malloc is located,
find the address of malloc() inside the library, and then we *patch it*.
The malloc entry instruction is replaced by a breakpoint instruction.
Very similar to the kernel.
<p>
But, before we do this, we need to tell the kernel that this breakpoint
is a DTrace probe, which is handled by the fasttrap.c and fasttrap_isa.c
code. Whilst the above dtrace is running, you can see this "on-demand"
probe by examining "dtrace -l" or looking at /proc/dtrace/fbt.
<p>
Now - a number of things can happen: dtrace terminates or the process 
terminates. If the process terminates, we need to rip out the probe, since
dtrace has played with and knows this probe exists. The fasttrap provider
intercepts fork/exec/exit system calls and should undo the placed
probe.
<p>
If dtrace exits, it should undo the patch to the process binary and
restore the original instruction, and, remove the fasttrap/pid provider
probes. (Confusingly, the fasttrap.c code contains the USDT and PID
providers - they are nearly identical, the difference being that for
USDT, the process places its own probes, but for PID provider, a
copy of dtrace [i.e., another process] is doing it).
<p>
So far, so good. My DTrace has had a number of bugs in the libdtrace code
(not quite fixed, but getting there), which affected the ability to find and
place the probes. We can now place the probes, and it's possible to see
this happen. In the above example we used "entry", but we could have
used "return", or just left the last field blank. (In which case every
instruction of the function is defined as a probe point - a very
good, but eventual, test case).
<p>
So, eventually the application will hit the breakpoint, and the INT3
breakpoint handler will ask the fasttrap provider to handle the breakpoint.
<p>
At this point, things get a little confusing. DTrace is *not* using
INT3, but INT 0x7E. INT 0x7E is a two-byte instruction vs INT3 which is
a one-byte one. DTrace (in libdtrace and fasttrap_isa.c) goes to great
lengths to handle this by emulating the instruction which was overwritten.
(This in fact was my original approach to single stepping the kernel,
but I gave up on it as being too hard and a pain to debug; INT3 is a single
byte instruction so it's easier to step over the instruction. But
we mustn't temporarily reset the instruction to step over it, because
another thread might hit the same instruction whilst we have a
temporary instruction in place and miss the probe).
<p>
Let's go over this again: if we overwrite a user instruction, then
we need to do this with a single byte trap (INT3), because if we don't
and the target instruction is one byte long, then we corrupt the
subsequent instruction.
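<p>
To make the corruption concrete, here is a small Python sketch that
patches a byte buffer instead of a live process. The opcodes are the
standard x86 encodings (INT3 is 0xCC, INT 0x7E is 0xCD 0x7E); the
two-instruction prologue is an assumed, typical x86-64 function entry,
not bytes taken from any particular target, and patch() and restore()
are hypothetical helper names.
<p>
<pre>
```python
INT3 = bytes([0xCC])            # one-byte breakpoint
INT_7E = bytes([0xCD, 0x7E])    # two-byte "INT 0x7E"

# Assumed prologue: push %rbp (1 byte), then mov %rsp,%rbp (3 bytes).
PROLOGUE = bytes([0x55, 0x48, 0x89, 0xE5])


def patch(code, offset, trap):
    """Write trap at offset; return (patched code, the bytes it replaced)."""
    saved = code[offset:offset + len(trap)]
    patched = code[:offset] + trap + code[offset + len(trap):]
    return patched, saved


def restore(code, offset, saved):
    """Put the saved original bytes back at offset."""
    return code[:offset] + saved + code[offset + len(saved):]


one, saved_one = patch(PROLOGUE, 0, INT3)
two, saved_two = patch(PROLOGUE, 0, INT_7E)

print(one.hex())  # cc4889e5: the following mov is untouched
print(two.hex())  # cd7e89e5: the first byte of the mov has been clobbered

# Either patch can be undone, as long as all overwritten bytes were saved.
print(restore(one, 0, saved_one) == PROLOGUE)  # True
print(restore(two, 0, saved_two) == PROLOGUE)  # True
```
</pre>
<p>
With the one-byte trap, a thread that is about to execute the second
instruction still sees valid code. With the two-byte trap it would
decode the bytes 7e 89 e5, which is a different instruction altogether.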
<p>
We have to be careful that this process may have multiple threads, running
on different CPUs at the same time, who may hit the affected probe point.
<p>
I note that the Solaris dtrace code distinguishes an entry point from
a return point and uses different INT traps to effect this. At present
DTrace for Linux isn't supporting these traps, but now I have uncovered
them, I need to understand more about *why* different trap types are
used. From my work on the original kernel code, INT3 seems sufficient
for all types of traps. (There is a potential issue that if we attach
to a process in user space, which is being debugged, that we can get
confusion about whether the INT3 is for dtrace or for the debugger).
<p>
There are some other problematic areas; dtrace locks the process
at certain key points, to avoid race conditions which could cause
trouble (e.g. we mustn't allow a "kill -9" from someone else to kill
the process we are trying to instrument). DTrace for Linux is
keeping shadow data structures for processes, which the real kernel
knows nothing about. So, again we have to be careful that we keep
the "mirage" effect of security and safety.
<p>
I am going to fix the known areas at issue - I have already
demonstrated (to myself!) that we can take a PID provider trap;
releases up until today nearly do that, but they are missing a few
fixes which I will release when I am happy the next release is
better than what is available on my site and github. Hopefully,
a few days away.
<p>
First, I need to fix /proc/dtrace/fasttrap - I want to dump
out the key internal data structures, mostly to prove to myself
I understand them - the output will show the tracepoints
and PIDs being monitored, but at the moment, they are deficient since
they only show the USDT placed probes.
<p>
<p>

</div>
</content>


<entry>
<title type="html">new dtrace release - pid provider</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-07.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/07/index.html</id>
<published>2012-07-27T14:27:33+0100</published>
<updated>2012-07-11T23:34:59+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Started trying to figure out why this doesn't work.
After a lot of user land deep diving, I found a real
stupidity - I forgot (or didn't realise I needed) to implement
the ioctl() interface to /dev/fasttrap.
<p>
That is where the PID provider interface comes to
life!
<p>
There are a few other bug fixes in the area of userland
dtrace getting to the PID provider, but it still doesn't
work; at least now I can start to debug the missing
piece(s).
<p>
If you look at /proc/dtrace/trace, after invoking the
PID provider, you will see something like:
<p>
<pre>
#0 4753:ESRCH dtrace_ioctl pvp=00000000
#0 4753:fasttrap_ioctl:2238: here
#0 4753:fasttrap_ioctl:2238: here
#0 4753:fasttrap_ioctl:2238: here
#0 4753:fasttrap_ioctl:2250: here
#0 4753:fasttrap_ioctl:2275: here
#0 4753:here in fasttrap_add_probe
#0 4753:fasttrap_provider_lookup: here
#0 4753:prfind 0
#0 4753:prfind:find_get_pid couldnt locate pid 0
</pre>
<p>
This means we got as far as the driver, but there are a couple of blips.
One is that libdtrace is passing down PID#0 no matter what PID you
specify (something in the translation from Solaris /proc handling
to Linux, where I have forgotten to set the PID), and the
other is "prfind()".
<p>
"prfind()" is the Solaris kernel function to look up a process
by PID. That needs to be mapped to the Linux interface, but the
code calling this relies on the Solaris "struct proc" layout and fields,
so that code has to be walked through to do the right thing.
<p>
Once this is resolved (and there's some complexity in scheduler and
process locking, since Linux/Dtrace doesn't modify kernel code), then
hopefully the PID provider can spring into action.
<p>
This is actually pretty good progress - having spent a long time
trying to get ELF handling working, I hadn't realised that some
code from near day-0 was a "TODO" item.
<p>

</div>
</content>


<entry>
<title type="html">Excuse me?</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-07.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/07/index.html</id>
<published>2012-07-27T14:27:33+0100</published>
<updated>2012-07-08T10:57:51+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I don't know if people are familiar with the comedy sketch
by the "Two Ronnies", but there's a very funny item where, in a TV
quiz, the topic chosen by the contestant is to answer the question
before last.
<p>
http://www.youtube.com/watch?v=y0C59pI_ypQ
<p>
So, I am adding TCP provider probes, and this is what happens on
a connection (to an unopen port on the destination host):
<p>
<pre>
$ build/dtrace -n tcp:::
dtrace: description 'tcp:::' matched 4 probes
CPU     ID                    FUNCTION:NAME
  1     54                 :connect-refused
  1     51                    :state-change
  2     51                    :state-change
  2     53                 :connect-request
</pre>
<p>
The connection is refused, and a little while later, we make the connection
attempt. Something very strange is going on. Note that this happens on
different CPUs, so it's possible that there is an ordering problem between
the CPUs, but that shouldn't normally happen.
<p>
Definitely some form of timing issue. If I connect to a remote or
non-existent host, then it looks much "saner".

</div>
</content>


<entry>
<title type="html">dtrace progress</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-07.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/07/index.html</id>
<published>2012-07-27T14:27:33+0100</published>
<updated>2012-07-05T22:59:02+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Just a note to say I am seeing an uptick in dtrace interest - which is good.
Unfortunately, some of the support issues are causing me to dive deeper into
the issues, so please bear with me if I am being unresponsive, or erratic
in responding or fixing.
<p>
I am trying to get dtrace enhanced in some areas - but also having
to revisit some of my ugly code or hacks - more a factor of the Linux
kernel evolving. What works for today's kernels may not work for
older kernels, and it's difficult trying to be careful not to break
old or new kernels. I don't like #ifdef spaghetti, but sometimes my
interpretation of the kernel evolution is mistaken, and some code
rot creeps in.
<p>
Just as a minor update, I am adding a little more TCP provider support.
(Just added tcp:::connect-established, for instance). More work is
needed to mirror all the TCP probes, as documented here:
<p>
https://wikis.oracle.com/display/DTrace/tcp+Provider
<p>
and to eventually support the callback arguments. 
<p>

</div>
</content>


<entry>
<title type="html">Being stupid. Utterly.</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-06.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/06/index.html</id>
<published>2012-07-27T14:27:33+0100</published>
<updated>2012-06-30T22:12:15+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Been in the market for a 27" iMac. New iMacs are just around
the corner (I hope/believe) but I don't need the top end or latest,
I just want value for money.
<p>
Monitoring prices on amazon.co.uk is interesting - prices go up and down
and resellers will often sell a "used" or "refurbished" item for more
than the "new" price. Very strange.
<p>
Out of boredom, I have been watching amazon.de, amazon.fr, and amazon.it.
<p>
Interesting watching the different prices and the different 
used/reseller markets.
<p>
Bang! Spotted a bargain on amazon.it. A brand *new* 27" iMac
at better than my target price point. Placed the order. Feeling happy
with myself.
<p>
(I have tried to place orders with resellers on amazon, and so far,
for iMacs, none have been accepted; either these are scam artists or
Amazon is allowing multiple orders to be placed when the seller only
has a single unit).
<p>
So, I am feeling real proud of myself. (My Italian is very poor - but
enough to know what buttons to press!).
<p>
Then it dawned on me. 
<p>
What *exactly* had I purchased? It sure wasn't a 27" iMac. It was the
21" model. Annoyingly, the layout of items on the different Amazon
stores is different and even the large/small pictures of iMacs are
used inconsistently. Every Amazon, except amazon.it, lists the screen
size in the description. Not amazon.it - you have to pore over
the technical description to see the telltale 1920x1080 screen resolution.
<p>
Oops! But I just placed the order? Panic!
<p>
Luckily - and I love Amazon for this - you can cancel an order, within about
30mins of placement. So, I did this. Being *very* careful to try and
understand what the screen phrases said (my Italian is not good enough
to handle the subtle language in this area).
<p>
Fortunately, translate.google.com was my friend. Very helpful
to paste key sentences and phrases into the translator and find I had
hit the right buttons.
<p>
So, there you have it. A total buffoon. Io sono stupido!

</div>
</content>


<entry>
<title type="html">The Heat is on</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-06.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/06/index.html</id>
<published>2012-07-27T14:27:33+0100</published>
<updated>2012-06-30T22:04:05+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
How do you tell what a system is doing? Or, what tool do you turn
to to see what is going on or slowing down your system?
<p>
Most people probably turn to "ps" or "top". Most people (who
use the tools) understand most of the data displayed, but probably
not all.
<p>
I have created "proc" - a top-like tool to show lots of graphs and
key data from a Linux system. Dtrace can expose tons more data
(if you know where to probe).
<p>
But, paradoxically, the more data you can see, the more rarely you
actually use the tools. ("proc" provides views of data from the
/proc filesystem - this filesystem contains huge amounts of interesting
data).
<p>
Q: What is the one "true" data point?
<p>
A: Heat (temperature)
<p>
When I look at my iMac (I use the excellent iStat tool, which puts
little temperature graphs on the menu bar) - it's the temperature which
tells me everything. High temperature means the system is busy.
<p>
Strangely, although Linux has a lot of measurements, there is nothing
which corresponds to the CPU temperature (so far as I have found).
I have used lm_sensors and psensor and a bunch of other tools, but there
is no CPU temperature (there probably is, but I haven't found it, and
it's not easy to find it either).
<p>
Heat == Power. Power == $$$. So, more heat, more $$$ (watts) being
expended. And that is a very good average of what your system is doing.
(All the other stats simply provide fine grained data on subsystems,
whether CPU, Graphics, HD, or other motherboard sensors).
<p>
In looking at the 27" iMacs - they are rated at ~360W of power. That
is a lot. That includes the screen, GPU and CPU. Most of the time,
many Macs are going to be idle. (My iMac hits 90+C during heavy
duty operations, such as media encoding). I hear lots of reports
of "hot" iMacs as being normal. In fact, the Macs (and my laptop i7
CPU) are rated for up to 100C operation; at 110C, they will shut themselves
down.
<p>
That's a *lot* of heat.
<p>
Strangely, one cannot tune a system based on heat. When I use my
laptop, and it's heavily compiling or number crunching, it gets hot.
The fan speeds up, and it gets noisy. I may pause the operation - I hate
to think what my laptop would be like if I allowed it to max out for
very long periods of time.
<p>
Wouldn't it be nice if you could get a Watts or $$$ figure out of "ps"?
<p>

</div>
</content>


<entry>
<title type="html">DTrace for RaspberryPi - first problem</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-06.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/06/index.html</id>
<published>2012-07-27T14:27:33+0100</published>
<updated>2012-06-23T23:13:00+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Oh dear. /proc/kallsyms doesn't exist. That's a fundamental
issue straightaway. That means we have no introspection into the
kernel.
<p>
Perfectly understandable, as /proc/kallsyms uses a sizable chunk
of memory.
<p>
Need to find a workable workaround. Maybe later kernels (wheezy?)
will enable it, or may have to go and create a custom kernel.
<p>
Ok, here's the link for building your own kernels - which
packages you need, and Ubuntu cross-compilation:
<p>
http://elinux.org/RPi_Kernel_Compilation

</div>
</content>


<entry>
<title type="html">Hard Projects</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-06.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/06/index.html</id>
<published>2012-07-27T14:27:33+0100</published>
<updated>2012-06-23T22:37:22+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I have a choice of projects to select next. Having
recently fixed a tidy sum of bugs in CRiSP and fixed a few
long standing issues/niggles of mine (xxx.yyy++ anyone?!)
I need to get the next projects off the ground.
<p>
Here are the options.
<p>
UTF-8 in CRiSP. CRiSP supports UTF-8 but strangely it's not
as natural as I would like/expect. Things are complicated
because multiple things need to be supported (cursor parsing
and treating UTF-8 as single chars, display, display in char mode,
X11, Windows and MacOS). I need to work out how it currently works
or does not work and go fix.
<p>
DTrace. Yes, it's time to get back on this hobby horse. There are
two immediate avenues of research: the PID provider (figuring out
what is breaking in user space), and ARM support. Now I
have a RaspberryPi, there are a few challenges ahead.
<p>
<b>ARM Challenges</b>
<p>
I'll briefly summarise what needs to be done to the code for ARM.
Initially the goal is to just target the RPI - I don't have enough
ARM devices to toy with, so it sets a baseline.
<p>
Firstly, we need an instruction decoder for ARM. The one
for the Intel instruction set, mostly courtesy of Sun, is obviously
useless for ARM. If I am lucky, the instruction decoder is simple
for ARM, since all instructions are 32-bit (are they?)
<p>
Next, much of the code assumes we are i386 or amd64, and that's no
longer true; so, even compiling as ARM is going to require various
cleanups and tweaks.
<p>
Lastly, building on the RPI itself is going to require one
or more kernels. At least I need the kernel sources, but it
may well be that I need to cross-compile - the 256MB RAM may be
too low for dtrace to compile - I hope not.
<p>
But the last stumbling issue is that 256MB of RAM is very puny. I think
the smallest VM I have tried is 384MB of RAM. Although dtrace isn't
very big, it can use quite a chunk of memory for per-cpu data
structures, and this could leave too little for the rest of the system
to work. So, I may need to turn off the instr provider and try and
be very feeble in memory requirements.
<p>
<p>

</div>
</content>


<entry>
<title type="html">A Bad API: XtAppAddTimeOut</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-06.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/06/index.html</id>
<published>2012-07-27T14:27:33+0100</published>
<updated>2012-06-17T18:35:36+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Thought I would recount tales of woe with the above API. This
is the API used for timer callbacks in X Windows. You can
define a function to be called after N milliseconds, and you
can use XtRemoveTimeOut to remove the timer. When the timer
is invoked, the timer is automatically deleted, and there is no need
to remove the timer.
<p>
This API is used in CRiSP, and has been for around 20 years.
Recently, I encountered a strange bug, which was annoying me. The
flashing cursor would periodically stop. At first I thought it was
a regression in some aspect of performance or an issue with one of
the newer features, but it wasn't. It *was* being tickled by the new
features, but they were not directly responsible.
<p>
Let's consider malloc() and friends. People who use malloc() (or new[]
for the C++ folks), know that you can free memory and two types of
problems present themselves: (a) using memory after it is freed, and (b)
forgetting to delete a memory block, leading to a memory leak.
<p>
Now, the X11 timer API is similar to malloc. If you start a timer,
it has a finite life, and if you end up with multiple timers for the
same callback, they will all fire. This can cause issues, such as
"frantic cursor" flashing, or whatever the code is which handles the
callback. It's typically easy to detect this scenario, and callbacks
will often have preventative measures to avoid core dumps which could
be caused in this kind of scenario.
<p>
Now, XtRemoveTimeOut is particularly nasty. The current X window implementations
tend to reuse an internal timer structure. You can do this:
<p>
<pre>
	XtRemoveTimeOut(id1);
	...
	XtRemoveTimeOut(id1);
</pre>
<p>
and although the second XtRemoveTimeOut is redundant, it can have
a strange side effect. If the code between the first call and the
second calls XtAppAddTimeOut, then the memory freed for the original
timer is reused by another timer. The second call to XtRemoveTimeOut
then removes the timer for the "other code". We may have taken
away someone else's timer.
<p>
This is what was (erratically) happening in CRiSP. Multiple calls
to remove the same timeout were not protected, and this led to
a piece of code removing the timeout for another piece of code.
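<p>
The failure mode is easy to reproduce in a toy model. The Python sketch
below is not the Xt implementation: TimerTable is a made-up registry that
recycles IDs the way a reused timer structure would, and the timer names
are invented. The remove_once() guard at the end is one common defensive
pattern, shown as an assumption rather than as the fix that went into
CRiSP.
<p>
<pre>
```python
class TimerTable:
    """Toy timer registry that recycles IDs, like a reused timer structure."""

    def __init__(self):
        self._active = {}      # id -> callback name
        self._free_ids = []    # IDs available for reuse
        self._next_id = 1

    def add(self, name):
        if self._free_ids:
            timer_id = self._free_ids.pop()
        else:
            timer_id = self._next_id
            self._next_id += 1
        self._active[timer_id] = name
        return timer_id

    def remove(self, timer_id):
        # Nothing here checks that the caller still owns timer_id.
        if timer_id in self._active:
            del self._active[timer_id]
            self._free_ids.append(timer_id)

    def active(self):
        return sorted(self._active.values())


def remove_once(table, timer_id):
    """Remove timer_id if it is set; return None for the caller to store."""
    if timer_id is not None:
        table.remove(timer_id)
    return None


# The bug: a stale second removal cancels somebody else's timer.
table = TimerTable()
id1 = table.add("autosave")
table.remove(id1)                 # first, legitimate removal
cursor_id = table.add("cursor")   # reuses the slot id1 referred to
table.remove(id1)                 # stale removal kills "cursor"
print(table.active())             # []

# The guard: forget the ID as soon as it has been removed.
id1 = table.add("autosave")
id1 = remove_once(table, id1)
cursor_id = table.add("cursor")
id1 = remove_once(table, id1)     # now a no-op
print(table.active())             # ['cursor']
```
</pre>
<p>
The guard only helps if every copy of the ID is cleared, which is why a
debugging aid in the timer API itself would be more robust.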
<p>
CRiSP doesn't use many timers, but one is tied to cursor flashing,
and if that gets removed, it will never fire again. So, the cursor
stopped flashing. (It would flash if you typed, as the
screen display code would need to hide/unhide the cursor, but it
wasn't obvious this was happening).
<p>
I ended up debugging this by adding an LD_PRELOAD trace library
to observe the "double-free" scenario, and eventually found a style
of coding that could lead to this (and, in v11.0.7, is fixed).
<p>
Strangely, I had hit a similar problem on MacOSX, where CRiSP
implements the X11 primitives as a layer on top of the Cocoa interface,
but hadn't noticed the same issue on X11, as it took a number of
events, in the right order, to reproduce the scenario.
<p>
XtAppAddTimeOut()/XtRemoveTimeOut() need to either not reuse
memory or provide a debugging API to detect cancels of freed timers,
and avoid timer reuse.

</div>
</content>


<entry>
<title type="html">2560x1440 $300USD</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-06.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/06/index.html</id>
<published>2012-07-27T14:27:33+0100</published>
<updated>2012-06-09T18:02:37+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
As I sit here, debugging an issue with node-locked licenses, on almost
the world's smallest computer (Raspberry Pi), I stumbled on the
following link on slashdot:
<p>
<a href="http://hardware.slashdot.org/story/12/06/09/0117208/where-are-all-the-high-resolution-desktop-displays">where-are-all-the-high-resolution-desktop-displays</a>
<p>
One of the best looking machines available today is the iMac. It's an
all-in-one device, and as I write this, hopefully Apple will release
new devices at next week's WWDC. A 27" screen with builtin computer,
or an overpriced computer with builtin screen. Fewer cables and sockets
to contend with.
<p>
Laptops have stagnated at the silly 1920x1080 resolution so I read
the above article, totally agreeing how the computer market has degraded
into a Pop-Idol "me-too" kind of world. Laptop screens peaked at 1920x1200
and then went south, presumably due to the cost reduction by sharing the
LED/LCD TV market.
<p>
iMacs are expensive, as are all Apple products. Sure, there is
an equivalent HP or DELL contender, with a whopping 2560x1440 screen,
but they are largely overpriced. One can buy a DELL UltraSharp U2711 or
Apple Cinema display, but at around 800 GBP, by the time you factor in
a decent computer, and mouse/keyboard, you are not far off the Apple price.
<p>
Now, on reading the Slashdot article, I was staggered/amazed. On that
page are references to *tons* of 2560x1440 displays, presumably
coming out of the same Asian manufacturers as the genuine Apple/DELL
displays, but with variations (low cost connectors).
<p>
Approximately 200 GBP, or $300 USD. That's the cost of the largest
display you can buy today (largest == large screen, very high
resolution; otherwise a 1080p 50" or 60" TV could be considered
"largest").
<p>
That is shockingly cheap. And yet no one, apart from Apple/HP/DELL,
lets you know these displays are available. Not even the component
sellers and techno-gadget pages (Engadget, TheRegister) make any
reference to these.
<p>
So, one could spec up an iMac-alike machine for close to half price
(not as ergonomically desirable as the iMac, but at least you can
select and change components more cheaply and easily).
<p>
Here's an eBay search to show you what's available:
<p>
<a href="http://www.ebay.co.uk/sch/i.html?_from=R40&amp;_trksid=p5197.m570.l1313&amp;_nkw=2560x1440&amp;_sacat=See-All-Categories">eBay 2560x1440 monitors</a>
<p>
<p>
<p>

</div>
</content>


<entry>
<title type="html">CRiSP On Raspberry Pi</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-06.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/06/index.html</id>
<published>2012-07-27T14:27:33+0100</published>
<updated>2012-06-07T23:40:50+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
I finally received my RaspberryPi today (hereafter, RPI). RPI
is a nice little device. Shame it doesn't presently come boxed, but
future editions will do so.
<p>
It's also a shame (or good?) that RPI now has some competitors -
USB flash drive self contained computers. For now, these are vaporware,
and, until recently, RPI was vaporware too.
<p>
I quickly got it up and running - one advantage of not getting an RPI
on the day of release is that the web is now chock full of tips
for getting an RPI working.
<p>
Having gotten the device, I am getting a 64GB SD card - the 4GB one
I am using lets me get off the ground, but project#1 is to connect
the RPI to active speakers and use it as a music device.
<p>
Alas, my cheapo USB wifi dongle is proving flaky, so I am going to
get another one, and hope this works. (Presently, the existing dongle
loses the connection after a few minutes, and a reboot or pull-out /
plug-in is required to restore sanity).
<p>
I am trying to get CRiSP built/installed on the device - the first
new "CPU" port of CRiSP for quite a few years (the last was
for Itanium). This is proving pretty straightforward, but the wifi
is making life a pain (i.e. I have to go and sit in front of the TV/screen).
<p>
I am planning to release CRiSP for RPI as a free product - no licensing,
just to give "something back to the community". It will appear on the
crisp download page (http://www.crisp.demon.co.uk/download.html) in a few
days when the port is ready.
<p>
If I can clear the backlog of bugs and issues, then I may take a poke
at looking at dtrace for RPI. This in theory should be straightforward
but it will be painful to compile a kernel on the fairly feeble 700MHz
CPU, so I may have to look at cross compiling. The 256MB memory of an RPI
may prove a limitation too.
<p>
No dates on dtrace - we will simply "see".
<p>
I have some potential other "projects" to do on the RPI. First is
the music server, second is a video server - ideally on the same physical
HW as the music server, but I don't think this is doable with only
one audio out (although it might be if sound solely comes from the external
speakers...need to experiment).
<p>
The other project is to replace the very aging G3 iMac which serves
as the CRiSP FTP repository - a simple ftp/web server. This should be
very untaxing - but ideally I want some cased RPIs before attempting
this.
<p>

</div>
</content>


<entry>
<title type="html">Being lied to. For 20+ years</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-06.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/06/index.html</id>
<published>2012-06-06T10:29:26+0100</published>
<updated>2012-06-04T22:03:49+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
About 20+ years ago, CRiSP for Windows was written (or, more correctly, ported).
Back in the early 1990s, a new operating system - Windows NT 3.0 - was arriving. I remember
it - I had to buy a new machine - a fabulous 486DX2, running at 50MHz.
<p>
At last, a version of Windows which didn't crash. Windows NT 3.0 was a 32-bit operating
system. CRiSP had been compiled for Windows 3.1 - a 16-bit operating system. Porting and debugging
CRiSP was painful - any bad behavior would likely require a machine reboot - Windows 3.x was
too unstable; errant applications could write anywhere.
<p>
NT 3.0 was protected from this nonsense. CRiSP exists as two main versions - a console
version, and the GUI application. (This is true today, not only for Windows, but for Unix/Linux
and MacOS).
<p>
What's the difference between a GUI application and a console application? "main(int argc, char **argv)".
<p>
Well, Windows has a different startup function - WinMain. WinMain is a bit like the function invoked
before main() is invoked. It doesn't get an array of command line arguments. It gets a single
argument for the command line, and it's up to the application to parse the command line.
<p>
All the CRiSP tools (and all tools, even non-Foxtrot ones), parse that command line.
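<p>
As a rough sketch of what "parse that command line" involves, here is a simplified splitter in Python. It is not the CRiSP parser and not the rule set used by the Microsoft C runtime: it only understands whitespace and double quotes, backslash escapes are deliberately ignored, and the split_command_line name, the -debug flag and the file path are all made up for the example.

```python
def split_command_line(cmdline):
    """Split one command-line string into a list of arguments.

    Simplified rules: whitespace separates arguments and double quotes
    group text that contains spaces. Backslash escapes are not handled.
    """
    args = []
    current = []
    in_quotes = False
    have_arg = False          # lets an empty quoted argument ("") count
    for ch in cmdline:
        if ch == '"':
            in_quotes = not in_quotes
            have_arg = True
        elif ch in " \t" and not in_quotes:
            if have_arg:
                args.append("".join(current))
                current = []
                have_arg = False
        else:
            current.append(ch)
            have_arg = True
    if have_arg:
        args.append("".join(current))
    return args


argv = split_command_line('crisp.exe -debug "C:\\My Files\\notes.txt"')
print(argv)
```
<p>
A console program gets this work done for it before main() runs; a WinMain program has to do something like the above itself.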
<p>
An annoying problem is that CRiSP relies on printf() for debugging and for some of the macro
commands. When you link a Windows application, you use a different command line - to signify it's
a GUI application ("link -subsystem:windows,4.x" or equivalent, depending on the version of Windows
you are targeting).
<p>
By contrast, a console application is very POSIX like in its behavior. printf() writes to the
console (cmd.exe) you invoked from and you can pipe the output.
<p>
I recently started using MINGW (http://www.mingw.org/) - a port of the GNU compiler collection
to Windows. MINGW is different from CYGWIN which provides a Unix/POSIX like system under Windows.
MINGW can generate Windows applications - so, no need for the SDK. (MINGW is simply brilliant;
I'll explain why, below).
<p>
In porting CRiSP to run under MINGW, I ended up building a "premake" like build tool,
because the Windows and Unix makefiles had grown too long in the tooth to easily adapt. After
building CRiSP under MINGW, I did something *wrong*. I built crisp.exe as a console application.
I didn't realise this, and was surprised that printf() was writing to the console.
<p>
Up until now, CRiSP has had to emulate the console, and writes to a popup dialog. It's not a bad
way of debugging, but a nuisance, despite some nice little features which help me.
<p>
But why was MINGW crisp.exe writing to stdout quite happily, yet the Win32 version of CRiSP.EXE
did not? I had attempted to solve this problem many years ago, and found that somewhere in the 
Windows startup code, the STDIN/STDOUT/STDERR handles are closed and not available - by the time
WinMain() is called, it is game over.
<p>
But when crisp.exe is linked as a console application, this does not happen. stdin/stdout/stderr
are left intact. So, a GUI application can read/write to stdin/stdout !
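<p>
The same split shows up in other runtimes. Python, for instance, documents that sys.stdout can be None for a Windows GUI application that has no console (such as a script started with pythonw). The sketch below is only an illustration of the "use stdout if it is there, otherwise fall back" idea; pick_debug_stream and the in-memory "popup" buffer are hypothetical names, not anything from CRiSP.

```python
import io
import sys


def pick_debug_stream(stdout, fallback):
    """Return stdout when it is usable, otherwise the fallback stream."""
    if stdout is None or getattr(stdout, "closed", False):
        return fallback
    return stdout


popup = io.StringIO()                      # stand-in for a popup dialog
stream = pick_debug_stream(sys.stdout, popup)
print("debug: startup complete", file=stream)

# With no usable stdout, the message lands in the fallback instead.
print("debug: no console", file=pick_debug_stream(None, popup))
```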
<p>
I don't know why all the Windows documentation makes a big play about the linking "subsystem", but if
you ignore it, life is more palatable.
<p>
Why is MINGW so good? Because "gdb" just *works*. I can use gdb on Linux, MacOS and Windows and have
the same debug environment. Even hardware watchpoints work. gdb may not be everyone's favorite debugger,
but it is mighty powerful.
<p>
Prior to this I was using the free Visual C++ Express edition. (I had purchased the Professional
Visual Studio a long time back, but Visual Studio, despite being a very powerful product, just
changes too often with whatever flavour of technology is current, and it's not cost effective
for software which runs cross-platform). With the advent of Windows-8, it's not clear whether
Microsoft is trying to create more of a walled garden, as Apple has done.
<p>
So, the GNU compiler collection is great - providing a consistent compiler platform across
many operating systems. MINGW fills in a gap - which was how to use GCC to create Windows applications.
<p>
Currently CRiSP is being built via Visual Studio, and MINGW is only being used for internal
debugging, but this may well change in the near future.
<p>
<p>
<p>

</div>
</content>


<entry>
<title type="html">Amazon Reseller Scams - and iMacs</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-05.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/05/index.html</id>
<published>2012-06-06T10:29:26+0100</published>
<updated>2012-05-12T22:34:59+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Thought I would recount something I have been investigating. It
may help someone out there.
<p>
I have an interest in purchasing an iMac - the 27" screen is gorgeous
and you can never have enough pixels. Remarkably, the iMacs hold
their prices extremely well - just look at Amazon or eBay to see this.
The difference between New or Used is a small percentage (20-30%).
Even the older iMac models (2009 and 2010) hold their value well.
<p>
Apple appear to allow approx a 9% discount, e.g. from Amazon, over the
list price. You can buy a refurb or reboxed product, and get a good
discount (both from Amazon, and other UK outlets).
<p>
A "refurb" model can mean anything: looking at some of the UK outlet
stores, they occasionally do refurbs, but, some of these stores have
display models, pretty much on 24x7, and it's not clear if a "refurb"
is just a boxed version of a display model, whose screen and hard drive 
have been powered on for long periods.
<p>
Now, if you monitor Amazon (UK - I haven't checked US + Europe), you
will occasionally see real bargains pop up, e.g. instead of a 20-30%
discount, you see 50%: the cheapest iMac model is 1400GBP, so seeing
one at 700-800GBP is a warning sign.
<p>
In each case, the marketplace seller is a new seller - no feedback,
and in some cases, they are listed as a US address, and other times
as a UK address, even using some other company as a front.
<p>
Now, for the start of the scam: if you try to order the product,
you mostly cannot: the reseller doesn't ship to the UK (or anywhere else) -
i.e. they are selling something you cannot buy.
<p>
Now, in some instances the seller asks you to contact them directly,
and if you do, they will offer to enter the order for you, if you
provide them with details. This seems extremely suspect and I raised
it with Amazon. No harm done - as the order didn't complete.
<p>
Now, I don't know if this is the same reseller or a gang of them. Today
I ordered, and the order was accepted - I got an email from Amazon
confirming the order was placed. Later on in the day, the order
had mysteriously disappeared from my orders in Amazon. Again, I contacted
Amazon (via their live-chat feature) and the person there suggested
that this can happen - resellers cancel orders - and that my details should
be safe.
<p>
Just to summarise:
<p>
<ul>
  <li>Reseller has no feedback and is a new reseller</li>
  <li>Item is too good to be true</li>
  <li>Seller may not accept delivery to your address or country</li>
  <li>If you contact the reseller (either direct or through Amazon),
  they want to be "helpful" and ask for details so they can enter the
  order for you</li>
</ul>
<p>
I hope the above is obvious to everyone, and if you learned something,
then great. Amazon is a great place to shop, and seems very secure with
good customer relations and security practices. But there are lots
of people out there who are trying all sorts of scams, and you may not know
what they are or how to spot the signs. (I am simply looking at this
to better understand it myself).
<p>
Be careful out there.
<p>

</div>
</content>


<entry>
<title type="html">Are your backups *good*?</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-04.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/04/index.html</id>
<published>2012-05-01T21:47:34+0100</published>
<updated>2012-04-22T16:54:30+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Wanted to recount a little scenario that happened to me, and a warning
to others.
<p>
I've been writing a "makefile" generator tool. The makefiles for CRiSP
have become a little unwieldy over the years - specifically, parallelisation
issues, incomplete dependencies, and inability to adapt them quickly
to a new scenario.
<p>
CRiSP runs on MacOS, Linux and Windows. The Mac and Linux makefiles are
shared (as are all the Unix makefiles), with an auxiliary small makefile
to handle the Cocoa part of the build cycle. Unfortunately, for Windows,
nothing is shared - it has a complete set of its own makefiles.
<p>
I wanted to compile CRiSP under MingW - a GCC port to Windows, because I
wanted to use GCC, and avoid issues with Microsoft Visual Studio.
MingW provides a Unix like environment, but Windows is too far from Unix
and MingW is too far from Windows, in terms of my makefiles.
<p>
So, I built a makefile generator. It has some good features - auto/recursive
header file dependencies, avoiding "cd" and creating a parallelisable
tree, along with proper cleaning - clean only what is built.
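<p>
For a flavour of the "auto/recursive header file dependencies" part, here is a small Python sketch. It is an assumption about how such a scan could work, not the generator described here: the header_dependencies function is hypothetical, it only follows quoted #include lines, and it resolves them relative to the including file rather than using any include path.

```python
import os
import re
import tempfile

# Matches lines such as:  #include "foo.h"   (quoted includes only)
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)


def header_dependencies(path, seen=None):
    """Return the set of quoted headers reachable from path, recursively."""
    if seen is None:
        seen = set()
    directory = os.path.dirname(path)
    with open(path) as handle:
        text = handle.read()
    for name in INCLUDE_RE.findall(text):
        header = os.path.normpath(os.path.join(directory, name))
        # The seen set stops include cycles from recursing forever.
        if header not in seen and os.path.isfile(header):
            seen.add(header)
            header_dependencies(header, seen)
    return seen


with tempfile.TemporaryDirectory() as tmp:
    sources = {
        "main.c": '#include "a.h"\nint main(void) { return 0; }\n',
        "a.h": '#include "b.h"\n',
        "b.h": '#include "a.h"\n',   # deliberate cycle
    }
    for name, body in sources.items():
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write(body)
    deps = header_dependencies(os.path.join(tmp, "main.c"))
    dep_names = sorted(os.path.basename(dep) for dep in deps)

print(dep_names)
```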
<p>
That latter feature had a bug in it: the initial version did the
equivalent of "rm -f */*". Unfortunately, when I ran it, it hadn't
cd'ed to the build directory, so it deleted all my files in subdirectories.
Oops!
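<p>
One way to make a "clean only what is built" step harder to misfire is to delete from an explicit list and refuse anything that resolves outside the build directory, so the current directory stops mattering. The Python below is a sketch under that assumption; clean_built_files and the file names are hypothetical and this is not the generator's actual clean rule.

```python
import os
import tempfile


def clean_built_files(build_dir, built_files):
    """Remove only the listed outputs, and only if they live under build_dir."""
    root = os.path.realpath(build_dir)
    removed = []
    for relative in built_files:
        target = os.path.realpath(os.path.join(root, relative))
        # Refuse anything that resolves outside the build directory.
        if os.path.commonpath([root, target]) != root:
            continue
        if os.path.isfile(target):
            os.remove(target)
            removed.append(relative)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    build = os.path.join(tmp, "build")
    os.makedirs(os.path.join(build, "obj"))
    os.makedirs(os.path.join(tmp, "src"))
    for path in (os.path.join(build, "obj", "a.o"),
                 os.path.join(tmp, "src", "main.c")):
        with open(path, "w") as handle:
            handle.write("x")
    # The second entry points outside the build tree and must be ignored.
    removed = clean_built_files(build, ["obj/a.o", "../src/main.c"])
    source_survived = os.path.exists(os.path.join(tmp, "src", "main.c"))
    object_removed = not os.path.exists(os.path.join(build, "obj", "a.o"))

print(removed, source_survived, object_removed)
```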
<p>
Not to worry - I do backups every few days and propagate the sources to 
other machines. Mount the USB drive and go look at the .tar.bz2 file 
containing the current sources. (Alas, about 2-3 days old, but that
didn't matter).
<p>
What I found was a corrupted .tar.bz2 file. My initial thoughts were,
*how* did this happen? My backup script is used all the time, and I
have validated the backups, but this was strange and new.
<p>
Never mind, I reconstituted the missing files from my other backups
and systems.
<p>
But I was curious: what could cause this to happen?
<p>
On investigation, I found the following worrying sign. I tend to back up
to a USB flash drive. I use Linux. I use swsusp to suspend to RAM. My current
kernel/distro has a bug in it. When you suspend with mounted USB drives, on
a wakeup, it doesn't understand the filesystem was mounted or the hardware
needs to be reprobed. "mount" would show the filesystem mounted
but the device was totally empty. If I unmounted the rogue device, and
remounted, all the files were present.
<p>
I am guessing that I did a backup and left the device mounted, suspended
the system, and didn't notice for a while (1-2 days) and eventually this
may have led to the corruption.
<p>
Moral: even if your backup system is perfect and doing validation -
your operating system (or some other component) may work against you.
You may not know this.
<p>
In my case, I may have to strengthen the backup system: perhaps
recording md5sums of the files before writing and validating them
against the device afterwards, or caching the backups on the local HD and
verifying the copy before dropping the local one.
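<p>
A minimal sketch of the checksum idea, in Python rather than as part of any real backup script: md5_of and copy_and_verify are hypothetical helpers, and the file names are invented. It records the checksum of the source, copies it, and compares the checksum of the copy; the second half simulates silent corruption to show the mismatch being caught.

```python
import hashlib
import os
import shutil
import tempfile


def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination):
    """Copy source to destination and confirm the copy's checksum matches."""
    expected = md5_of(source)
    shutil.copyfile(source, destination)
    return md5_of(destination) == expected


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "sources.tar.bz2")
    backup = os.path.join(tmp, "backup.tar.bz2")
    with open(source, "wb") as handle:
        handle.write(b"pretend archive contents" * 1000)
    copy_ok = copy_and_verify(source, backup)

    # Simulate silent corruption of the backup copy.
    with open(backup, "r+b") as handle:
        handle.write(b"XXXX")
    still_matches = md5_of(backup) == md5_of(source)

print(copy_ok, still_matches)
```
<p>
One caveat: a read straight after a write may be served from the operating system's cache rather than the device, so a check like this is more convincing after the drive has been unmounted and remounted.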
<p>
How good are *your* backups?

</div>
</content>


<entry>
<title type="html">dtrace, ftrace, ltrace, strace .. so many to choose from !</title>
<author>
<name>fox</name>
</author>
<link rel="alternate" type="text/html" href="http://www.crisp.demon.co.uk/blog/archives/2012-04.html"/>

<id>http://www.crisp.demon.co.uk/blog/site/2012/04/index.html</id>
<published>2012-04-14T09:10:30+0100</published>
<updated>2012-04-11T22:08:47+0100</updated>

<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<p>
Vicente-Cheng has been asking questions about some curious kernel
messages when dtrace is loaded into the kernel, and it seems like a good
topic to cover in more detail. So, here is his question:
<p>
<verbatim>
    The dtrace_xcall code patch failed. (Actually, I don't know how you know that from log.)
    by this description?
    "ftrace failed to modify [ffffffffa04cdd44] dtrace_xcall+0x4/0x28 [dtracedrv]"
<p>
    The ftrace module will got something wrong when dtrace kernel module loading.
<p>
    The tainted kernel is normal because of the different license (GPL & CDDL).
</verbatim>
<p>
Let's do a little history lesson.
<p>
In the beginning, Solaris gave rise to a tool called "truss". "truss"
is a tool for tracing system calls. It was a breakthrough tool (versions
for other OS's existed previously, but the truss implementation was simple
and easy to use; BSD kernels had their own tool .. but I digress). truss
lets you understand what calls a tool makes and you can see
the parameters to the syscall. Great for educational purposes or for
diagnosing performance problems in an application.
<p>
On Linux systems (and other Unixes), strace was created to emulate
truss - a system call tracing tool.
<p>
"ltrace" is another tool, which can be used to trace any dynamic
library function call. This is sometimes very useful, but at the library
level the number of calls executed can be huge - something as simple
as starting an X Windows program can involve lots of nested calls
to fopen, the string library and other functions - and that level
of detail can make it hard to understand what an application is
doing.
<p>
A few years back - dtrace was born. dtrace is a tracing tool which
runs at the system level - this means you can trace all processes, into
and out of the kernel, rather than selecting a single process to trace.
<p>
There are two common modes of operation: you can trace system calls (and
see who executes the call - whereas with strace/truss, you pick
a single process and see what calls it makes), and you can trace
kernel function calls (seeing which process triggers the trace).
[dtrace does other things, such as tracing user land shared libraries,
just like ltrace, but again, you can find which processes invoke a function
rather than knowing up front which one to trace].
<p>
When tracing in the kernel (and user space), the basic mechanism is
that of a breakpoint: one or more breakpoint instructions are placed
at the places you choose (using the dtrace probes syntax), and
dtrace converts these breakpoint traps into the events and callbacks that
a D script can trigger on.
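<p>
The patching half of that mechanism can be modelled in a few lines of Python. This is a toy, not the dtrace driver: a bytearray stands in for kernel text, the ToyProbes class is invented, and it assumes the one-byte x86 breakpoint opcode (INT3, 0xCC). The point is simply that enabling a probe saves the original byte and disabling it puts that byte back.

```python
INT3 = 0xCC   # one-byte x86 breakpoint instruction


class ToyProbes:
    """Tracks breakpoint patches applied to a block of code bytes."""

    def __init__(self, text):
        self.text = text      # bytearray standing in for kernel code
        self.saved = {}       # offset -> original byte at that offset

    def enable(self, offset):
        if offset not in self.saved:
            self.saved[offset] = self.text[offset]
            self.text[offset] = INT3

    def disable(self, offset):
        if offset in self.saved:
            self.text[offset] = self.saved.pop(offset)


# push rbp; mov rbp,rsp; pop rbp; ret
code = bytearray.fromhex("554889e55dc3")
probes = ToyProbes(code)
probes.enable(0)    # entry probe on the first instruction
probes.enable(5)    # return probe on the ret
patched = code.hex()
probes.disable(0)
probes.disable(5)
restored = code.hex()

print(patched, restored)
```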
<p>
DTrace works at the function level: you pick a function to trace and
dtrace computes the start of the function (very easy - the symbol table
gives us the start address of each function), and also, the (multiple)
end addresses of a function. (Each function can have multiple exit
points, so dtrace needs to "find" each of these). Finding the exit
points of a function involves disassembling the function and trapping
the RET instruction. (Solaris/dtrace traps the LEAVE instruction which
precedes the RET instruction, but this is not very useful on Linux, when
using gcc, since the compiler rarely emits a LEAVE/RET instruction sequence).
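<p>
To see why finding RET needs a disassembler rather than a byte search, here is a toy Python walker. It is nowhere near a real x86 decoder and is not how the dtrace driver does it: the TOY_LENGTHS table covers only a handful of one-byte opcodes chosen for the example, and the sample function bytes are made up. The byte 0xC3 appears inside an immediate operand, which a naive scan mistakes for a return.

```python
RET = 0xC3

# Instruction lengths for the few opcodes used in this example only.
TOY_LENGTHS = {
    0x55: 1,   # push rbp
    0x5D: 1,   # pop rbp
    0x90: 1,   # nop
    0xC9: 1,   # leave
    0xC3: 1,   # ret
    0xB8: 5,   # mov eax, imm32
    0xE8: 5,   # call rel32
    0x74: 2,   # je rel8
    0xEB: 2,   # jmp rel8
}


def find_ret_offsets(code):
    """Walk instruction by instruction and return the offsets of RETs."""
    offsets = []
    position = 0
    while len(code) > position:
        opcode = code[position]
        if opcode not in TOY_LENGTHS:
            raise ValueError("unknown opcode 0x%02x at offset %d"
                             % (opcode, position))
        if opcode == RET:
            offsets.append(position)
        position += TOY_LENGTHS[opcode]
    return offsets


# push rbp; mov eax,0xc3; je +2; pop rbp; ret; pop rbp; ret
function_bytes = bytes.fromhex("55b8c300000074025dc35dc3")
exits = find_ret_offsets(function_bytes)
naive = [i for i, byte in enumerate(function_bytes) if byte == RET]

print(exits, naive)
```
<p>
The walker reports the two real exit points; the naive search also reports offset 2, which is data inside the mov instruction. Patching that byte would corrupt the function.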
<p>
It's possible for the disassembler to get confused and Linux's dtrace
is tuned for the type of output typical from gcc. It can result in not
finding the end part of a function or getting confused by inlined
constant tables (gcc used to create switch-statement jump tables
and inline them in the code body; fortunately, it no longer does this - all
read-only data is put in its own ELF section, and code in memory is exactly
that -- code -- nothing else).
<p>
dtrace needs to intercept loadable kernel modules - so that it can expose
traceable functions - and to intercept module unloading - otherwise it
could leave a trap exposed in a part of memory that is freed or used
for something else.
<p>
Let's switch to ftrace: ftrace is a Linux kernel subsystem, similar to dtrace,
and an integral part of the kernel. Because of licensing conflicts between
dtrace (CDDL) and the Linux kernel (GPL), ftrace exists in its own right. It works similarly to dtrace - intercepting
module loads and letting you place traps in the kernel. (ftrace is
different in many respects to dtrace - it provides a richer API
for kernel tracing, but it can allow dangerous user scripts to crash
the kernel).
<p>
ftrace is quite vocal - if it detects anything it cannot handle in a loadable
module, it will log a message - indicating what and why it couldn't
handle the instruction sequence it meets. This results in a kernel log
message, such as the following, when dtrace is loaded:
<p>
<pre>
[  357.679832] WARNING: at kernel/trace/ftrace.c:1509 ftrace_bug+0x28f/0x2e0()
[  357.679834] Hardware name: VirtualBox
[  357.679835] Modules linked in: dtracedrv(PO+) lockd sunrpc be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev snd_intel8x0 parport_pc parport i2c_piix4 i2c_core e1000 snd_ac97_codec microcode ac97_bus snd_pcm snd_page_alloc snd_timer snd soundcore uinput [last unloaded: scsi_wait_scan]
[  357.679855] Pid: 2350, comm: insmod Tainted: P           O 3.3.1-3.fc17.x86_64 #1
[  357.679857] Call Trace:
[  357.679862]  [<ffffffff8105676f>] warn_slowpath_common+0x7f/0xc0
[  357.679866]  [<ffffffff81122876>] ? __probe_kernel_read+0x46/0x70
[  357.679874]  [<ffffffffa0286004>] ? ctf_hash_size+0x4/0x20 [dtracedrv]
[  357.679881]  [<ffffffffa0286004>] ? ctf_hash_size+0x4/0x20 [dtracedrv]
[  357.679883]  [<ffffffff810567ca>] warn_slowpath_null+0x1a/0x20
[  357.679885]  [<ffffffff810f093f>] ftrace_bug+0x28f/0x2e0
[  357.679888]  [<ffffffff810f0e6c>] ftrace_process_locs+0x39c/0x560
[  357.679891]  [<ffffffff810f2767>] ftrace_module_notify+0x47/0x50
[  357.679895]  [<ffffffff815ef34d>] notifier_call_chain+0x4d/0x70
[  357.679898]  [<ffffffff8107e298>] __blocking_notifier_call_chain+0x58/0x80
[  357.679901]  [<ffffffff8107e2d6>] blocking_notifier_call_chain+0x16/0x20
[  357.679904]  [<ffffffff810b5df3>] sys_init_module+0x10a3/0x20b0
[  357.679906]  [<ffffffff815f35e9>] system_call_fastpath+0x16/0x1b
[  357.679908] ---[ end trace 54106b526adf7ab1 ]---
[  357.679909] ftrace failed to modify [<ffffffffa0286004>] ctf_hash_size+0x4/0x20 [dtracedrv]
[  357.679915]  actual: e8:0f:62:03:00
</pre>
<p>
Here, the code in ftrace.c at line 1509 is complaining.
It's not obvious what it's complaining about, but the last two lines
highlight what ftrace was confused about. The "actual" message
shows the instruction it didn't like. Those bytes correspond to:
<p>
<pre>
e8 0f 62 03 00          call   0x00036214
</pre>
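<p>
The decoding above can be checked by hand or with a few lines of Python. The sketch assumes the five bytes are an x86 near call (opcode E8 followed by a little-endian 32-bit displacement, relative to the next instruction); decode_call_rel32 is a hypothetical helper. Decoded at address zero it gives the value shown above; decoded at the address from the log line (assuming that is where the bytes live) it gives the call target inside the module.

```python
def decode_call_rel32(code, address=0):
    """Return the target of a 5-byte x86 CALL rel32 (opcode E8)."""
    if len(code) != 5 or code[0] != 0xE8:
        raise ValueError("expected a 5-byte E8 call instruction")
    displacement = int.from_bytes(code[1:5], "little", signed=True)
    # The displacement is relative to the address of the next instruction.
    return (address + len(code) + displacement) % 2**64


actual = bytes.fromhex("e8:0f:62:03:00".replace(":", ""))
target_from_zero = decode_call_rel32(actual)
target_in_module = decode_call_rel32(actual, 0xffffffffa0286004)

print(hex(target_from_zero), hex(target_in_module))
```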
<p>
I don't know what it doesn't like about that, but it is possibly suggesting
that ftrace, in computing the call-graph of the kernel, may have found
a code path which could result in a kernel hang (this might be
dtrace violating a kernel API constraint, or because ftrace doesn't
realise how dtrace works).
<p>
To date, I have not observed any side effect of this warning (I haven't
tried to use ftrace on a system running dtrace, so it's possible that
ftrace or dtrace may cause a hang when they are enabled together).
<p>
By the way - always treat kernel stack traces (such as this), or
even those from dtrace itself, with suspicion. Because of the way GCC
compiles code, it is not possible to create a 100% accurate stack
trace in all scenarios. The stack walker looks at all the words on the
stack and lists out the likely active stack frames. (The "?"
marks in the trace show the stack walker flagging questionable
entries; walking the stack has enough gotchas and complexity that
maybe I will return to the subject in a future blog topic).
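<p>
As a rough picture of "look at every word and list the plausible ones", here is a toy scan in Python. It is not the kernel's stack walker: the two symbol ranges are derived from the log above (address minus offset, plus the size after the slash), while the two non-code words on the fake stack are invented for the example. Any word that happens to fall inside a function gets reported, which is exactly why stale return addresses left on the stack can show up in a trace.

```python
SYMBOLS = [
    # (start address, size, name), derived from the log as address - offset
    (0xffffffff810567b0, 0x20, "warn_slowpath_null"),
    (0xffffffff810f06b0, 0x2e0, "ftrace_bug"),
]


def symbolize(word):
    """Return name+offset/size if word lies inside a known function."""
    for start, size, name in SYMBOLS:
        if word >= start and start + size > word:
            return "%s+0x%x/0x%x" % (name, word - start, size)
    return None


def candidate_frames(stack_words):
    """List every stack word that looks like a kernel code address."""
    found = []
    for word in stack_words:
        text = symbolize(word)
        if text is not None:
            found.append(text)
    return found


fake_stack = [
    0x0000000000000286,    # hypothetical non-code word
    0xffffffff810567ca,    # from the log: warn_slowpath_null+0x1a
    0xffff88003d5c1e38,    # hypothetical data pointer
    0xffffffff810f093f,    # from the log: ftrace_bug+0x28f
]
frames = candidate_frames(fake_stack)

print(frames)
```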
<p>
<p>

</div>
</content>


</feed>
