|Why I love Xen||Sunday, 23 March 2014|
I think I found the Xen + DTrace culprit. I have a couple of routines in dtrace which invoke the CLI and STI instructions. They are a bit naughty, and designed to shield dtrace from the bewildering number of enable/disable interrupt primitives in the kernel. (The kernel has multiple layers of interrupt functions and macros, to try to hide the implementation details of each cpu, along with preemption and scheduling concerns.)
I spent a lot of time thinking about DTrace and Xen - what was I doing wrong? I spent a lot of time not thinking about DTrace and Xen. As much as I wanted and needed help to find this horrible bug, I couldn't even phrase the right question.
The evidence was clear: occasional single step traps would not properly single step - it was as if the single-step trap flag was not set in the FLAGS register. I tried molesting the code in the interrupt routines - checking the assembler code very carefully.
The Linux kernel has two sets of interrupt handlers for many of the core traps (int1, int3 and others). Which set is used depends on whether Xen is active or not. When Xen is active, there are two extra registers on the interrupt stack.
After much playing with the dtrace code and the linkage from dtrace to the underlying kernel...I started to investigate something else.
I noticed the dtrace dumps occurred at the point we switched from one cpu to another. My VirtualBox VM had 4 cpus, so I set it to 1 cpu. DTrace worked remarkably better - but soon enough, we still core dumped on:
$ dtrace -n fbt::sys_chdir:
in one window and
$ while true; do cd /; done
in the other. The fact that we had a single-step trap which was not single stepping was really curious. So I started grepping for CLI/STI in the driver code, and came across my old friends dtrace_enable_interrupts() and dtrace_disable_interrupts().
These functions use assembler to execute the CLI/STI instructions. Now, what if we had done the wrong thing? Xen wouldn't really know (except by accident) that we had executed these instructions, and we may have been running without interrupts correctly disabled. Xen is a hypervisor, and if it doesn't know what's going on, all bets are off.
So I replaced the CLI/STI with arch_local_irq_save and arch_local_irq_restore, and it works! I had gotten away with the raw instructions for a short while, but this is the Xen issue (I hope it's the only issue). Whilst writing this blog, my test case ran with no problems.
Now I can formalise the change and push out a new release.
I still don't like Xen - I would rather have my VM die and tell me that something is wrong immediately, not lie through its teeth that the guest code is working when it isn't.
Posted at 21:38:34 by fox | Permalink
|Why I dislike Xen||Friday, 21 March 2014|
Fast forward to Xen. Those instructions don't exist - nor does any other instruction which can affect the virtual state of the CPU. Instead, the kernels are compiled to call a function via the hypercall portal.
Now, if you do the wrong thing, you are on your own. You won't know what you did wrong. It is so easy to crash the VM *host* if you are not careful.
So, let's look at dtrace. DTrace appears to work well. But if you are on Xen, as of this writing, the results are erratic - after a few probes, you will crash the process, or the kernel, or the VM.
Here's an example of a process crash:
[ 2390.769639] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[ 2390.769640] IP: [<ffffffffa01cd193>] cpu_core_exec+0x133/0xfffffffffffbcfa0 [dtracedrv]
[ 2390.769641] PGD 17319067 PUD 172b9067 PMD 0
[ 2390.772090] Oops: 0002 [#1] SMP
[ 2390.772090] CPU 1
[ 2390.772090] Modules linked in: dtracedrv(PO) nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc psmouse microcode serio_raw i2c_piix4 e1000 [last unloaded: dtracedrv]
[ 2390.772090]
[ 2390.772090] Pid: 3057, comm: bash Tainted: P W O 3.4.0 #1 innotek GmbH VirtualBox/VirtualBox
[ 2390.772090] RIP: e030:[<ffffffffa01cd193>] [<ffffffffa01cd193>] cpu_core_exec+0x133/0xfffffffffffbcfa0 [dtracedrv]
[ 2390.772090] RSP: e02b:ffff8800173d5f78 EFLAGS: 00010193
When a dtrace probe fires (because we replaced the instruction with a breakpoint), we have to single step the replaced instruction. What is happening above is that the single step is not honored. The EFLAGS register shows bit 0x100 is set - a single step is in effect. But Xen has decided it isn't. So execution at the buffer location runs on past the replaced instruction. Here's the disassembly:
(gdb) x/i 0xffffffffa01cd190
0xffffffffa01cd190:  push   %rbp
0xffffffffa01cd191:  nop
0xffffffffa01cd192:  nop                <=== shouldn't get here
0xffffffffa01cd193:  add    %al,(%rax)  <=== but we died here
Such sweet joy to figure out what "rule" I broke. I had reported in a prior blog entry that the Xen documentation is very poor.
More than likely I am setting the trap flag by a bit-mask operation on the flags, and not telling Xen what we did.
Off to find the code which is stopping Ubuntu 12.04 from working reliably.
Posted at 22:24:54 by fox | Permalink
|Malloc(RSS)||Monday, 17 March 2014|
I can happily report that rss.pl has exceeded my hopes for a bandwidth diet. Previously I was approaching 500MB/month of bandwidth use and it was out of control, due to the obnoxious way adverts were being generated (they were mutilating the HTML of a page - after I had paid the cost of downloading the page - so I was effectively paying twice for each page, whilst being denied compression). In addition, the mechanism would add significant delay - I was lucky to read 1 or 2 web pages in multiple minutes.
With rss.pl, I do a page refresh maybe 2 or 3 times a day. The single (compressed) page load of ~90kb (~250kb uncompressed) lets me wade through maybe 100-200 news items over the course of a few hours - I can read far more, without annoying delays.
In one month, I hit 60MB of use - I had set myself a diet of 100MB/month, and came in well under it. I would have hit 50MB or lower, but occasionally I use my device for email.
Anyway, I started adding stats to the service. This allows me to fine-tune how it works. I noticed other people use it (by all means do - rss.pl - but I reserve the right to take it down or restrict usage if I find any abuse).
One of the things I am debating is how much history to carry. I currently allow for the last 2000 news items in a cache, but of the 10-12 sites I poll, at varying frequencies, some sites are exposing old items. (I don't use date aging - yet - simply URL history.) The working set is about 700-800 news items, so within 2 or so days, stale news items start popping through. It's interesting watching the effect. For example, www.osnews.com is not a fast updating site - maybe 1 item per day - yet my code is allowing multiple old items to pop through. "The Old New Thing" (http://blogs.msdn.com/b/oldnewthing/rss.aspx) is a great site, and updates about once per day. Yet very old items are peeping through.
Somehow, the algorithm for retrieving sites, and the cycle update from these sites is causing a kind of "shearing" effect - something typically seen with a fragmented heap (hence the reference to malloc) or locality issues in a page swapping algorithm.
It's as if the sinusoidal wave of each news feed is leading to some interesting effects.
I could just increase my history limit - but doing so wastes more CPU cycles on my server (so I would then need a better data structure than the one I presently use). Maybe I will relent, do If-Modified-Since requests, and use the timestamps to keep everything fresh.
At the moment, I discard older news items and pages I generate, but I may decide to keep them for long term archival - for more data mining expeditions.
Posted at 22:48:08 by fox | Permalink
|Latest DTrace news||Sunday, 09 March 2014|
Of course, when I stop looking at dtrace, after a while, a trickle of things come in. I thought I would summarise some of my "TODO" items.
First, a user reported an issue on a 3.2 kernel. It seems a Xen-hosted kernel is very unreliable. Although he reported an issue with an fbt::: torture test, I could reproduce with a task switch function - even after a few probes - suggesting the basic handling of an IRET instruction may be having problems on Xen. (Xen does paravirtualisation, which is another way of saying all the work of virtualising falls on the guest rather than on Xen itself. I really do not like Xen - it's a work generation scheme!)
Next was an issue with the PID provider. I did start working on this and found some missing functionality in libproc/libdtrace, but it's not ready for consumption. I have included the partial work in the next release. At issue is running a process and intercepting the shlib events so that dtrace can look at the shlib and add probes.
Next up is the 3.12 kernel. I usually leave it a while before touching new kernels - Ubuntu 14.04 is due in a few weeks and I can kill two birds with one stone by waiting for that to come out.
Another annoyance is the libdwarf vs libdw confusion which prevents ctfconvert from being built. This is mostly irrelevant unless people want to use kernel structs in their probes. I need to discard both libraries, run readelf and parse its output - it's the only portable way to do what is necessary, versus the confusion inherent in both dwarf libraries.
So, it's unlikely I will release a new dtrace release for a few more weeks, unless I magically fix one or more of the issues above.
Posted at 20:43:00 by fox | Permalink
|candycrush - file format||Sunday, 02 March 2014|
It's a nicely put-together game - delivering a "Bejewelled" experience which gets you hooked quite early on. It starts really easy and then becomes challenging. You quickly run out of lives; lives are given back to you at a rate of 1 every 30 mins. If you leave the game for a few hours you can have a few lives to retry the levels again, until you run out.
Great so far. With over 300 levels - the game will keep you hooked for a long time. But paying for the extras and in-app purchases can soon be expensive.
Looking at all the cheats on the web - there seem to be two: one is to use facebook - login/deinstall/reinstall, and you can bump up your lives.
The other is to download a "save" file which is premodified to give you 999 lives and other stuff.
I took a quick look and located that (for Android), the file of interest is something like
where NNN is some unique number for you.
Looking at this file in a binary editor shows some form of binary structure. Given that I was at level 3, the following pattern showed itself:
0002f0: 0003 0000 00c0 1200 0003 0000 00ac 1c00  ................
000300: 0002 0000 00b4 3c00 0003 0000 003c 8700  ......<......<..
000310: 0003 0000 0020 f800 0003 0000 0034 fb02  ..... .......4..
000320: 0003 0000 00ec fe00 0003 0000 00d4 4301  ..............C.
000330: 0003 0000 00ac 3e01 0002 0000 0038 ee01  ......>......8..
000340: 0003 0000 00a8 5b01 0002 0000 005c 5c01  ......[......\\.
000350: 0003 0000 0008 7d02 0003 0000 0018 c801  ......}.........
000360: 0003 0000 00dc 6c02 0002 0000 0010 0d01  ......l.........
000370: 0003 0000 008c 1103 0003 0000 0014 ed03  ................
000380: 0003 0000 00e4 9800 0002 0000 0088 2b01  ..............+.
After that is a sea of zeroes. I modified those "3"s - at 8 byte intervals - and uploaded the file to my device.
And yes - it now shows that I have completed (or started) those levels. I am assuming the format here is: number of stars, then a high score. Not clear what the 0000 is after the stars - maybe the stars are an int (i.e. 4 bytes) and 4 bytes for the score.
I presume there's lots more stuff in there to find, including a timer or other attributes you can earn/pay for.
Posted at 17:40:41 by fox | Permalink
|Google and my rss feed site||Saturday, 01 March 2014|
It's effective because it's no frills and supports a zipped payload page - no graphics, no external references, etc. Wow, is it good :-)
I'll make gradual changes and enhancements to it over time - so if you use it, you may see changes (and feel free to mail me with ideas or questions). There is a definite fine line between bloat and functionality.
Anyway - I am staring at the http access log and notice something very peculiar. This app - by its nature of aggregating news from a few web sites has attracted the googlebot crawler. Great - why not.
But I notice a request come in with a "Referer" link like this:
Interesting. So, someone clicked through from google to get to me? I must be famous! Anyhow, I paste that link into the browser, and I can see - there I am - on page 1 of the results. Neat! Managed to achieve something most people dream about.
But - isn't this a security hole? If I attract these requests en masse and start collecting them, then I am learning something very personal about each requestor, along with their IP address and browser.
Interesting...will have to look more at this to see what is going on. I have no direct use for such data - but of course, everyone out there coming from google is sending a referer link.
Posted at 21:50:05 by fox | Permalink
|Galaxy Gear||Wednesday, 26 February 2014|
Samsung have really underplayed what this device can do, and the XDA forums have lots of stuff to keep hackers happy - from custom watch faces to installing a browser and more.
I write this blog not because I want to review or praise the device, but because I just wish Samsung could write software.
For starters - the gear manager app is buggy. Very buggy. It corrupts its own database - I have 4 "Touchwiz Home" apps I can add to my watch, and this grows every time I visit the myapps/add-app menu. And it hangs and crashes.
The Samsung app store hangs if I try to do anything in it - simply appalling. No amount of reinstalling seems to help.
But there are two things that have driven me nuts -
The first was "Weather" updates. It took me a *month* to figure out why it had stopped on Xmas day and not updated in the entire month of January. (Solution: enable location services. Why? WTF.)
Ok - that's fixed. I wish the weather app had more than a week of data or could show graphs (like the Suunto watches could years ago).
This one has me beat: the settings app on the watch has gone missing. This is fun. I can't find out how much battery life is left on the watch. I don't know where the settings app has gone (despite restarts; I haven't tried reinstalls as yet). What is worse... try doing a google search for "Galaxy Gear settings". Not a fun way to browse the interweb.
My RSS feed (http://crisp.dyndns-server.com:3000/p.html) is now working nicely - have used less than 20MB of mobile bandwidth in a week, rather than 50-75MB in a day by visiting websites full of bloat.
And I am continuing work on ptrace (an strace replacement), as I now have the ability to sandbox the filesystem and manipulate files which apps try to open. I'll post an update at a later time when more of it is working, but you may see tweets of new releases going out.
Posted at 22:41:38 by fox | Permalink
|rss.pl...continued||Monday, 17 February 2014|
I added compression. Typically this achieves a 2.8x size reduction, meaning the first page - 200kb - comes out at a cost of ~70kb of download. This typically contains the last few hours of 5-10 websites, and means my data usage has/should plummet from a peak of 50MB/day to more like 2-3MB/day.
My goal is to consistently use no more than ~100MB/month. Keep watching to see how the diet goes.
I may need to do something about www.hotukdeals.com - a useful site, especially if you are in the UK. But the rate of new items - most of them of no interest to me, tends to decrease the signal/noise ratio.
I haven't implemented cookies yet - I might do, simply so that the tool could annotate where you left off on a prior page download. However, it's not clear this is that useful unless you are refreshing at intervals of less than 1h.
Posted at 22:57:33 by fox | Permalink
|Tales from the RSS....||Saturday, 15 February 2014|
Or is it?
My mobile SIM is data limited - 500MB/month. I am a tight person - I don't need minutes and texts, and the data is fine for odd browsing. I would never watch video or listen to music over the 3G airwaves.
But my vendor does something strange: for a zero cost per month, they shove adverts, piggy-backed onto the web pages you look at. Before they did this in earnest, life was ok. I could get through 250MB of data a month - and managed to squeeze this down with the excellent off-road mode of the Opera browser on Android.
Alas, they blocked that. So, my bandwidth was back to 250MB/mon. But the way they do adverts - very broken - typically doubles that, to more like 500MB/mon: once a page has downloaded, they tack on some form of redirect and take you to the advert page. Getting back to the page you were browsing costs another full copy of the page. (They increased the free data allowance from 400MB to 500MB, possibly an indication they knew what they were doing.)
In addition, over recent months, they have been suffering capacity problems, so a lot of the time, I would never get to my page of interest - although a lot of data was being downloaded, consuming bandwidth.
So, let's revisit the appalling state of all browsers on Android today.
First: Opera - simply the best, but alas looking undernourished. With off-road mode and data compression, along with *settings* (yes! Settings! ...see below) and the ability to disable images, it wins on bandwidth usage. Typical data savings are at the 75% mark. Brilliant. Except my provider has blocked the off-road mode, or Opera has stopped supporting it.
Next: Chrome. Severely rubbish. I don't care that it comes from Google. It is undernourished and lacking *SETTINGS*. No ability to control the browser or adblock. The recently introduced compression feature achieves around 10% bandwidth saving. Pure rubbish. The UI for Chrome is utter junk - so many clicks to get to the bookmarks and select a new page.
"Next": an ok browser, but with a lack of settings; nothing to write home about.
"Firefox": potentially good - lots of features and settings, but an annoying interface. Far superior to Chrome.
Dolphin: the "best", after Opera. I dont like its version of the speed-dial page, but it suffices and is easy to configure.
Ok - so that was a brief run-down of a bunch of browsers. I have used others, but none of them appear good enough to warrant a few minutes on my phone.
Back to RSS: most RSS readers are rubbish. There is no control, and determining bandwidth usage is nearly impossible.
So I wrote my own - again. My first RSS app was for CRiSP - it's still there today, for web crawling and news reading in the contents page. But I wanted to avoid CRiSP and have a simple customised app - one I can run on my internet-facing 24x7 machine at home, and use for collecting RSS data from my mobile. This means stripping out the garbage in many RSS feeds and being as minimalistic as possible. It's controlled from an rss.cfg file stored with the source file (a Perl script). In the browser, you can navigate fairly well. And it means I can grab a ton of news and fit it into a 200kb page/download. That means I can trim my download data significantly, and possibly get to 100MB/mon instead of 500+MB/mon. And the page loads instantaneously - no images or other html things to slow page loading.
You can try it here:
There's no sign-up - you get what I give you and have configured at my end. There's nothing special about this page - except the need to be ultra fast, ultra compact, and to avoid the issue caused by all browsers being designed to render at speed whilst providing zero support for data allowances.
The initial version took about 4h of effort to write. So far, the effort is around the 20h mark. It took about 3 minutes to turn the RSS aggregator into a web server (about 10 lines of pure Perl). It deliberately doesn't do anything clever - other than restrict the files you can download. I may add cookies at a later date (just so that it can be cleverer about the deltas it sends you, but the 200k/page size limit means we don't have to be so clever - yet).
It could do with decent CSS support.
One thing I have yet to verify is the HTML. The web page is deliberately ill-formed HTML, which I think has the benefit that my provider cannot daisy-chain adverts on to it. I will know in a few days if this is true or not, and will experiment to see how they do this. (Using a non-standard port helps; not being a properly conformant HTML page helps too.)
Posted at 23:21:26 by fox | Permalink
|www.engadget.com - ouch||Saturday, 11 January 2014|
But they have changed it and broken it - badly. It hurts my laptop and hurts my mobile. There is too much data and there are too many images on the page, and I am having to resort heavily to adblocking it. On my laptop, the Vine adverts/videos are causing my CPUs to go into overdrive, and Firefox is severely broken - using 25% of 8 cpus (i.e. 100% of two cpus) when the page is not loaded, or even after closing it. Firefox is a very poor resource hog. But engadget has decided not only to put 3x the number of headlines on the front page, replicate all the images in big and small sizes, and add videos, but to send the mobile site straight to the desktop site.
They are basically being scornful of resource usage, no matter how you access the site.
So, alas, goodbye engadget as my goto page.
People, please remember that there are lots of times that a high bandwidth connection is not available.
Whilst I am at it, I will give a thumbs up to "Taptu" - which I use on Android and iOS - which allows for offline reading.
Posted at 06:50:35 by fox | Permalink
|x86_x32 instruction set||Wednesday, 25 December 2013|
In general, I think support for this is a good thing. We mostly wont use it on our desktops, and probably not even on our Androids - the scope for confusion, missing libraries, and other stuff is huge.
For those that don't know - the Intel/AMD chips support classic i386 32-bit mode binaries and instructions, whilst x86_64 supports full 64-bit operation. i386 mode is important (or was important) in the transition to full 64-bit chips, since at the time the OS, compilers and apps didn't support 64-bit, and not everything was recompiled "overnight".
Getting a new architecture ready is a lot of work - we see this today in the race for ARM-64. Very typically, what is needed is the OS to support the architecture, followed by the compilers. Actually, the compilers have to come first to support the OS, so there can be a standoff until the two stabilise.
Next comes the glibc library which provides the interface to the OS, and pretty much a bottom-up recompile of every library (X11, GTK, Gnome, KDE, QT, etc) and lastly the apps; initially the core OS/distribution apps, and then vendor apps.
This can take a while to stabilise across the full suite of apps - maybe 1-2y optimistically.
Given the maturity of 64-bit architectures, nobody is in a race to support the x32 variant. The x32 variant is simply a 4GB address space version of x86_64. All the instructions and registers of the 64-bit architecture are available, but the address space is limited, which in turn reduces the size of pointers. Hence, smaller memory demands. In today's multigigabyte desktops, that's not a big issue. But this can lead to smaller binaries - smaller footprints, faster to load, fewer pages of memory, fewer TLB misses and fewer cache misses. The fact that more code can fit in the instruction cache is important and can give additional gains.
In general, the reported gains are small to minor - maybe a few percent, unless the app is "pointer heavy" - lots of pointers. Many apps have large datasets in memory (eg XML or text, or web pages); but sophisticated apps will have big complex structures with lots of pointers, so the savings may be good.
I tried recompiling CRiSP as a new linux-x86_x32 architecture to see what the real world effect is. (Side note: I updated my ptrace implementation to support the x32 architecture - it required very few lines of code to do so; the kernel system call interface is nearly identical to x86_64, bar a quirk of how syscalls are encoded.)
Recompiling (and defining) a new crisp variant took around 5-10 minutes of effort - although I had to put a workaround in for <sys/sysctl.h>, which has a pragma complaining that the sysctl() system call is not supported for x32 architecture apps. (I don't think I use that, but I had to conditionalise out the #include.)
The results are interesting - x32 seems to win compared to x86_64, suggesting a 5-10% performance improvement. I attach two runs of my performance benchmark in CRiSP macros - what these macros do doesn't matter much, except to note that each test attempts to take 5s, counting how many of certain operations can be performed. How this would translate into real world performance is hard to say. CRiSP is pretty efficient anyhow, and if you run CRiSP, your CPU is going to be mostly idle; and when it isn't, maybe you get up to 5% performance improvement - i.e. you wouldn't notice it. If we were optimising, for example, battery performance on a mobile, tablet or laptop, then having x32 could be beneficial (not just for CRiSP, but for the OS and all the other apps you use). Imagine an extra 30-60 mins of battery life on your portable device! Worth having, but it may not happen.
Anyhow, here's the relevant benchmark data:
   text    data     bss     dec     hex filename
1379459   50844   70212 1500515  16e563 bin.linux-x86_32/cr
1424517   93000   78024 1595541  185895 bin.linux-x86_64/cr
1349955   52064   70692 1472711  1678c7 bin.linux-x86_x32/cr
The above shows the sizes of the 'cr' executable - the x32 variant certainly wins (due to more registers and better instruction layout compared to the x86_32 variant).
Here are the CPU benchmarks. I didn't run the x86_32 benchmark, since most people will run pure 64-bit desktops and use the 64-bit binary. One final note: CRiSP compiled nicely, and identically to the two other linux architectures - *except* for missing X11 libraries, i.e. I cannot build a windowed GUI version of CRiSP (maybe I haven't installed the relevant packages, so I will take a look after this blog post to see if it is viable).
PERF3: 25 December 2013 23:14 v11.0.22a -- linux-x86_x32
 1) loop           Time: 5.00  3,635,000/sec
 2) macro_list     Time: 5.01     58,200/sec
 3) command_list   Time: 5.00      4,185/sec
 4) strcat         Time: 5.00     77,000/sec
 5) listcat        Time: 5.01        240/sec
 6) string_assign  Time: 5.00    448,800/sec
 7) get_nth        Time: 4.99      8,890/sec
 8) put_nth        Time: 5.00    142,340/sec
 9) if             Time: 5.00    559,000/sec
10) trim           Time: 5.00    143,800/sec
11) compress       Time: 5.00    116,000/sec
12) loop_float     Time: 5.00  2,950,000/sec
13) edit_file      Time: 5.00     80,600/sec
14) edit_file2     Time: 5.00      2,592/sec
15) macro_call     Time: 5.00     74,100/sec
16) gsub           Time: 5.00    226,300/sec
17) sieve          Time: 4.99          0/sec
Total: 100.980000 Elapsed: 1:42
PERF3: 25 December 2013 23:20 v11.0.22a -- linux-x86_64
 1) loop           Time: 5.00  3,300,000/sec
 2) macro_list     Time: 5.00     44,450/sec
 3) command_list   Time: 5.00      3,360/sec
 4) strcat         Time: 5.00     72,500/sec
 5) listcat        Time: 5.02        212/sec
 6) string_assign  Time: 5.00    437,200/sec
 7) get_nth        Time: 5.00      8,435/sec
 8) put_nth        Time: 4.99    129,000/sec
 9) if             Time: 5.00    546,000/sec
10) trim           Time: 5.00    129,300/sec
11) compress       Time: 5.00    118,600/sec
12) loop_float     Time: 5.00  2,790,000/sec
13) edit_file      Time: 5.00     88,600/sec
14) edit_file2     Time: 5.01      2,340/sec
15) macro_call     Time: 5.00     66,100/sec
16) gsub           Time: 5.00    187,100/sec
17) sieve          Time: 5.00          0/sec
Total: 101.420000 Elapsed: 1:42
Posted at 23:22:19 by fox | Permalink
|ptrace now available||Tuesday, 24 December 2013|
ptrace (like strace) lets you monitor a process or tree of processes and see the system calls. ptrace tries to combine the best bits of truss and strace, along with new - very important - functionality.
A particular target of interest is process-termination records. When a process terminates, there is a lot of valuable information available - but normally discarded. This includes:
- Process size - current and peak
- CPU time - user + system, along with child user + system
- Open files - file descriptors in use, and peak open files
- Parent process id
- Context switches
There is a lot of information available; existing Unix tools, like the shell "time" or the /bin/time command, make much of this available, but usually focused on the single command you run. Getting the tree of information from a command script (e.g. make) is not doable without a lot of work.
I have used LD_PRELOAD in the past to collect this information, but this is fairly intrusive, and an LD_PRELOAD library sits inside the process. With ptrace, you can attach to a running process or launch the process, and with many options to control what to monitor and the amount of output, it is an invaluable tool in the toolkit. strace is a very good alternative, but ptrace purports to be better.
I hadn't touched my ptrace implementation in over 6 years - the code was very stale, lacking many system calls and missing the newer features of strace (do you know what the "-y" switch does? Neither did I till I looked at strace in depth).
ptrace comes with good documentation describing the features, but the primary switch is "-exit". When this is used, the process termination records are written, and you can use the Perl script (psummary.pl) to print out the most interesting processes, or see a timeline showing the order of process fork/exits in a tree-like fashion, or modify the script to suit yourself.
I am putting this out as the first public release - I may release to github at a later date, but for now, this is a binary only release. (I am delaying publication of the source simply because ptrace is not free of the CRiSP utility routines I use to wrap malloc and a few other functions.) The source code makefile is ugly - it used to work on Solaris, but I have given that up for now, and concentrated on getting ptrace to work - to fulfil the goal of getting useful output.
Here is the link to ptrace. (It is available on my crisp.demon.co.uk site as well).
And here are some examples of the psummary.pl script:
/home/fox/src/trace@dixxy: psummary.pl
Number of processes : 245
#1 Longest running: 3.322815 31025 /usr/bin/make
#2 Longest running: 3.283731 31029 /bin/dash
#3 Longest running: 3.271198 31033 /usr/bin/make
#1 Peak RSS: 28184 31209 /home/fox/crisp/bin.linux-x86_64/cr
#2 Peak RSS: 8988 31260 /home/fox/crisp/bin.linux-x86_64/hc
#3 Peak RSS: 8988 31261 /home/fox/crisp/bin.linux-x86_64/hc
Maximum file descriptor : 14 31209 /home/fox/crisp/bin.linux-x86_64/cr
Maximum RSS : 4692 31209 /home/fox/crisp/bin.linux-x86_64/cr
Max VmStk : 140 31033 /usr/bin/make
Max VmData : 0 31025 /usr/bin/make
Max voluntarty ctx : 6301 31209 /home/fox/crisp/bin.linux-x86_64/cr
Max non-voluntarty ctx : 68 31209 /home/fox/crisp/bin.linux-x86_64/cr
Max CPU time : 0.98 31025 /usr/bin/make
Max cumulative CPU time : 1.1 31025 /usr/bin/make
/home/fox/src/trace@dixxy: psummary.pl -timeline | head -15
21:44:24.987056 start 31025 /usr/bin/make
21:44:25.009420 start 31026 /bin/rm
21:44:25.013167 end   31026 /bin/rm
21:44:25.014206 start 31027 /bin/rm
21:44:25.018092 end   31027 /bin/rm
21:44:25.019040 start 31028 /bin/rm
21:44:25.023004 end   31028 /bin/rm
21:44:25.025169 start 31029 /bin/dash
21:44:25.028427 start 31030 /bin/dash
21:44:25.028740 end   31030 /bin/dash
21:44:25.030161 start 31031 /bin/dash
21:44:25.030594 end   31031 /bin/dash
21:44:25.031529 start 31032 /usr/bin/basename
21:44:25.035048 end   31032 /usr/bin/basename
21:44:25.036802 start 31033 /usr/bin/make
/home/fox/src/trace@dixxy: psummary.pl -top 5 -l
Number of processes : 245
#1 Longest running: 3.322815 31025 /usr/bin/make
#2 Longest running: 3.283731 31029 /bin/dash
#3 Longest running: 3.271198 31033 /usr/bin/make
#4 Longest running: 2.871467 31077 /usr/bin/make
#5 Longest running: 0.808828 31104 /bin/dash
#1 Peak RSS: 28184 31209 /home/fox/crisp/bin.linux-x86_64/cr -batch -mregress
#2 Peak RSS: 8988 31260 /home/fox/crisp/bin.linux-x86_64/hc -compress -m prim/prim.hpj
#3 Peak RSS: 8988 31261 /home/fox/crisp/bin.linux-x86_64/hc -compress -m prog/prog.hpj
#4 Peak RSS: 8988 31262 /home/fox/crisp/bin.linux-x86_64/hc -compress -m relnotes/relnotes.hpj
#5 Peak RSS: 8988 31263 /home/fox/crisp/bin.linux-x86_64/hc -compress -m user/user.hpj
Maximum file descriptor : 14 31209 /home/fox/crisp/bin.linux-x86_64/cr
Maximum RSS : 4692 31209 /home/fox/crisp/bin.linux-x86_64/cr
Max VmStk : 140 31033 /usr/bin/make
Max VmData : 0 31025 /usr/bin/make
Max voluntarty ctx : 6301 31209 /home/fox/crisp/bin.linux-x86_64/cr
Max non-voluntarty ctx : 68 31209 /home/fox/crisp/bin.linux-x86_64/cr
Max CPU time : 0.98 31025 /usr/bin/make
Max cumulative CPU time : 1.1 31025 /usr/bin/make
The final piece of ptrace prints the peak CPU/memory information, to help gauge how many resources are needed to run a script (typically useful when sizing a compile server or other shared compute resource).
If you are interested in the tool or the source, let me know by email (CrispEditor -a.t- gmail.com). If there is enough interest, I will take it to the next level.
One future idea is to add dtrace-like functionality - for example, the ability to run a script on entry/exit to system calls - but for now I am trying not to mix the two systems. ptrace also contains facilities for emulating ltrace (tracing arbitrary shared library entry points - you will see many libXXX.so files in the bin directory), but these aren't debugged as yet and are not a goal at present.
If people have other "I wish ..." ideas, please forward them to me. I may already have the guts of such tools, or a way to link ideas together.
Posted at 18:57:21 by fox | Permalink
|dtrace, ptrace, strace, xtrace, fatrace, ...||Saturday, 07 December 2013|
/home/fox/src/trace@dixxy: ftrace
No command 'ftrace' found, did you mean:
 Command 'fstrace' from package 'openafs-client' (universe)
 Command 'mftrace' from package 'mftrace' (universe)
 Command 'dtrace' from package 'systemtap-sdt-dev' (universe)
 Command 'itrace' from package 'irpas' (multiverse)
 Command 'ltrace' from package 'ltrace' (main)
 Command 'fatrace' from package 'fatrace' (universe)
 Command 'strace' from package 'strace' (main)
 Command 'btrace' from package 'blktrace' (universe)
 Command 'rtrace' from package 'radiance' (universe)
 Command 'xtrace' from package 'xtrace' (universe)
 Command 'mtrace' from package 'libc-dev-bin' (main)
ftrace: command not found
OK - so there are a lot; what do they do? I haven't tried all of them, but here are a few:
xtrace - an X11 tracing utility.
fatrace - a "who is writing to the disk" utility. Very nice. It uses system calls (fanotify_init and fanotify_mark) to access this data - I wasn't aware of these syscalls, but now I am.
ltrace - a tool to monitor shared library calls. dtrace can do this too, but ltrace is nicer/easier for the simple scenarios. (My ptrace can do this as well, but it needs more work to verify it is still functional.)
Then there's dtrace (systemtap-sdt-dev) - from the man page, it just seems to be a tool for defining systemtap equivalents of DTrace static probes.
Posted at 23:31:24 by fox | Permalink
|Strace .. strace .. again||Thursday, 14 November 2013|
I omitted the punch line: how do you see what strace is doing?
$ strace strace -o /dev/null df
That works well. But this confused me:
$ strace -f strace df
I will let you ponder the "why".
Off to debug an issue with stat and time_t....
Posted at 22:22:02 by fox | Permalink
|strace strace ...||Wednesday, 13 November 2013|
Long before I ever started on dtrace, I used truss on Solaris. Brilliant tool. (My hobby/interest has always been tools that help debug programs or systems.) truss had some great features, but it was very much beholden to Solaris.
I wrote my own truss - I called it ptrace; this was in the days before Linux became a successful and widespread operating system. It did what I would call "hacky" things to enhance what truss did, and it worked on the various Unix flavors. By the time Linux became successful and prominent, strace had appeared. For many years, strace on Linux offered much less functionality and reliability than truss or my ptrace.
I had run out of ideas for ptrace, and I note my last change was back in 2007.
In 2010 or so I picked up the baton on dtrace.
strace has evolved, and so has the Linux kernel; last time I looked at the strace source, it was disappointing - reflecting some deficiencies in the Linux kernel in terms of process control and debugging.
More recently I have had a chance to look at strace, and at my ptrace, to reassess the state of the art in user-space tracing. (DTrace, as good as it is, is somewhat crude for doing what strace/ptrace and truss do - dtrace doesn't make it easy to create a standalone application that doesn't need full privileges and has a good-quality command line parser; dtrace comes with a truss emulation, but it's not refined, and it's not good at decoding the arguments to syscalls... but I digress.)
Looking at strace, it lacks a facility I was interested in, and which ptrace has: stack dumps of the syscalls. I found a package via Google called strace+, but it won't build, and such was the brokenness of the build that I gave up trying to figure out what was wrong.
So, I re-evaluated my ptrace. I had last touched it in 2007, and it just didn't work reliably. It knew nothing of 64-bit processors/processes or a mixed 64/32-bit world. And it didn't compile anymore.
After a couple of days, I got it up and running again; I started adding the best bits of strace and other enhancements, and now it's pretty good. It's reliable - thanks to bug fixes, but also because the kernel's ptrace(2) enhancements make it much more resilient. On older Linux kernels, kill -9 of strace or ptrace could hang or kill the processes being traced. On modern Linux kernels, this horrific situation is resolved.
I have an arsenal of LD_PRELOAD bits - which are useful for debugging or monitoring specific scenarios, but strace/ptrace/dtrace are great for pure unobtrusive debugging.
I used the strace source to help fix and understand some of the issues in my own code, and I am now considering adding more functionality - much more than truss/strace has. FYI, here's the help/usage for ptrace as it currently stands (some features are broken - I need to fix them, especially the i386-specific code).
I may or may not release ptrace as source - I don't necessarily have an interest in maintaining it, as it changes rapidly to suit whatever situation I happen to be debugging. (ptrace gives me the luxury of writing C code rather than D code, in user space, to do very specific things - similar to LD_PRELOAD, but in a way that rarely accidentally kills the target; and ptrace is more portable than DTrace to systems where you don't have root access to debug scenarios.)
ptrace: Trace process execution. (C) 1990-2014 PD Fox, Foxtrot Systems Ltd
Usage: trace [-delay nn] [-d nn] [-gethostid nnn] [-trap]
             [-gethostname name] [-llib ...] [-o output] [-fchnt] [-p pid]
             [-s size] [-v [!]syscall,...] [-r [!]fd,...] [-w [!]fd,...]
             [-size nn] [-stack:nn] [-regs] [command]

  -a                 Print argv on execv calls.
  -flush             Flush output as we go along.
  -func              List of functions to trace
  -gethostid         Intercept gethostid() system call and fake return value.
  -gethostname       Intercept gethostname() system call and fake return value.
  -hex               Dump ASCII strings in 1-byte hex
  -hex2              Dump ASCII strings in 2-byte hex
  -hex4              Dump ASCII strings in 4-byte hex
  -name              Sort by name.
  -nest              Allow for nested functions
  -time              Sort by syscall time.
  -tee file          Write output to specified file and stdout.
  -trap              Map SIGTRAP signal
  -trace             Trace with -l switch
  -pc                Show PC of system call
  -c                 Display system call counts
  -delay nn          Sleep for nn msec before each syscall
  -d nn              Detach after nn calls to gethostid()
  -e                 Dump out exec() functions
  -f                 Follow child processes
  -h                 Display strings in hex/asc (read()/write()).
  -llib              Preload shared library.
  -m                 Intercept page faults.
  -multiline         When printing certain arguments, use multilines to make pretty printing.
  -n                 Print network addresses numerically.
  -nosyscalls        Don't print syscalls (monitor page faults only)
  -o file            Write output to specified file.
  -p pid             Trace specified process.
  -ptr               Show pointers for arguments.
  -q                 Quiet mode -- dont print output.
  -r [!]fd,...       Dump read buffers for specified file descriptors.
  -regs              Dump registers.
  -s                 Trace list of signals.
  -size size         Specify size of strings to print out.
  -stack:nn          Dump call stack (depth of nn).
  -t                 Print timestamps. (msec accuracy)
  -tt                Print timestamps. (usec accuracy)
  -v [!]syscall,...  Specify syscalls to [ignore]/trace.
  -verbose           Add extra detail for some args.
  -w [!]fd,...       Dump write buffers for specified file descriptors.
  -warp YYYYMMDD-HH:MM:SS  Warp clock system calls.
  -nouse_process_vm  Avoid Linux 3.4 dependency
Set PTRACE_OPTS to pass in command line arguments.
Posted at 22:56:53 by fox | Permalink
|Things I hate...||Thursday, 07 November 2013|
I normally don't mind filling in the odd survey - especially when I have actually used the site. But as you hunt around the web, you often click away from a site immediately. Grabbing you at the entrance just makes you go away and *never* come back.
I just wanted to check news and traffic info on a site - and "Bang!", up pops the dreaded survey. No thanks - I needed the info quickly.
I filled in a BBC iPlayer survey the other day and specifically stated they could contact me to get "real" information, not the worthless rubbish from a pop-up questionnaire - but nobody cares.
It really is a shame.
Or the sites which pop up a "Do you need help?" box when you haven't even had a chance to look.
Please, web designers - treat your customers with respect.
And as for the EU Cookie directive - that was as well thought out as a wet fish trying to get a suntan.
Oh well... back to pivot tables.
Posted at 22:56:38 by fox | Permalink