(gdb) break *0x972

Extending Linux Perf Stat with LD_PRELOAD

Friday, March 25, 2016 - No comments

For my current work on monitoring, I need to use Linux perf stat. Perf tools read and display the hardware counters, either for the whole execution of a process, or while attached to it:

perf stat --pid $(pidof firefox)
^C
 Performance counter stats for process id '4257':

     13.860180      task-clock (msec)
            79      context-switches           
            16      cpu-migrations
            11      page-faults             
    18,397,934      cycles
    13,964,242      stalled-cycles-frontend 
     9,787,703      stalled-cycles-backend
     8,320,570      instructions
     1,743,632      branches
        93,080      branch-misses

   1.942768382 seconds time elapsed

That's great, I can attach perf to my process, run it for a while and stop it. But if I want to start, and stop, and start again, I can't. And that's what I want to do, from inside gdb.py.

Signals would be great for that: whenever I send a signal, perf would dump the counter values to stderr and continue. But that's not implemented ...

Studying `perf-stat` source code

So let's see in perf-stat source code what we can find for that.

static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg)
static void print_aggr(char *prefix)

these look like good candidates, but their symbols are not exported ...

(gdb) print abs_printout
No symbol "abs_printout" in current context.
(gdb) print print_aggr
No symbol "print_aggr" in current context.

Next candidate?

static void process_interval(void)
(gdb) p process_interval
$3 = {void (void)} 0x441660 <process_interval>

Oh, great, and with no arguments, that's even easier ! Let's try it:

(gdb) call process_interval()
  3858.     571793.500217      task-clock
  3858          1,880,243      context-switches
  3858            115,610      cpu-migrations           
  3858          8,639,477      page-faults              
  3858  1,430,093,310,944      cycles
  3858  1,037,644,029,921      stalled-cycles-frontend
  3858    756,612,594,751      stalled-cycles-backend 
  3858    820,483,606,606      instructions
  3858    176,772,490,245      branches
  3858      5,454,270,151      branch-misses

Exactly what we were looking for !

Triggering `process_interval()` with a signal

Next, we need to be able to trigger this function remotely, and without modifying perf-stat source code. (The easy option would be to patch perf-stat, but then our tool would be harder to distribute).

That can be done with the help of the Linux LD_PRELOAD trick: we preload a bit of code inside the perf-stat address space, and during the application initialization, we register the signal handler:

#include <signal.h>
#include <stdio.h>

// PROCESS_INTERNAL holds the address of perf's process_interval(), see below
void (*process_internal)(void) = (void *) PROCESS_INTERNAL;

void my_handler(int signum) {
    if (signum != SIGUSR2) {
       return;
    }

    printf("Received SIGUSR2!\n");
    process_internal();
}

void init(void) __attribute__((constructor));

void init(void){
  printf("Received init!\n");
  signal(SIGUSR2, my_handler);
}

Here the (GCC-specific) function attribute constructor tells the dynamic loader to run the function when the library is loaded. In this constructor, we just register the signal handler, and in the signal handler we call process_internal().

But how do we get the address of process_interval? That's where it is not really clean: we get it from a lookup of the binary's symbol addresses (which only matches the run-time address if perf is not a position-independent executable):

nm -a /usr/bin/perf | grep process_interval
0000000000441660 t process_interval

and we inject it manually in the code:

#define PROCESS_INTERNAL 0x0000000000441660
void (*process_internal)(void) = (void *) PROCESS_INTERNAL;

A better way would be to pass it as an environment variable:

#define PROCESS_INTERNAL ascii_to_long($PROCESS_INTERNAL_ADDR)

and in the shell:

PROCESS_INTERNAL_ADDR=0x$(nm -a /usr/bin/perf | grep process_interval | cut -d" " -f1)

and that works pretty well !
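A minimal sketch of that environment lookup in C, assuming the address is exported (with `export`) in a variable named PROCESS_INTERNAL_ADDR as above. The helper name addr_from_env is hypothetical, and the conversion relies on the standard strtoul rather than a hand-written ascii_to_long:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: read a hexadecimal address from the environment
 * variable `name`. Returns 0 if the variable is unset or malformed. */
static uintptr_t addr_from_env(const char *name) {
    assert(name != NULL);

    const char *text = getenv(name);
    if (text == NULL || *text == '\0') {
        return 0;
    }

    char *end = NULL;
    errno = 0;
    /* Base 16 accepts an optional "0x" prefix. */
    unsigned long value = strtoul(text, &end, 16);
    if (errno != 0 || *end != '\0') {
        return 0;
    }
    return (uintptr_t) value;
}

/* In the preloaded library, the constructor could then do:
 *   process_internal = (void (*)(void)) addr_from_env("PROCESS_INTERNAL_ADDR");
 * and skip the signal() registration when the result is 0. */
```

Returning 0 on any parsing problem lets the constructor decline to install the handler instead of jumping to a bogus address.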

LD_PRELOAD=/path/to/preload.so perf stat -x, -p $PID_TO_PROFILE

kill -USR2 $(pidof perf)

Note that SIGUSR1 doesn't work, probably because it's already used by perf-stat. And don't forget the -x, argument if you want to parse perf-stat output.

Also, keep in mind that this is not a robust implementation (of course :), as there might be concurrency problems, for instance if the signal is received while perf is updating the counters.

If it ain't broken don't fix it; I'll wait for problems before going any further!

Quitting ZSH not too quickly

Friday, December 18, 2015 - No comments

This post is for the zsh shell only.

If you often use command-line tools such as GDB, you certainly know the hotkey ^d (EOF) to quickly quit the CLI. But sometimes, that's too sensitive! If you hit it twice in GDB, you do quit GDB, but also its parent shell!

set -o ignoreeof  # 10*^d exits zsh

Okay, we're a bit better now, we won't quit zsh by mistake ... but we cannot close it rapidly on purpose either. So let's improve it: 3 is a better threshold (and zle is the zsh line editor):

set -o ignoreeof  # 10*^d exits zsh
function zle_quit () {exit}
zle -N zle-quit zle_quit
bindkey "^d^d^d" zle-quit

We simply bind the key sequence ^d^d^d to the quit function! (you have to do it quickly enough, otherwise it won't work)

For Emacs fans, this works as well:

bindkey "^x^d" zle-quit

LD_PRELOAD interposition and variadic functions

Tuesday, November 17, 2015 - No comments

As part of my work on OpenMP debugging, I had to implement interposition functions to capture the beginning and end of some library functions. In general, that's easy:

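// library function we want to intercept:
// int test(int a, int b);

#define _GNU_SOURCE   // for RTLD_NEXT
#include <dlfcn.h>
#include <stddef.h>

typedef int (*fct_t)(int, int);

int test(int a, int b) {
   static fct_t real_test = NULL;
   if (!real_test)  // a static initializer must be constant in C, so look it up lazily
      real_test = (fct_t) dlsym(RTLD_NEXT, "test");

   //before test
   int ret = real_test(a, b);
   //after test

   return ret;
}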

I didn't test this code, but that should work. Now, what if the function you want to intercept looks like that:

void myprintf_real(const char *fmt, ...);

Portable Answer

you can't intercept it!
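The one portable escape hatch is a library that also exposes a va_list variant of the function, as the C standard library does with vprintf and vsnprintf; myprintf_real has no such twin, hence the answer above. A minimal sketch of that forwarding, with the standard vsnprintf standing in as the real function (the wrapper name traced_snprintf and the call counter are hypothetical):

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical counter standing in for the "before"/"after" hooks. */
static int traced_calls = 0;

/* Variadic wrapper: collects its arguments in a va_list and forwards
 * them to the va_list variant of the real function (here vsnprintf). */
int traced_snprintf(char *buf, size_t size, const char *fmt, ...) {
    assert(fmt != NULL);

    traced_calls++;                 /* before the real call */

    va_list args;
    va_start(args, fmt);
    int written = vsnprintf(buf, size, fmt, args);
    va_end(args);

    /* after the real call: the return value is still in our hands */
    return written;
}
```

Unlike the jump-based trick, this form keeps control after the call, so code can run on both sides of it.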

ASM Hardcore Answer

//tested and certainly only working on x86-64
void myprintf_asm_interpo(const char *fmt, ...) {
   //before myprintf

  // unroll the frame
  asm volatile("mov    -0xb8(%rbp),%rdi\n\t"
               "mov    -0xa8(%rbp),%rsi\n\t"
               "mov    -0xa0(%rbp),%rdx\n\t"
               "mov    -0x98(%rbp),%rcx\n\t"
               "mov    -0x90(%rbp),%r8\n\t"
               "mov    -0x88(%rbp),%r9\n\t"
               "add     $0xc0,%rsp\n\t");

  //jump to myprintf_real;

  volatile register void ** rsp asm ("rsp");

  //movq    $[addrs], -8(%%rsp) // I can't compile that ...

  *(rsp-1) = myprintf_real;

  asm volatile(
    "mov    %rbp,%rsp\n\t"
    "pop     %rbp\n\t"
    "mov     (%rax), %rax\n\t"
    "jmpq    *%rax");
}

I disassembled the function prolog with GDB, then reversed it and put it before jumping to myprintf_real. With that technique I can't insert code after the function execution (because it returns directly to the caller frame), but that was already a good start.

GCC-Specific Clean Answer

With __builtin_apply_args() and __builtin_apply()! Easy peasy! I prefer my code to be gcc-specific than architecture specific, especially with such a hardcoded blob of assembly!

int myprintf_gcc_interpo(char *fmt, ...) {
    // before 
    void *arg = __builtin_apply_args();
    void *ret = __builtin_apply((void*)printf, arg, 100);
    //after
    __builtin_return(ret);
}

From stackoverflow with love ;-) Test it with this file.

Install Archlinux Package without Internet Connection

Tuesday, October 27, 2015 - No comments

If for some reason you have an Archlinux box without Internet access (like a Qemu system not completely set up?), but still want to install pacman packages, here is a little help:

Synchronize `pacman` database

# quoting 'EOF' keeps the shell from expanding $MIRROR and friends while writing the file
cat > ./pacman_update.sh2 << 'EOF'
#! /bin/sh

MIRROR=http://mirror.archlinuxarm.org/aarch64/

ROOT_FS=/path/to/archlinux/filesystem/root
PAC_SYNC=$ROOT_FS/var/lib/pacman/sync

set -x
wget -q $MIRROR/alarm/alarm.db -O $PAC_SYNC/alarm.db
wget -q $MIRROR/aur/aur.db -O $PAC_SYNC/aur.db
wget -q $MIRROR/community/community.db -O $PAC_SYNC/community.db
wget -q $MIRROR/core/core.db -O $PAC_SYNC/core.db
wget -q $MIRROR/extra/extra.db -O $PAC_SYNC/extra.db
EOF

Retrieve the packages to install

(archlinux) pacman -S zsh  
resolving dependencies...
looking for conflicting packages...

Packages (1) zsh-5.1.1-2.1

Total Download Size:   1.68 MiB  # nooo, we can't download that .... :(
Total Installed Size:  5.03 MiB

:: Proceed with installation? [Y/n] y
:: Retrieving packages ...
error: failed retrieving file 'zsh-5.1.1-2.1-aarch64.pkg.tar.xz' from mirror.archlinuxarm.org : Could not resolve host: mirror.archlinuxarm.org
warning: failed to retrieve some files
error: failed to commit transaction (download library error)
Errors occurred, no packages were upgraded.

yep, that's a good start, but that's not very convenient ...

(archlinux) pacman -Sp zsh
http://mirror.archlinuxarm.org/aarch64/extra/zsh-5.1.1-2.1-aarch64.pkg.tar.xz

yes, that's better !

Download the packages

# quoting 'EOF' keeps the shell from expanding $url and friends while writing the file
cat > ./pacman_download.sh << 'EOF'
#! /bin/sh

# run ./pacman_download.sh then paste `pacman -Sp` urls to stdin

MIRROR=http://mirror.archlinuxarm.org/aarch64/

ROOT_FS=/home/kevin/travail/sample/juno-qemu/linaro/juno-fs
PAC_CACHE=$ROOT_FS/var/cache/pacman/pkg

while read -r url
do
    if [ -z "$url" ]
    then
        break
    fi
    echo "Downloading $url into $PAC_CACHE ..."
    wget -nc -q "$url" -P "$PAC_CACHE"
    echo Done
done
echo Bye bye.
EOF

Install the packages

(archlinux) pacman -S zsh
resolving dependencies...
looking for conflicting packages...

Packages (1) zsh-5.1.1-2.1

Total Installed Size:  5.03 MiB

:: Proceed with installation? [Y/n] 
(1/1) checking keys in keyring                     [######################] 100%
(1/1) checking package integrity                   [######################] 100%
(1/1) loading package files                        [######################] 100%
(1/1) checking for file conflicts                  [######################] 100%
(1/1) checking available disk space                [######################] 100%
(1/1) installing zsh                               [######################] 100%

And zsh is ready :-)

HTML Trick: Second Try with Element Inspector

Tuesday, December 09, 2014 - No comments

Another example after the trick to reveal 'hidden' passwords:

In Flickr, when the author sets a rights restriction, Flickr disables the ability to right-click and download the image (try it here, *View background image): you end up with a 1-px blank image.

So fire up the Element Inspector, that should highlight that "protection" zone:

And just delete this tiny div! (actually, the link to the image <img src=... /> is right below).

Now View image is available again :-)

By the way, in up-to-date versions of Firefox, you can bypass right-click protection by pressing the shift key at the same time, like in this website.

Printing corrupted (scanned) PDF

Thursday, November 20, 2014 - No comments

My printer is ... special. From its web interface, you can scan a document and get a PDF, but you can't print it!

It generates "nature-friendly" PDFs! Only white pages get out of the printer.

THINK BEFORE YOU PRINT: Please consider the environment before printing this email.

It doesn't work with Evince, nor pdf2ps, nor evince > print to file > print, nor convert.

But Evince does print some useful information:

Syntax Error (5404808): Illegal character '>'
Corrupt JPEG data: premature end of data segment
Corrupt JPEG data: premature end of data segment
Corrupt JPEG data: premature end of data segment

The JPEG images contained in the PDF are corrupted. For some reason, Evince can display them onscreen, but not translate them to PS for the printer ... There is probably a PDF library down there, used to transform PDFs into other formats, that doesn't handle invalid images.

Fortunately, Poppler's pdfimages doesn't rely on that "broken" library, and it can extract all the images of a PDF!

When you try to export the images in JPEG format (option -j), it still doesn't work, as it just extracts the invalid images out of the PDF. Eye of Gnome can't display it and explains why:

Error interpreting JPEG image file (Maximum supported image dimension is 65500 pixels)

However, pdfimages can also export PPM images (portable pixmap file format), and those are not invalid! Yay! :-)

pdfimages $PDF $PREFIX

and with ImageMagick's convert, you can rebuild your PDF:

convert-pdf() {
  PDF=$1   # must be a file name in the current directory
  PREFIX=convert-
  TMP=$(mktemp -d)
  WD=$(pwd)
  cp "$PDF" "$TMP"
  mv "$PDF" "$PDF.bak"
  cd "$TMP"
  pdfimages "$PDF" $PREFIX
  # convert ppm to jpg, that saves a lot of space!
  for i in $PREFIX*.ppm
  do
      convert "$i" "$(basename "$i" .ppm).jpg"
  done
  convert $PREFIX*.jpg "$PDF"
  mv "$PDF" "$WD"
  cd "$WD"
  rm -rf "$TMP"
}

HTML Trick: Show Hidden Password

Tuesday, November 04, 2014 - No comments

Have you ever been stuck with a password that your web browser knows, but that you can't remember? Like that: it's here, but behind the dots ...

The normal way to proceed (in Firefox) is to go to [Preferences|about:preferences] > [Security|about:preferences#security] > Saved passwords, and look up your password.

The "hacker" way is quicker, but requires a bit of HTML knowledge (but not that much ^^), as well as new web-dev tools, like Firefox's Element Inspector. Right click on your password field, inspect element:

Now you see the source of your web-page. Locate the <input type='password'/> element, and ... delete/edit the field type='password'. That's it ! It's not a password field anymore, so the web-browser doesn't hide it!

Easy peasy :-)

If you're bored with a moving image or text, or an advertisement, you can do the same to get rid of it, that's quite efficient! Just pay attention to what you delete, it's easy to remove the entire page if you delete a top-level element ;-)

Working Hard

Friday, October 31, 2014 - No comments

I also actually do work hard, sometimes, as when I took that screenshot of my work environment (it was the second year of my PhD program):

i3 window manager tiling the dual-screen
Firefox browsing GDB website, certainly looking for its documentation
Eclipse IDE opened on first investigations of what would become my PhD contribution
GDB running, trying to capture the behavior of Unix sockets,
GDB being compiled, as I was also working on its source code
(I could have a GDB instance debugging the other GDB, but that would have been too much :-)
Emacs editor opened on my literature review LaTeX document
Linux kernel configuration panel, as I was building an ARM version to play with ST's Linux kernel debugger
Unix top command, to make sure that my system is not overloaded by all of the above!

(maybe the screenshot was a bit staged, in particular the web browser, whereas I was actually looking for a way to ... put a subject in a mailto:// link, but not that much ;-)

Conditional Compiling in Latex based on Filename

Wednesday, October 29, 2014 - 1 comment

If you want to generate two versions of a document, for example one for online reading with hyperlinks and another one for printing, with footnotes instead of hyperlinks [1], you can use this trick:

\usepackage{substr}

\newif\ifPaper
% that inside a package, where I can pass "paper" as an option
\DeclareOption{paper}{\global\Papertrue} 

% if substring "paper" is found in the job (file) name, set the flag
\IfSubStringInString{\detokenize{paper}}{\jobname}{\Papertrue}{}

then you create a link command accordingly:

\newcommand{\link}[2]{
\ifPaper#1\footnote{\url{#2}}\else\href{#2}{#1}\fi
}

and finally you create a link (symbolic or hard, it doesn't matter) and compile one version or the other:

ln -s myfile.tex myfile-paper.tex
pdflatex myfile
pdflatex myfile-paper

(I guess that some LaTeX packages already do that for free, but conditional compilation can have many other usages.)

1: http://blog.0x972.info/?d=2014/10/29/09/55/15-conditional-compiling-in-latex-based-on-filename

Add notification support to Owncloud Calendar with Selfoss (RSS aggregator)

Saturday, October 18, 2014 - No comments

In my different steps toward self-hosting, I switched from Google Calendar to Owncloud Calendar. However, one neat feature is missing in Owncloud: the ability to set task reminders. My schedule is not very busy, so I mainly put 'far-off' meetings and appointments ... and I forget to check the calendar again and miss the event!

One thing I check daily is my mail client, the other is my RSS feed aggregator, selfoss. And selfoss appears to be easily extensible, so the idea grew quickly in my mind, and today it's ready: I need to export the calendar in ICS format, parse it, and feed it to selfoss. I first thought about converting it to RSS, but I didn't want my events to be available online, so it was easier and quicker to jump directly from Owncloud to Selfoss.

Owncloud is open-source, so finding how to export the calendar was just a matter of code study, fortunately not done by myself this time:

define('OCROOT', '$PATH_TO_OWNCLOUD/owncloud/');

function owncloud_get_calendar($username, $cal_id) {
  //it's not necessary to load all apps
  $RUNTIME_NOAPPS = true;
  require_once(OCROOT . '/lib/base.php');
  require_once(OCROOT . '/apps/calendar/appinfo/app.php');
  
  //set userid
  OC_User::setUserId($username);
  
  OCP\User::checkLoggedIn();
  OCP\App::checkAppEnabled('calendar');
  
  $calendar = OC_Calendar_App::getCalendar($cal_id, true, true);
  if(!$calendar) {
    return;
  }
  
  return OC_Calendar_Export::export($cal_id, OC_Calendar_Export::CALENDAR);
}

The `cal_id` parameter is shown in the Owncloud calendar when you try to download the ICS file:

ICS is a well-defined format, so with the same luck, it wasn't hard to find a simple parser: ics-parser.

Last but not least, I had to feed the parsed ICS to Selfoss through a "spout", a source plugin able to provide new items to Selfoss.

First, we tell selfoss what information we need to add a new calendar feed: the URL is mandatory, username and password are optional, and "days in advance" tells how many days ahead we want to fetch the events.

public $params = array(
        "url" => array(
            "title"      => "URL",
            "type"       => "text",
            "default"    => "",
             "required"   => true,
            "validation" => array("notempty")
       ),
        "username" => array(
            "title"      => "Username",
            "type"       => "text",
            "default"    => "",
            "required"   => false,
            "validation" => ""
       ),
        "password" => array(
            "title"      => "Password",
            "type"       => "password",
            "default"    => "",
            "required"   => false,
            "validation" => ""
       ),
        "days" => array(
            "title"      => "Days in advance",
            "type"       => "text",
            "default"    => "-1",
            "required"   => false,
            "validation" => "int"
       )
   );

Then we load the calendar, either from Owncloud or directly from its URL. If the URL starts with "owncloud_", we load from Owncloud; otherwise, we fetch the URL. Spouts implement the iterator interface, so here we prepare the iterator, "rewind" it, and it's ready.

    public function load($params) {
      $link = $params['url'];
      if (strpos($link, "owncloud_") === 0) { // owncloud_<cal_id>
        $calendar_lines = owncloud_get_calendar($params['username'],
                                                substr($link, 1+strpos($link, "_")));

        $ical = new ICal(null, explode("\n", $calendar_lines));
      } else {
        if (!empty($params['password']) && !empty($params['username'])) {
          $auth = $params['username'].":".$params['password']."@";
          $link = str_replace("://", "://$auth", $link);
        }

        $ical = new ICal($link);
      }
      
      $this->items = $ical->events();
      
      $this->days = (int) $params['days']; // form values are strings; next() compares with the integer -1
      $this->rewind();

      $this->params = $params;
    }

The next thing is to select which events we want to return to Selfoss, in the `next` method of the iterator. We count the distance in days, and only return the event if it's below the threshold. Obviously, we also discard the past events.

    public function next() {
      if ($this->items == false) {
        return false;
      }
      while (1) {
        $this->position++;

        $event = $this->current_event();
        if (!$event) {
          return false;
        }
        
        $event_date = strtotime($event["DTSTART"]);
        $daydiff = floor(($event_date - time()) / 60 / 60 / 24); // in days
          
        if (isset($event["RRULE"])) { // repeating event
          // explained in the next step

        } else { // normal event
          if ($event_date < time()) { // not in the past
            continue;
          }
        }
        if ($this->days !== -1 && $daydiff > $this->days) { // not more than $days of distance
          continue;
        }
        $this->items[$this->position]["DDIST"] = $daydiff;

        return $this->current();
      }
    }
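The selection rule can be sketched in isolation. The version below is written in C rather than PHP, takes the current time as a parameter so that it can be tested, and uses hypothetical names (event_in_window, max_days). It mirrors only the non-repeating branch, with -1 meaning "no limit" as in the spout's default:

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define SECONDS_PER_DAY (60 * 60 * 24)

/* Hypothetical helper: should a one-off event starting at `event_start`
 * be shown at time `now`? `max_days` is the "days in advance" setting,
 * with -1 meaning "no limit". On success, *day_dist receives the whole
 * number of days until the event. */
bool event_in_window(time_t event_start, time_t now, int max_days, long *day_dist) {
    assert(day_dist != NULL);

    if (event_start < now) {
        return false;               /* past events are discarded */
    }

    /* The difference is non-negative here, so integer division floors. */
    long dist = (long) ((event_start - now) / SECONDS_PER_DAY);
    if (max_days != -1 && dist > max_days) {
        return false;               /* too far in the future */
    }

    *day_dist = dist;
    return true;
}
```

Passing `now` explicitly, instead of calling time() inside, is only a testing convenience; the PHP code reads the clock directly.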

One kind of event was missing after the first try: repeating events, such as "every Saturday, starting the 18th of October". So far I only implemented weekly events; I'll add other kinds whenever I need them!

       if (isset($event["RRULE"])) { // repeating event
          if ($event["RRULE"]["FREQ"] === "WEEKLY") {
            if ($event["RRULE"]["INTERVAL"] !== "1") {
              //ignore for now
              continue;
            }
            $DAYS_OF_WEEK = array("SU" => 0, "MO" => 1, "TU" => 2, "WE" => 3, "TH" => 4, "FR" => 5, "SA" => 6);
            $daydiff = $DAYS_OF_WEEK[$event["RRULE"]["BYDAY"]] - date("w");
            if ($daydiff < 0) $daydiff += 7;
            
          } else {
            // ignore for now
            continue;
          }
     }
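The weekday arithmetic of the weekly case can be sketched on its own, here in C. The function name days_until_weekday is hypothetical, and the modulo form is an equivalent rewrite of the "add 7 if negative" test above:

```c
#include <assert.h>

/* Days from `today` until the next occurrence of weekday `target`,
 * both numbered 0 (Sunday) to 6 (Saturday) like PHP's date("w").
 * Returns 0 when the event falls today. */
int days_until_weekday(int target, int today) {
    assert(target >= 0 && target <= 6);
    assert(today >= 0 && today <= 6);

    /* Adding 7 before the modulo keeps the result in 0..6 even when
     * the target day is earlier in the week than today. */
    return (target - today + 7) % 7;
}
```

For example, a Saturday event (6) seen on a Thursday (4) is 2 days away, and a Monday event (1) seen on a Friday (5) is 3 days away.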

The last step was providing Selfoss with the events/items:

    public function getTitle() {
      if ($this->items == false || !$this->valid()) {
        return false;
      }
      
      $event = $this->current_event();
      if (isset($event["RRULE"])) {
        $dispdate = repeating_time($event["RRULE"]) . " " . end_time($event["DTSTART"]);
      } else {
        $dispdate = start_time($event["DTSTART"]);
      }
      $dispdate .= " -> " . end_time($event["DTEND"]);
      
      $text = stripslashes(htmlentities($event["SUMMARY"]));

      return "$dispdate | $text";
    }

    public function getContent() {
        if ($this->items == false || !$this->valid()) {
          return false;
        }
        
        $event = $this->current_event();
            
        $text = stripslashes(htmlentities($event["SUMMARY"]));
        
        $description = "";
        if (isset($event["DESCRIPTION"])) {
          $description = $event["DESCRIPTION"];
        }
        if (isset($event["LOCATION"])) {
          $description .= "\nLocation: ".htmlentities($event["LOCATION"]);
        }

        if ($event["DDIST"] == 0) {
          $description .= "\nAujourd'hui"; // "Today"
        } else {
          $description .= "\nDans ".$event["DDIST"]. " jour"; // "In N day(s)"
          
          if ($event["DDIST"] != 1) { // was $disttime, which is never defined
            $description .= "s";
          }
        }

        $description = str_replace("<br>", "\n", htmlentities($description));
        
        return $description;
    }

    public function getId() {
      if ($this->items == false || !$this->valid()) {
        return false;
      }
       
      $id = $this->current_event()["UID"];
      $id .= date("Y-m-d"); // refresh the event every day

      if (strlen($id) > 255) {
        $id = md5($id);
      }
      return $id;
    }

You can see in `getId` that I concatenate the date of the day to the event ID. That means that every day, Selfoss will "think" that the event is new, and mark it as unread ... and there we are !
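The same idea, stripped of the Selfoss context, can be sketched in C. The helper name make_daily_id is hypothetical; it uses UTC so the result does not depend on the local time zone, and it reports an error instead of hashing an overlong identifier as the PHP code does:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: build "<uid><YYYY-MM-DD>" into `out`.
 * Returns 0 on success, -1 if the buffer is too small or the date
 * cannot be formatted. */
int make_daily_id(char *out, size_t size, const char *uid, time_t now) {
    assert(out != NULL && uid != NULL);

    struct tm *utc = gmtime(&now);
    if (utc == NULL) {
        return -1;
    }

    char day[11];                   /* "YYYY-MM-DD" plus terminator */
    if (strftime(day, sizeof day, "%Y-%m-%d", utc) == 0) {
        return -1;
    }

    int needed = snprintf(out, size, "%s%s", uid, day);
    return (needed < 0 || (size_t) needed >= size) ? -1 : 0;
}
```

Two calls with the same UID on different days give different identifiers, which is what makes the reader treat the event as a new item each morning.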

Actually, there is one more step I had to implement: my Selfoss feeds are public, I don't mind sharing what I read, but I don't want my calendars to be publicly available. So I hacked Selfoss to "hide" some items if the session is not authenticated. Easiest way: hide tags starting with a "@". Nothing very complicated, we just remove the unwanted tags from the lists !

- return $this->backend->get($options);
+ $items = $this->backend->get($options);
+
+ if(!\F3::get('auth')->showHiddenTags()) {
+ foreach($items as $idx => $item) {
+ if (strpos($item['tags'], "@") !== false) {
+ unset($items[$idx]);
+ }
+ }
+ $items = array_values($items);
+ }
+
+ return $items;

(and the same for sources, tags and items)

And there we are, daily notification of Owncloud calendar events, directly in my feed reader :-)
