(gdb) break *0x972

Debugging, GNU± Linux and WebHosting and ... and ...

Simple strace debugging

Tuesday, August 30, 2016 - No comments

Today my URxvt terminals are stuck. All of them, at the same time.

I know they're just clients (urxvtc) for the main daemon (urxvtd) so I'm not surprised that all of them are stuck, but I don't know why.

URxvt standalone version (urxvt) works well, so I can start inverstigating (by the way, in my i3 environment I mainly use URxvt client (bound to Mod+Shift+Enter), because it's faster to start, but I have a backup binding Mod+Ctrl+Enter for the standalone version, just in case things like that happen :).

Then, with the right tool, it's easy to find the problem:

 $ strace -p $(pidof urxvtd)
 strace: Process 1518 attached
 wait4(20063, 
 ^C
strace: Process 1518 detached
 <detached ...>

So PID 20063 is the problem, urxvtd is waiting for it forever. But who is that?

$ ps aux | grep 20063 
kevin    20063  0.0  0.0  13628  2920 ?        S    10:45   0:00 sh -c echo -en toto\.ods | xsel -i -b -p

Alright, it's my clipboard management script that doesn't return ...

$ kill -9 20063

Bye bye clipboard, and hello again my terminals :-)

Understanding why Git Annex is busy with Strace

Sunday, December 21, 2014 - No comments

I'm getting into git annex for synchronizing my photos and videos between two computers and two backup disks, and sometimes commands take forever to complete. As they're not very verbose, it's try to understand what's going on:

terminal #1 $ git annex direct
commit

terminal #2 $ top
Tasks: 137 total,   2 running, 134 sleeping,   0 stopped,   1 zombie
%Cpu(s): 11.5 us,  3.8 sy,  0.0 ni, 34.7 id, 49.7 wa,  0.0 hi,  0.3 si,  0.0 st
GiB Mem :    3.830 total,    0.456 free,    0.818 used,    2.556 buff/cache
GiB Swap:    1.862 total,    1.832 free,    0.030 used.    2.711 avail Mem 

  PID USER      PR  NI    VIRT    RES  %CPU %MEM     TIME+ S COMMAND                                                                                                     
 9770 kevin     20   0 1046.6m  31.8m  21.2  0.8  66:17.35 S git

OK, our target pid is 9770 or $(pidof git) if there's only one.

$ sudo strace -p 9770
Process 9770 attached
read(4, "\32\331QJ....36\31#\327\321\361vr\246\326{y"..., 16384) = 16384
write(5, "\273\3203....\232\337\227\310\233b"..., 8192) = 8192
write(5, "F\231C\24...73<=7\34479\222\342\327\233:"..., 8192) = 8192
read(4, "M\223\36...\271\264\327\321l\260h&\36\226"..., 16384) = 16384
^C
Process 9770 detached
 <detached ...>

Git is reading from fd=4 and certainly writing it to fd=5. What are these files?

$ llh /proc/9770/fd
....
lr-x------ 1 kevin users 64 Dec 21 12:06 4 -> .../2013-03-10 Ski de rando au Grand Colon/100_0355.MP4
lrwx------ 1 kevin users 64 Dec 21 12:06 5 -> /media/sdb1/data/.git/objects/pack/tmp_pack_0vBSc4

Ok, that's why it takes time, it's copying the content of my files to git internal objects ... I'm not sure I wanted git annex to do that in fact ...

Subsidiary question: is it a good idea to let strace attached to my process, for instance to follow to progression of git annex, by tracking the files it opens?

(terminal 1) $ sudo strace -e open -p 9770
(terminal 2) $ sudo strace -e open -p $(pidof strace)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_TRAPPED, si_pid=20819, si_uid=1000, si_status=SIGTRAP, si_utime=62, si_stime=1355} ---
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == 133}], __WALL, NULL) = 20819
rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT PIPE TERM], NULL, 8) = 0
ptrace(PTRACE_GETREGSET, 20819, NT_PRSTATUS, [{0x66a480, 216}]) = 0
ptrace(PTRACE_SYSCALL, 20819, 0, SIG_0) = 0
# and again and again

Answer: No! strace receives a ptrace notification each time the traced application does a syscall, no matter what we asked to print (-e open). And each time, strace asks for the CPU registers (PTRACE_GETREGSET). So no, don't let strace attached to your application, it will slow it down a lot!

Solving administration problems with debugging tools (strace)

Sunday, November 16, 2014 - No comments

This week, we wanted to setup a printer on a colleague's computer, it worked on CUPS web interface (http://localhost:631), but Gnome control center was freezing when we tried to access the printer configuration.

gnome freeze

How can you get a clue about what's going on?

GDB might be a bit of an overkill, even if your distribution provides you with Gnome's source code and debug information.

But strace can be helpful!

 $ strace gnome-control-center
 execve("/usr/bin/gnome-control-center", ["gnome-control-center"], [/* 37 vars */]) = 0
 brk(0)                                  = 0x1ee9000
 access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
 fstat(3, {st_mode=S_IFREG|0644, st_size=264676, ...}) = 0
 mmap(NULL, 264676, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f7486506000
 [...]
 <click on 'Printers'>
 [...]

 connect(15, {sa_family=AF_INET, sin_port=htons(631), sin_addr=inet_addr("192.168.1.25")}, 16) = -1 EINPROGRESS (Operation now in progress)
 fcntl(15, F_SETFL, O_RDWR)              = 0
 poll([{fd=15, events=POLLIN|POLLOUT}], 1, 250) = 0 (Timeout)
 poll([{fd=15, events=POLLIN|POLLOUT}], 1, 250) = 0 (Timeout)

Here it is, Gnome tries to connect to a network address, and the data polls are timing out. In fact, the colleague had configured it system to connect to the company CUPS server, which was not reachable from our lab, and Gnome tries again and again to connect to this address, unsuccessfully.

To go one step further, and find where Gnome picks this address, you can check what files the program opens:

$ strace -e open,connect gnome-control-center
[...]
open("/home/kevin/.cups/client.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/cups/client.conf", O_RDONLY) = 15
open("/home/kevin/.cups/client.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
connect(15, {sa_family=AF_INET, sin_port=htons(631), sin_addr=inet_addr("192.168.1.25")}, 16) = -1 EINPROGRESS (Operation now in progress)

Bonne pioche, /etc/cups/client.conf is opened right before the connect call, easy peasy! (but it's not always that simple ;-)

$ cat /etc/cups/client.conf 
# see 'man client.conf'
#ServerName /run/cups/cups.sock #  alternative: ServerName hostname-or-ip-address[:port] of a remote server
ServerName 192.168.1.25

(I knew it, I just changed it 5 mins ago to recreate the problem!)


Different problem, same solution, I use open2300 to access the data of my weather station. I usually access it from the raspberry pi that I setup last year, but today I need to connect it to my desktop computer ... and it doesn't work:

$ ./interval2300 0 0
Unable to open serial device /dev/ttyUSB1

indeed, the weather station is on ttyUSB0, not ttyUSB1. Quick and dirty solution is cd /dev; sudo ln -s ttyUSB0 ttyUSB1, but that disappear on reboot (and I asked myself not to create a udev rule for that!). So, I had to understand where open2300 takes that file name: strace, there you go!

$ strace -e open ./interval2300 0 0                                                                                                                                  1 ?
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("open2300.conf", O_RDONLY)         = -1 ENOENT (No such file or directory)
open("/usr/local/etc/open2300.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/open2300.conf", O_RDONLY)    = 3
open("/dev/ttyUSB1", O_RDWR|O_NONBLOCK) = -1 ENOENT (No such file or directory)

Unable to open serial device /dev/ttyUSB1
+++ exited with 1 +++

Tree calls for dynamic libraries, and three for configuration file (current directory, /usr/local/etc, both missing, and finally /etc/open2300.conf is found). Thanks again strace!