Tag Archives: linux

Docker in an LXC container on Gentoo

Docker is the newest craze in the devops world. It’s a tool that assists with application containerization using Linux container technology. I decided to give it a try, but with a twist: I want to run Docker inside an LXC container, that is, Docker containers inside LXC containers. This Inception-style setup has a few benefits. It keeps Docker and its dependencies contained and isolated from the host machine, and it allows testing different Docker versions in different containers. In my case, I want to run Docker under Ubuntu 14.04 without reformatting my entire Gentoo host.


Avahi, setrlimit NPROC and lxc

Over the weekend I installed Avahi (the open source Bonjour equivalent) and bumped into a strange error while trying to restart the service. /var/log/messages says chroot.c: fork() failed: Resource temporarily unavailable. Searching the interwebs revealed it is an issue with LXC and setrlimit.

The setrlimit call can set certain limits on processes. One such limit is NPROC, the number of processes that can run under the same UID. Using setrlimit NPROC can enhance security by preventing unexpected forking, like when an attacker is trying to spawn a new process. However, the server I am running on uses LXC, and Avahi is installed on the host. In LXC, the containers are isolated from one another, but the host sees all processes. The PIDs of container processes are remapped but their UIDs stay the same. Thus you get UID collisions, where user 102 in a container can refer to, say, ntp, while 102 on the host refers to avahi. Because the host sees and accounts for all processes, an NPROC limit of, say, 3 processes on avahi (102) also counts existing processes in containers with UID 102 (such as ntp). The limit is breached and avahi is unable to fork.
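To see the collision for yourself, here is a minimal Python sketch that counts processes per UID the way the host sees them, by scanning /proc. The function names are my own, and it is only an approximation: the owner of a /proc entry is normally the effective UID, whereas the kernel counts against the real UID (and counts threads too).

```python
import os
from collections import Counter


def count_processes_by_uid(proc_root="/proc"):
    """Count processes per owning UID by scanning numeric entries under proc_root."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            counts[os.stat(os.path.join(proc_root, entry)).st_uid] += 1
        except OSError:
            continue  # the process exited between listdir() and stat()
    return counts


def fork_would_fail(counts, uid, nproc_limit):
    """Approximate the NPROC check: fork fails once uid already has nproc_limit processes."""
    return counts.get(uid, 0) >= nproc_limit
```

Run on the host, `count_processes_by_uid()[102]` would include both the avahi processes and any container processes that happen to use UID 102, which is why a small limit is hit so quickly.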

The fix is to edit avahi-daemon.conf and raise rlimit-nproc, or to disable rlimits altogether using the --no-rlimits switch.

I guess as LXC and control groups become more common, developers will need to adjust their assumptions about users and processes.

Why is syslog-ng taking up 100% of CPU inside an LXC container

While experimenting with LXC, the Linux container system, which by the way is shaping up to be a viable replacement for OpenVZ, I ran into an annoying issue of syslog-ng taking up 100% of CPU time inside the container. Stumped, I tried adding the -d flag to the syslog-ng command line, but it did not yield any clues.

Armed with strace, I attached to the rogue process and saw the following spat out on the console again and again.

gettimeofday({1287484365, 501293}, NULL) = 0
lseek(8, 0, SEEK_END)                   = -1 ESPIPE (Illegal seek)
write(8, "Oct 19 19:39:57 login[439"..., 105) = -1 EAGAIN (Resource temporarily unavailable)

The key lines were lseek and write, both operating on file descriptor 8. To find out what fd 8 was, all I had to do was ls -al /proc/7411/fd/8. The culprit was /dev/tty12. Looking into syslog-ng.conf, I was reminded of the comment By default messages are logged to tty12.... So it seems tty12 was somehow refusing writes from syslog-ng. Being in LXC, I decided to check out tty12 by doing lxc-console -n container -t 12. To my surprise, syslog-ng was instantly unclogged as the log messages were released onto the console. It looked as if the tty12 buffer had filled up.
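If you have a lot of strace output to wade through, the same diagnosis can be sketched in Python: tally the failing calls per file descriptor, then resolve the descriptor through /proc. This is just an illustration with made-up function names, and the regular expression only handles calls whose first argument is a plain fd number, like the lines above.

```python
import os
import re
from collections import Counter

# Matches failed calls whose first argument is a file descriptor, e.g.
#   write(8, "...", 105) = -1 EAGAIN (Resource temporarily unavailable)
FAILED_FD_CALL = re.compile(r"^(\w+)\((\d+),.*\)\s+=\s+-1\s+(\w+)")


def failing_descriptors(strace_lines):
    """Count failed syscalls, keyed by (fd, syscall name, errno name)."""
    counts = Counter()
    for line in strace_lines:
        match = FAILED_FD_CALL.match(line.strip())
        if match:
            syscall, fd, errno_name = match.groups()
            counts[(int(fd), syscall, errno_name)] += 1
    return counts


def describe_fd(pid, fd):
    """Resolve what a descriptor points at, like `ls -al /proc/PID/fd/FD`."""
    return os.readlink(f"/proc/{pid}/fd/{fd}")
```

Fed the three strace lines above, `failing_descriptors` reports one ESPIPE from lseek and one EAGAIN from write, both on fd 8; `describe_fd(7411, 8)` is then the equivalent of the ls command.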

Regardless of the reason, the easy fix is to stop syslog-ng logging to tty12, as I’m never going to look at that far away console. After commenting out the console_all lines, all was fixed. This would probably never have happened if I had used metalog :/

Qemu/KVM sometimes not registering Mouse Clicks when used over VNC

After setting up Qemu/KVM and VNC and fixing cursor positioning issues (with the -usbtablet option), I had an annoying issue of the VNC viewer (TightVNC in this case) sometimes missing mouse clicks. You would quickly click on a button or icon and nothing would happen. If you held the button down for long enough, it would eventually register. I don’t want to be holding my button for a second to make sure every click registers though.

After fiddling around with the options, I finally found the culprit. The option inside the VNC viewer “Emulate 3-buttons (with 2-button click)” seems to be the cause. Turning it off seems to make my mouse clicks reliable. No idea why though.

iptsafe – iptables with dead man’s switch

When dealing with iptables remotely, you can easily set a firewall rule that locks you out of the machine. After that, the only way to get back in is to physically go to the machine and unset the firewall rules through the terminal. If this is a VPS or dedicated server, chances are you can’t physically access the machine and have to contact the service provider to reset the firewall rules.

This is an instance where a dead man’s switch would help. The theory goes that if the operator is detected to be incapacitated, then a certain action will occur. In our case, the action is to undo our firewall changes. How does it know we are incapacitated? Well, if we don’t report back within a certain amount of time, then we’re probably dead. A long-standing application of this is found when you try to change your monitor’s resolution: it asks you if you want to keep it, and if it gets no response, it reverts automatically.

Do you want to keep these settings

The iptsafe script works on the same principle. It is a wrapper around the iptables command. It takes the same command parameters as iptables, with the exception that if you specify only one parameter, it assumes it is an iptables-save’d file and feeds it to iptables-restore. Once iptsafe is run, it first uses iptables-save to store a copy of the current iptables state, then applies the changes you requested. After that, it prompts you to keep the changes, and if you don’t respond within 15 seconds, it reverts to the original state.

Here’s iptsafe

Sample usage:
# iptsafe -A INPUT -i eth0 -p tcp -s 192.168.0.1 -j ACCEPT
or
# iptsafe my-saved-iptables
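
The principle itself is small enough to sketch in a few lines of Python. To be clear, this is not the iptsafe script, just an illustration of the save, apply, confirm, revert flow; the function names are my own, and the only external commands it assumes are iptables, iptables-save and iptables-restore. It needs root to actually touch the firewall.

```python
import select
import subprocess
import sys


def confirm_within(timeout, prompt="Keep these changes? [y/N] "):
    """Return True only if the operator answers 'y' within timeout seconds."""
    sys.stdout.write(prompt)
    sys.stdout.flush()
    ready, _, _ = select.select([sys.stdin], [], [], timeout)
    if not ready:
        return False  # no answer in time: assume we are locked out
    return sys.stdin.readline().strip().lower() == "y"


def apply_with_deadman(save, apply, restore, confirm):
    """Snapshot state, apply a change, and roll back unless it is confirmed.

    Returns True if the change was kept, False if it was reverted.
    """
    snapshot = save()
    try:
        apply()
        kept = confirm()
    except BaseException:
        restore(snapshot)  # failed or interrupted: roll back, then re-raise
        raise
    if not kept:
        restore(snapshot)
    return kept


def iptables_save():
    """Capture the current rules as bytes in iptables-save format."""
    return subprocess.run(["iptables-save"], check=True, capture_output=True).stdout


def iptables_restore(snapshot):
    """Load a previously captured rule set back through iptables-restore."""
    subprocess.run(["iptables-restore"], input=snapshot, check=True)


def safe_iptables(args, timeout=15):
    """Run `iptables <args>` and revert unless confirmed within timeout seconds."""
    return apply_with_deadman(
        save=iptables_save,
        apply=lambda: subprocess.run(["iptables", *args], check=True),
        restore=iptables_restore,
        confirm=lambda: confirm_within(timeout),
    )
```

Something like `safe_iptables(["-A", "INPUT", "-i", "eth0", "-p", "tcp", "-s", "192.168.0.1", "-j", "ACCEPT"])` mirrors the first sample above. One caveat with this sketch: if the SSH session drops, the process may be killed by SIGHUP before it gets to revert, so a real tool would probably want to ignore that signal or run detached from the terminal.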

Pondering per user accounting in Linux

I’ve been researching for the better part of the day what the best method is to account for bandwidth (and CPU/memory) used by a particular user. This is useful if you run a hosting business and give out shell access. At first I was looking for a way to meter SSH. There seems to be an old patch for it, but as I continued reading, an old post on a mailing list pointed out that there are heaps of ways to generate traffic when you have a shell account (e.g. wget). In fact you don’t even need shell access: any scripting language that can download will consume bandwidth that may not be accounted for.

So began my quest to find the best solution to per user accounting in Linux. The basic concept is that since bandwidth consumption is triggered by a process, and that process is owned by a specific user, we should be able to trace traffic to a user and record it as such. The advantage is even greater if you run peruser mpm apache or suexec’d php.

I began looking at netfilter/iptables, which has an owner match (-m owner --uid-owner). This works only on the OUTPUT chain and will tell you who sent the packet, but unfortunately doesn’t tell you who a packet was destined for.

iptables has a connection tracking feature that tracks active connections, allowing for stateful packet inspection. If you have the kernel feature enabled, it will also count packets and bytes per connection, which you can then view in /proc/net/ip_conntrack (or /proc/net/nf_conntrack for newer installations). Using that, and cross referencing it with netstat -anp and the process table, will give you an idea of which user owns the connection. This is assuming of course that the process doesn’t setuid to change users.
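Here is a rough Python sketch of that cross referencing. It is an illustration only, with hypothetical function names. Instead of parsing netstat -anp output, it reads the uid column of /proc/net/tcp (a substitution on my part), matches connections by local port alone, and assumes connections initiated from this machine, so that the original-direction sport is the local port. The byte counters only appear in the conntrack file when accounting is enabled.

```python
def parse_conntrack_line(line):
    """Split one conntrack entry into original and reply direction fields."""
    original, reply = {}, {}
    for token in line.split():
        if "=" not in token:
            continue  # protocol names, timeouts, flags such as [ASSURED]
        key, value = token.split("=", 1)
        if key not in ("src", "dst", "sport", "dport", "packets", "bytes"):
            continue  # ignore trailing fields such as mark= and use=
        # The first occurrence of a key is the original direction,
        # the second occurrence is the reply direction.
        target = original if key not in original else reply
        target[key] = value
    return original, reply


def uid_by_local_port(proc_net_tcp_text):
    """Map local TCP port to owning UID from the text of /proc/net/tcp."""
    owners = {}
    for row in proc_net_tcp_text.splitlines()[1:]:  # first row is the header
        fields = row.split()
        if len(fields) < 8:
            continue
        port = int(fields[1].rsplit(":", 1)[1], 16)  # address:port, both in hex
        owners[port] = int(fields[7])  # eighth column is the uid
    return owners


def bytes_by_uid(conntrack_lines, owners):
    """Sum bytes in both directions per UID for locally initiated connections."""
    totals = {}
    for line in conntrack_lines:
        original, reply = parse_conntrack_line(line)
        if "sport" not in original:
            continue  # e.g. ICMP entries have no ports
        uid = owners.get(int(original["sport"]))
        if uid is None:
            continue  # socket already closed or not owned locally
        traffic = int(original.get("bytes", 0)) + int(reply.get("bytes", 0))
        totals[uid] = totals.get(uid, 0) + traffic
    return totals
```

In practice you would feed it `open("/proc/net/nf_conntrack")` and `open("/proc/net/tcp").read()`. It inherits the limitations mentioned above: it only sees connections that exist at the moment you look, and a process that changes UID will be attributed to whoever owns the socket at that time.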

But then, how are we going to collect all the data? Polling would be extremely slow and tedious and you might miss short lived connections. It seems that using libnetfilter_conntrack, you can subscribe to an event that notifies when connection states have changed (CONFIG_IP_NF_CONNTRACK_EVENTS). Using that, you can record when connections are opened and when they are closed as they happen.

What about processes? Process accounting can be easily taken care of by the unix acct tools, which monitor processes as they are created and destroyed, provided you have the correct kernel options enabled. But what if you don’t have this option, e.g. on a VPS? Is there an alternative? The answer is yes, but it is ugly. You might remember that process information can be accessed via /proc. What if I set inotify, the file system change notification mechanism, to tell me when /proc has changed? Somebody already thought of this and found it didn’t work quite as expected. The reason for this was mentioned in the linked thread, but the responders did give a good alternative: using ptrace().

ptrace is a powerful Unix system call that can manipulate the processes it has attached to. It is what the debugger gdb uses to debug running applications. With ptrace you can set an option so that the controlling process is notified via SIGTRAP when the traced process terminates, forks, or execs. Using this, you can potentially hook into every process and closely monitor its lifecycle. The downside is that you cannot have two tracers attached to the same process, which means applications like gdb will fail if your monitoring system is active. Since ptrace is primarily used for debugging, it may also degrade the performance of the applications it is attached to. So the bottom line is that it looks too extravagant, and thus the wrong way to go for implementing a process accounting/monitoring system.

Looks like a viable way of doing per user accounting has so far eluded me. Perhaps the old way of accounting individually in every application service (apache/ftp/imap/smtp) is here to stay.