sudo hangs and leaves the executed program as “zombie”

November 1, 2010

Today I discovered a non-security bug in sudo – the executed program finishes successfully, but sudo hangs forever waiting for something. The executed program is left in a “zombie” process state. Here is how the process list looks like, for example:

root 6368 0.0 0.0 2808 1592 pts/6 Ss 18:39 0:00 | \_ -bash
root 1103 0.0 0.0 2200 1000 pts/6 S+ 21:45 0:00 | \_ ./sudo -u root sleep 5
root 1104 0.0 0.0 0 0 pts/6 Z+ 21:45 0:00 | \_ [sleep] <defunct>

If we try to trace the system calls of the sudo command, here is what we get:

[root@tester2 ~]# strace -fF -p 1103
select(4, [3], [], NULL, NULL

The sudo process waits endlessly in a select() system call which waits for file descriptor #3. So we quickly check what corresponds to file descriptor #3:

[root@tester2 ~]# ls -la /proc/1103/fd
total 0
dr-x—— 2 root root 0 Nov 1 21:45 .
dr-xr-xr-x 5 root root 0 Nov 1 21:45 ..
lrwx—— 1 root root 64 Nov 1 21:45 0 -> /dev/pts/6
lrwx—— 1 root root 64 Nov 1 21:46 1 -> /dev/pts/6
lrwx—— 1 root root 64 Nov 1 21:45 2 -> /dev/pts/6
lrwx—— 1 root root 64 Nov 1 21:46 3 -> socket:[1136261781]

Socket “socket:[1136261781]” is already closed tough (blinking in red on my console). Thus sudo is waiting in a blocked select() for a change on file descriptor #3, which is already closed and will never change its state. Sudo will therefore wait forever.

In order to understand why this happens, we will look at the source code of sudo. Here is snippet from “exec.c” – only the relevant code is left for clarity:

int
sudo_execve(path, argv, envp, uid, cstat, dowait, bgmode)
{
    int sv[2];

    if (socketpair(PF_UNIX, SOCK_DGRAM, 0, sv) != 0)
    error(1, "cannot create sockets");

    zero_bytes(&sa, sizeof(sa));
    sigemptyset(&sa.sa_mask);

    /* Note: HP-UX select() will not be interrupted if SA_RESTART set */
    sa.sa_flags = SA_INTERRUPT; /* do not restart syscalls */
    sa.sa_handler = handler;
    sigaction(SIGCHLD, &sa, NULL);
    sigaction(SIGHUP, &sa, NULL);
    sigaction(SIGINT, &sa, NULL);
    sigaction(SIGPIPE, &sa, NULL);
    sigaction(SIGQUIT, &sa, NULL);
    sigaction(SIGTERM, &sa, NULL);

    /* Max fd we will be selecting on. */
    maxfd = sv[0];

    child = fork()
    close(sv[1]);

    fdsr = (fd_set *)emalloc2(howmany(maxfd + 1, NFDBITS), sizeof(fd_mask));
    fdsw = (fd_set *)emalloc2(howmany(maxfd + 1, NFDBITS), sizeof(fd_mask));

    for (;;) {

    zero_bytes(fdsw, howmany(maxfd + 1, NFDBITS) * sizeof(fd_mask));
    zero_bytes(fdsr, howmany(maxfd + 1, NFDBITS) * sizeof(fd_mask));

    FD_SET(sv[0], fdsr);

    if (recvsig[SIGCHLD])
        continue;
    nready = select(maxfd + 1, fdsr, fdsw, NULL, NULL);
    if (nready == -1) {
        if (errno == EINTR)
        continue;
        error(1, "select failed");
    }

    }
}

/*
 * Generic handler for signals passed from parent -> child.
 * The recvsig[] array is checked in the main event loop.
 */
void
handler(s)
    int s;
{
    recvsig[s] = TRUE;
}

The race-condition happens right before the select() on line #40, and just after the “if” on lines #38 and #39. If the parent process gets re-scheduled after the “if” was executed, and at this very time the child process finishes and SIGCHLD is sent to the parent process, sudo gets in trouble. The SIGCHLD handler accounts in the variable “recvsig[]” that the signal was received, and then the parent process calls select(). This select will never be interrupted, as the author had it in mind. In 99% of the cases, the parent process will enter in the select() blocking state before the child process ended. The child would then send SIGCHLD, which will be accounted in the handler procedure, and will also interrupt select() which will return -1 in “nready”, and “errno” will be set to EINTR.

Here is an easy way to reproduce the bug. We add a sleep() of 10 seconds between the “if” and select(), thus simulating that the system was very busy and re-scheduled the parent sudo process right between these two operations. Here is the source diff:

--- sudo-orig/sudo-1.7.4p4/exec.c       Sat Sep  4 00:40:19 2010
+++ sudo-1.7.4p4/exec.c Mon Nov  1 21:48:24 2010
@@ -307,6 +307,10 @@
 
        if (recvsig[SIGCHLD])
            continue;
+       printf("debug: Missed the check for SIGCHLD, the child is still running. SIGCHLD status: %d\n", recvsig[SIGCHLD]);
+       sleep(10); // this will be interrupted by SIGCHLD, because the child exists at some time here (we run "sudo sleep 5")
+       printf("debug: We should have got SIGCHLD by now. SIGCHLD status: %d\n", recvsig[SIGCHLD]);
+       printf("debug: Entering the endless select()...\n");
        nready = select(maxfd + 1, fdsr, fdsw, NULL, NULL);
        if (nready == -1) {
            if (errno == EINTR)

After that we execute sudo, and observe the bug every time:

[root@tester2 sudo-1.7.4p4]# make >/dev/null && ./sudo -u root sleep 5
debug: Missed the check for SIGCHLD, the child is still running. SIGCHLD status: 0
debug: We should have got SIGCHLD by now. SIGCHLD status: 1
debug: Entering the endless select()…
…(this never finishes)…

The sudo author actually tried to avoid this potential race condition if SIGCHLD is received immediately
before we call select() – changeset 5334:99adc5ea7f0a. The proper way to fix this is to use a timeout in the select() call:

--- sudo-orig/sudo-1.7.4p4/exec.c       Sat Sep  4 00:40:19 2010
+++ sudo-1.7.4p4/exec.c Mon Nov  1 21:50:26 2010
@@ -307,7 +307,11 @@
 
        if (recvsig[SIGCHLD])
            continue;
-       nready = select(maxfd + 1, fdsr, fdsw, NULL, NULL);
+       struct timeval timeout;
+       timeout.tv_sec = 1; // Linux resets this, so set it everytime
+       timeout.tv_usec = 0;
+       nready = select(maxfd + 1, fdsr, fdsw, NULL, &timeout);
        if (nready == -1) {
            if (errno == EINTR)
                continue;

The select() mechanism and this bug were introduced somewhere between sudo versions 1.7.2 and 1.7.3. At least that is what I managed to see from the Changelog:

2010-06-29  Todd C. Miller  <Todd.Miller@courtesan.com>
	[72fd1f510a08] [SUDO_1_7_3] <1.7>
...
2009-11-15  Todd C. Miller  <Todd.Miller@courtesan.com>
	* script.c, sudo.c, sudo.h, sudoreplay.c, term.c, tgetpass.c:
	Use a socketpair to pass signals from parent to child.
...
2009-07-12  Todd C. Miller  <Todd.Miller@courtesan.com>
	[f5ad45f69f05] [SUDO_1_7_2]

I’ve reported this bug (#447) to the sudo maintainer, and I’m sure he will fix it when time permits. Because we all depend on sudo and love it. :)


Linux Cached/Buffers memory

September 14, 2010

I won’t try to explain in details what Linux Cached/Buffers memory is. In a nutshell, it shows how much of your memory is used for the read cache and for the write cache.

Usually when you look at your system memory usage and see that almost all of the unused memory is allocated for Cached/Buffers, you are happy, because this memory is used for file-system cache, thus your system is running faster.

Today however I observed quite an interesting fact – what I said above is still correct, however you don’t know how often these cache entries are used by the system. After all, it’s not the cache memory usage (or size) which makes the system run faster, but the cache hit ratio. If the file operations get satisfied by the cache (cache hit), then your system is running faster. If the system needs to make a physical disk I/O (cache miss), then you’d need to wait for a good few milliseconds.

What are your options, in order to know if your file cache is actually being used or is just sitting there allocated, giving you a false feeling that your system is running faster thanks to the used cache memory:

  1. (hard) In order to actually know the Cache Hit/Miss ratios for block devices, you’ll need to dig deep into the kernel, as I already explained in the “Is there a way to get Cache Hit/Miss ratios for block devices in Linux” article.
  2. (easy) You can clear the Cached/Buffers memory regularly, see how fast and how much the cache memory grows back, and draw some conclusions about the actual Cache Hit/Miss ratios.

The latter is not a perfect solution, but in all cases gives you a better idea of your file-system cache usage, than just watching the totally used memory in Cached/Buffers, and never actually knowing if it is used/accessed at all.

Therefore, you can run the following every hour in a cron job:

sync ; echo 3 > /proc/sys/vm/drop_caches
sync ; echo 0 > /proc/sys/vm/drop_caches

The commands are safe (see reference for “drop_caches“), and you won’t lose any data, just your caches will be zeroed. The disadvantage of this approach is that if the caches were very actively used indeed, Linux would need to read the data back from the disk.

A real-world example

 

Here is how the Cached/Buffers graphics of a server of mine looks for the following few days:

Linux memory usage

Pay attention to the points of interest which are marked. Here is the explanation and motivation to write this article:

  1. (Point A) The beginning of the graph shows my system after it just booted, and I did some small administrative tasks on it. After that, one script runs regularly on the machine, and as we see, it doesn’t use much file-system cache, as it doesn’t do many file operations.
  2. Then every day at 06:25 the “/etc/cron.daily” scripts are run and some of them read all files on the file-system. Such a script is the updatedb cron job. Because of the great disk activity, the Buffers/Cache usage gets maximal, as all possible files and meta data are being cached in memory.
  3. (see “Updatedb cron” markers on the graphics) After one hour, the scripts finish and no significant disk activity is done on the system any more.
  4. (Point B) But the Cached/Buffers usage never drops down. The file cache doesn’t seem to expire, and is therefore giving us the false feeling that it is being used by our system all the time, thus making it faster. But it isn’t!
  5. On Sat 15:08 the Cached/Buffers cache is cleared manually by the command I provided above, and I installed it as an hourly crontab too.
  6. As you can see, right after the cache was cleared, we see the sad reality – the Cached/Buffers cache was filled with data that nobody needed or accessed, and the high memory usage by Cached/Buffers actually didn’t speed up our system. I grieve for a while and accept the reality, and also understand why so many I/O requests are issued to my EBS storage, even though the cache was so huge.
  7. (Point C) That’s how the actual daily usage pattern of this machine looks like. The Cached/Buffers memory cache is heavily underused on my system, as it doesn’t do much I/O work. This wouldn’t be visible if I don’t clear the cache every hour.

References:


PHP non-interactive usage in a cron job

September 14, 2010

Using a PHP script in a crontab is fairly easy, as stated in the “Using PHP from the command line” documentation… Until you start to get the following warning during the execution:

No entry for terminal type “unknown”;
using dumb terminal settings.

The script works, but this nasty warning really bothers you.

Here is a sample crontab entry:

* * * * * root sudo -u www-data php -r ‘echo “test”;’

When executed, it prints the warning on STDERR.

Yes, I know I don’t need “sudo” here, but this was my initial usage pattern as I discovered the problem, and at the first time I suspected that “sudo” got crazy. Well, it wasn’t “sudo” to blame, but PHP.

Here is the fixed crontab entry:

* * * * * root sudo -u www-data TERM=dumb php -r ‘echo “test”;’

The issue was encountered on an Ubuntu 10.04 server. I though crond usually sets $TERM to something… Anyway, problem solved.


Associate Amazon EC2 Elastic IP in a different region

September 10, 2010

If you allocate an Elastic IP address in the US-East region (N.Virginia), then sorry folks but you can’t use this IP address with an EC2 instance which is located in another region, say EU-West (Ireland) for example.
No problem to remap it to an EC2 instance in another availability zone within the same region.

The documentation of Amazon EC2 leaves another impression:

Elastic IP addresses allow you to mask instance or Availability Zone failures by programmatically remapping your public IP addresses to any instance in your account.

This is a bit misleading. I already had started dreaming as how I can transfer my EC2 instance across continents with no DNS propagation issues. And also had started admiring Amazon as to how they achieved this technically… Well, they haven’t. :)


References:


USB: rejected 1 configuration due to insufficient available bus power

August 11, 2010

If your USB device is not being recognized, execute the command “dmesg” and check if the following output is there:

usb 1-1.4: rejected 1 configuration due to insufficient available bus power

The “1-1.4″ ID may be different for your configuration.

If, and only if, you are absolutely sure that your USB hub and/or hardware configuration have a safe way to actually supply enough power, you can override this barrier and force the device to be activated despite of the error message. A possible situation is where you manually applied 5V external power on your USB device and/or USB hub, like I did on my Bifferboard.

Here is how you can override the power safety mechanism:

echo 1 > /sys/bus/usb/devices/1-1.4/bConfigurationValue

Replace “1-1.4″ with your USB device ID. Be careful and have fun!


Resources:


Secure NAS on Bifferboard running Debian

August 8, 2010

This NAS solution uses OpenSSH for secure transport over a TCP connection, and NFS to mount the volume on your local computer. The hardware of the NAS server is the low-cost Bifferboard.

I’m using an external hard disk via USB which is partitioned in two parts – /dev/sda1 (1GB) and the rest in /dev/sda2. Once you have installed Debian on Bifferboard, here are the commands which further transform your Bifferboard into a secure NAS:

apt-get update
apt-get -y install nfs-kernel-server

vi /etc/default/nfs-common 
  # update: STATDOPTS='--port 2231'
vi /etc/default/nfs-kernel-server 
  # update: RPCMOUNTDOPTS='-p 2233'

mkdir -m 700 /root/.ssh
  # add your public key for "root" in /root/.ssh/authorized_keys

echo '/mnt/storage 127.0.0.1(rw,no_root_squash,no_subtree_check,insecure,async)' >> /etc/exports
mkdir /mnt/storage
chattr +i /mnt/storage # so that we don't accidentally write there without a mounted volume

cat > /etc/rc.local <<EOF
#!/bin/bash

# allow only SSH access via the network
/sbin/iptables -P FORWARD DROP
/sbin/iptables -P INPUT DROP
/sbin/iptables -A INPUT -i lo -j ACCEPT
/sbin/iptables -A INPUT -p tcp --dport 22 -j ACCEPT
/sbin/iptables -A INPUT -p tcp -m state --state ESTABLISHED -j ACCEPT # TCP initiated by server
/sbin/iptables -A INPUT -p udp -m state --state ESTABLISHED -j ACCEPT # DNS traffic

# mount the storage volume here, so that any errors with it don't interfere with the system startup
/bin/mount /dev/sda2 /mnt/storage
/etc/init.d/nfs-kernel-server restart
EOF

# allow only public key authentication
fgrep -i -v PasswordAuthentication /etc/ssh/sshd_config > /tmp/sshd_config && \
  mv -f /tmp/sshd_config /etc/ssh/sshd_config && \
  echo 'PasswordAuthentication no' >> /etc/ssh/sshd_config

reboot

There are two things you should consider with this setup:

  1. You must trust the “root” user who mounts the directory! They have full shell access to your NAS.
  2. A not-so-strong SSH encryption cipher is used, in order to improve the performance of the SSH transfer.

On the machine which is being backed up, I use the following script which mounts the NAS volume, starts the rsnapshot backup process and finally unmounts the NAS volume:

#!/bin/bash
set -u

HOST='192.168.100.102'
SSHUSER='root'
REMOTEPORT='22'
REMOTEDIR='/mnt/storage'
LOCALDIR='/mnt/storage'
SSHKEY='/home/famzah/.ssh/id_rsa-home-backups'

echo "Mounting NFS volume on $HOST:$REMOTEPORT (SSH-key='$SSHKEY')."
N=0
for port in 2049 2233 ; do
	N=$(($N + 1))
	LPORT=$((61000 + $N))
	ssh -f -i "$SSHKEY" -c arcfour128 -L 127.0.0.1:"$LPORT":127.0.0.1:"$port" -p "$REMOTEPORT" "$SSHUSER@$HOST" sleep 600d
	echo "Forwarding: $HOST: Local port: $LPORT -> Remote port: $port"
done
sudo mount -t nfs -o noatime,nfsvers=2,proto=tcp,intr,rw,bg,port=61001,mountport=61002 "127.0.0.1:$REMOTEDIR" "$LOCALDIR"

echo "Doing backup."
time sudo /usr/bin/rsnapshot weekly

echo "Unmounting NFS volume and closing SSH tunnels."
sudo umount "$LOCALDIR"
for pid in $(ps axuww|grep ssh|grep 6100|grep arcfour|grep -v grep|awk '{print $2}') ; do
	kill "$pid" # possibly dangerous...
done


Update, 29/Sep/2010 – performance tunes:

  • Added “async” in “/etc/exports”.
  • Removed the “rsize=8192,wsize=8192″ mount options – they are auto-negotiated by default.
  • Added the “noatime” mount option.
  • Put the SSH username in a variable.

Resources:


Beware of leading zeros in Bash numeric variables

August 7, 2010

Suppose you have some (user) value in a numeric variable with leading zeros. For example, you number something with zero-padded numbers consisting of 3 digits: 001, 002, 003, and so on. This label is assigned to a Bash variable, named $N.

Until the numbers are below 008, and until you use the variable only in text interpolations, you’re safe. For example, the following works just fine:

N=016
echo "Value: $N"
# result is "016"

However… :)
If you start using this variable as a numeric variable in arithmetics, then you’re in trouble. Here is an example:

N=016
echo $((N + 2))
# result is 16, not 18, as expected!
printf %d "$N"
# result is 14, not 16, as expected!

You probably already see the pattern – “016″ is not treated as a decimal number, but as an octal one. Because of the leading zero. This is explained in the man page of bash, section “ARITHMETIC EVALUATION” (aka. “Shell Arithmetic”).

In order to force decimal representation and as a side effect also remove any leading zeros for a Bash variable, you need to treat it as follows:

N=016
N=$((10#$N)) # force decimal (base 10)
echo $((N + 2))
# result is 18, ok
printf %d "$N"
# result is 16, ok

Note also that there’s another caveat – forcing the number to decimal base 10 doesn’t actually validate that it contains only [0-9] characters. Read the very last paragraph of the man page of bash, section “ARITHMETIC EVALUATION” (aka. “Shell Arithmetic”), for more details on how digits can be represented by letters and symbols. My tests however show that you can’t operate with invalid numbers in base 10, though I’m no expert here. In order to be on the safe side, I would suggest that you validate your numbers with a strict regular expression, just in case, and if you don’t trust the data input.


Resources:


Speed up RRDtool database manipulations via RRDs (Perl)

August 1, 2010

Use case
You are doing a lot of data operations on your RRD files (create, update, fetch, last), and every update is done by a separate Perl process which lives a very short time – the process is launched, it updates or reads the data, does something else, and then exits.

The problem
If you are using RRDtool and Perl as described, you surely have noticed that running many of these processes wastes a lot of CPU resources. The question is – can we do some performance optimizations, and lessen the performance hit of loading the RRDs library into Perl? We know that launching often Perl itself is quite expensive, but after all, if we chose to work with Perl, this is a price we should be ready to pay.

The RRDtool shared library is a monolithic piece of code which provides ALL functions of the RRDtool suite – data manipulation, graphics and import/export tools. The last two components bring huge dependencies in regards to other shared libraries. The library from RRDtool version 1.4.4 depends on 34 other libraries on my Linux box! This must add up to the loading time of the RRDtool library into Perl.

Resolution and benchmarks
In order to prove my theory (actually, it was more a theory of zImage, and I just followed, enhanced and tried it), I commented out the implementation of the “graphics” and “import/export tools” modules from the source code of RRDtool. Then I re-compiled the library and did some performance benchmarks. I also re-implemented the RRDs.pm module by replacing the DynaLoader module with the XSLoader one. This made no difference in performance whatsoever. The re-compiled RRD library depends on only 4 other libraries – linux-gate.so.1, libm.so.6, libc.so.6, and /lib/ld-linux.so.2. I think this is the most we can cut down. :)

So here are the benchmark results. They show the accumulated time for 1000 invocations of the Perl interpreter with three different configurations:

  • Only Perl (baseline): 5.454s.
  • With RRDs, no graphics or import/export functions: 9.744s (+4.290s) +78%.
  • With standard RRDs: 11.647s (+6.192s) +113%.

As you can see, you can make Perl + RRDs start 35% faster. The speed up for RRDs itself is 44%.


Here are the commands I used for the benchmarks:

  • Only Perl (baseline): time ( i=1000 ; while [ "$i" -gt 0 ]; do perl -Mwarnings -Mstrict -e ” ; i=$(($i-1)); done )
  • Perl + RRDs: time ( i=1000 ; while [ "$i" -gt 0 ]; do perl -Mwarnings -Mstrict -MRRDs -e ” ; i=$(($i-1)); done )

Free SSL certificates

July 9, 2010

More and more people start telling me about the StartSSL SSL authority, which is a daughter company of StartCom. The rumor that they are giving free SSL certificates looked too unbelievable to me, so I decided to review this more carefully.

After much reading at their page, what people say was confirmed – StartSSL really issue SSL certificates for free, when they are about to be used by individuals on their websites. This means that your personal name stays in the SSL certificate information which can be reviewed if you click on the SSL bar in your web browser.

Business or other legal entities verify their company’s information once for an annual fee and can then issue an unlimited count of SSL certificates too, including wild-card ones. Once verified, a business customer can purchase EV certificates for US$ 49.90 per year.

You can compare these prices with any other SSL certificate authority and you’ll see it yourself that StartSSL are the most affordable one, and the only one which doesn’t charge you for what doesn’t cost them money either – that’s why they can offer “loosely verified” SSL certificates for personal websites for free. It’s unbelievable but true.

My IT brain immediately started to doubt the technical side. I had to check if web browsers accept these SSL certificates without issuing an SSL warning about the certificate being signed by an unknown SSL authority. The test results were successful and the SSL root authority of StartSSL was recognized by the latest version of:

  • Internet Explorer 8 on Windows.
  • Chrome on Windows.
  • Firefox on Windows and Linux.
  • Chromium on Linux.

Furthermore, the Debian “lenny”, “squeeze” and Ubuntu Lucid CA repositories also recognize the StartSSL root certificate. You can verify this yourself by the following command:
openssl s_client -CApath /etc/ssl/certs -connect startssl.com:443

StartSSL have a long list with platforms and browsers which recognize their certificates. You can review the list at the products comparison page.

No more self-signed SSL certificates for personal use, hurray! :)

Update 29/Nov/2010: If you’re interested, you can also review my success story with the Support staff of StartSSL.


The Super Micro IPMI Console + Java are killing me

July 5, 2010

I don’t know if it’s Java or the Super Micro IPMI developers to blame, or both. One thing is for sure – I rarely need it, but almost each time I want to use the server-critical “Console Redirection” feature on our Super Micro servers, there is some problem with the Java applet. Thus I’m not able to access the remote console of the server quickly, which in turn gets me real headache.

Today, it’s the “Launch Console” button doing absolutely nothing on my Kubuntu desktop – no errors, no action after clicking it, no nothing. I (always) have a “backup option” – a Windows 7 virtual machine running on my desktop, as Java tends to work better for me on Windows (cross-platform, eh?). Same problem on the Windows too. As I’m a real paranoid about having a backup, I have a backup of the “backup option” – X over VNC, running on
some not-so-bleeding-edge Linux machines, in order to have a “stable” Java installation there. Though the Java failed on them today as well, as they are running Debian “lenny”, which seems to be having the latest Java version 1.6.20 too.

Well… sorry Java applets + Super Micro IPMI, you really disappoint me! :-/

27/Mar/2012: Resolution: Use the IPMIView application which does not rely on web browsers. Tested with Java Version 6 Update 31 (build 1.6.0_31) on Windows 7. Note that IPMIView does not provide a KVM console for older versions of the Super Micro IPMI devices — the good news is that those devices work well within a web browser. :)

The (ugly) fix is to downgrade your Java to 1.6.19 (and disable automatic Java updates):
http://www.webhostingtalk.com/showthread.php?t=953055

Update #1: I downgraded to Java 1.6.19 on my Windows 7 by:

  1. Uninstalling the Java 1.6.20 JRE update.
  2. Installing the Java 1.6.19 JRE update which I downloaded from the “Archive: Java[tm] Technology Products Download” page.
  3. Being able to get this working only with Chrome. Firefox and IE 8 failed to work.

Update #2: Linux doesn’t seem to be having any problems. Firefox 3.6.3 on Ubuntu and Gentoo with Sun Java 1.6.20 works fine.

Update #3: If you upgrade the IPMI firmware to version 2.02, the Windows problem is fixed.


Here is some debug info from the Debian “lenny” Iceweasel browser, the only one which issued an error:

Unable to launch ATEN Java iKVM Viewer.
An error occurred while launching/running the application.

Title: ATEN Java iKVM Viewer
Vendor: ATEN
Category: Download Error

Unable to load resource: (https://%IP%/iKVM.jar, 1.56.3.0×0)

Wrapped Exception: java.io.IOException: HTTP response 404.

At the same time, the Java test page works fine. The version on the Debian “lenny” “sun-java6-jre” package is “6-20-01lenny1″ (Java JRE 1.6.20).

The same problem is re-produced on:

  • Windows 7, running Java 1.6.20, under IE 8, Firefox 3.6.3 and Chrome 5.0.375.99.
  • Kubuntu Lucid, running OpenJDK 6 build b18, under Firefox 3.6.3.

The Firmware Revision of the IPMI interface on the X8DTL motherboard is 01.29, dated 2010-01-06. It’s not the latest one, but surely not a very old one. After all, you can’t reboot your production servers for every IPMI firmware release…

Anyway, I try not to write articles with negative attitude, but this time I just couldn’t resist.
Java, Java, Java… :)


Follow

Get every new post delivered to your Inbox.