Can connect to localhost but not to its IP Address?

Welcome! If you got here, you probably searched for "Could connect to localhost, but not to its IP address" or some variation of that phrase.

That's how I got here. To writing this post, that is. After unsuccessfully trying many clever variations of my search query, we decided to troubleshoot the problem ourselves, ended up fixing it, and decided to blog about it for the benefit of the wider community.

If you're lucky, (a) our solution works for you and (b) this post shows up in the first few pages of your Google results, saving you the time you might otherwise waste following dead ends.

Here is the problem. We first discovered it in our Hadoop cluster, which had been set up by a contract sysadmin no longer with us. The namenode could not talk to any of the datanodes, and the logs showed "Connection refused". Debugging with "telnet -d datanode 54311" surfaced a misleading setsockopt error complaining about a lack of permission; that error comes from the -d flag itself, which turns on socket-level debugging, normally only available to root. Funnily enough, when logged on to the datanode directly, we could connect to localhost or 127.0.0.1 and it worked perfectly, but every connection to datanode, its fully qualified domain name, or even its IP address failed.
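Since telnet -d muddies the picture, a plain TCP connect is enough to reproduce the symptom. Here is a minimal Python sketch; `can_connect` is a hypothetical helper of our own, not anything that ships with Hadoop, and 54311 is just the port from our cluster. Trying the raw IP address takes name resolution out of the picture.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Try a plain TCP connect to (host, port), the same test telnet performs.

    Returns True if the handshake completes, False if the connection is
    refused, times out, or the host name cannot be resolved.
    """
    try:
        # create_connection resolves the name and tries each address it gets back.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError, socket.timeout and socket.gaierror are all OSError subclasses.
        return False
```

On an affected datanode you would compare `can_connect("127.0.0.1", 54311)` with the same call using the datanode's IP address; in our case the loopback connection worked and the IP connection was refused.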

Since we weren't running many other services on the datanodes, the problem didn't show up in other applications. However, a quick check revealed that it was general: we could start up sendmail or an echo server and see the same discrepancy between connecting to localhost and connecting to the machine's IP address.
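The echo-server check can be reproduced without sendmail. The sketch below uses Python's standard `socketserver` module; the helper names `start_echo_server` and `echo_roundtrip` are hypothetical, and binding to one specific address stands in for what a daemon does when it is configured with a host name.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.StreamRequestHandler):
    """Send each line the client writes straight back to it."""

    def handle(self):
        for line in self.rfile:
            self.wfile.write(line)


def start_echo_server(bind_host: str, port: int = 0) -> socketserver.ThreadingTCPServer:
    """Start an echo server on (bind_host, port) in a background thread.

    Port 0 lets the OS pick a free port; read it back from server_address.
    """
    server = socketserver.ThreadingTCPServer((bind_host, port), EchoHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def echo_roundtrip(host: str, port: int, message: bytes = b"ping\n", timeout: float = 2.0):
    """Return the echoed line, or None if the connection fails."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(message)
            with conn.makefile("rb") as reader:
                return reader.readline()
    except OSError:
        return None
```

To mimic the misconfigured daemon, bind to whatever the host's name resolves to, e.g. `start_echo_server(socket.gethostbyname("datanode.xyz.com"))` with your own host name, then call `echo_roundtrip` once with `"localhost"` and once with the machine's IP address. If the name resolves to 127.0.0.1, only the loopback call gets a reply.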

Needless to say, like most people on various forums, we tried the obvious thing first. Perhaps an issue with SELinux or the firewall? We turned both off, to no avail. That's when we decided to look a bit deeper.

netstat -an | grep 54311

showed that a listener was bound to 127.0.0.1:54311, i.e. to the loopback interface, but not to datanode.xyz.com's own IP address on port 54311. Hmm... how could that be? We checked the Hadoop configuration, and it clearly bound to the host's domain name, not to localhost. So what gives? That's when we discovered that /etc/hosts was somehow misconfigured on our machines. The culprit, specifically, was this line:

127.0.0.1 localhost.localdomain localhost datanode.xyz.com

If you don't see the problem: this line maps datanode.xyz.com to the loopback address. So when a program makes a library call to resolve datanode.xyz.com, the name resolves to 127.0.0.1 rather than to the machine's real IP address, because on a typical setup (hosts: files dns in /etc/nsswitch.conf) the /etc/hosts file is consulted before DNS. Hadoop therefore bound its listener to the loopback interface, where nothing outside the machine could reach it.
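A line like this is easy to miss by eye across a whole cluster. The following Python sketch scans hosts-file text and reports every name, other than the usual localhost aliases, that maps to a loopback address. The function name and the set of aliases treated as harmless are our own assumptions, so adjust them for your distribution.

```python
import ipaddress

# Names that commonly (and legitimately) sit on loopback lines.
# This set is an assumption; extend it for your own systems.
HARMLESS_ALIASES = {
    "localhost",
    "localhost.localdomain",
    "localhost6",
    "localhost6.localdomain6",
    "ip6-localhost",
    "ip6-loopback",
}


def loopback_aliases(hosts_text: str) -> list:
    """Return (address, name) pairs where a non-localhost name maps to loopback."""
    findings = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue
        address, *names = line.split()
        try:
            if not ipaddress.ip_address(address).is_loopback:
                continue
        except ValueError:
            continue  # first field is not a literal IP address; skip the line
        findings.extend((address, name) for name in names if name not in HARMLESS_ALIASES)
    return findings
```

On a datanode you could run `loopback_aliases(open("/etc/hosts").read())`; an empty list means no host name is pinned to loopback. `getent hosts datanode.xyz.com` is a quick way to see what the resolver actually returns, since it follows the same nsswitch order as other programs.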

Now the fix was clear. All we had to do was to restore the /etc/hosts file on all our datanodes. Simply replacing

127.0.0.1 localhost.localdomain localhost datanode.xyz.com

with

127.0.0.1 localhost.localdomain localhost

did the trick. If it doesn't work for you, let me know and we'll have another think about it. I hope this helps.
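With many datanodes, the same one-line edit can be scripted. This is an illustrative Python sketch rather than part of the fix described above: `strip_loopback_alias` is a hypothetical helper that removes the given host name only from lines whose address is loopback, and drops such a line entirely if no names would be left on it.

```python
import ipaddress


def _is_loopback(address: str) -> bool:
    """True if `address` is a literal loopback IP (127.0.0.0/8 or ::1)."""
    try:
        return ipaddress.ip_address(address).is_loopback
    except ValueError:
        return False


def strip_loopback_alias(hosts_text: str, hostname: str) -> str:
    """Remove `hostname` from loopback lines of /etc/hosts-style text.

    Non-loopback lines, comments and the other names are kept as they are.
    A loopback line left with no names at all is dropped.
    """
    out = []
    for raw in hosts_text.splitlines(keepends=True):
        body = raw.rstrip("\r\n")
        newline = raw[len(body):]
        content, sep, comment = body.partition("#")
        fields = content.split()
        if len(fields) > 1 and hostname in fields[1:] and _is_loopback(fields[0]):
            kept = [fields[0]] + [name for name in fields[1:] if name != hostname]
            if len(kept) == 1:
                continue  # only the address would remain, so drop the line
            body = " ".join(kept) + (" #" + comment if sep else "")
        out.append(body + newline)
    return "".join(out)
```

A cautious way to use it is to print the result first, e.g. `print(strip_loopback_alias(open("/etc/hosts").read(), "datanode.xyz.com"))`, keep a backup of /etc/hosts, and only then write the output back as root.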

About

This page contains a single entry from the blog posted on June 11, 2010 7:34 AM.
