Author Topic: ls -l terribly slow  (Read 8628 times)

sjoerdhooft

  • Jr. Member
  • **
  • Posts: 8
  • Karma: +0/-0
ls -l terribly slow
« on: March 23, 2010, 02:43:23 PM »
hey everyone,

we have two almost identical servers, an IBM blade JS22 (Type 7998) and an IBM blade JS23 (Type 7778). The JS23 has to replace the JS22 because it can handle 64GB of RAM instead of 32GB (which is the maximum for the JS22). The JS22 is installed with 5300-06-07-0818, which is out of support, and for several reasons we decided to install the JS23 with 6100-04-03-1009, which is brand new and just released. They are configured the same way and have almost the same tuning (the differences are settings whose OS-level defaults are already correct in 6.1).

Now comes the problem. We ran a simple performance test:

date; find . -type f -exec ls -l {} \; > /dev/null ; date

which took 8 times longer on the new one. At first I suspected the SAN or the MPIO paths, but after moving the JS23 to the same SAN as the JS22 and setting the same path priorities, that could be ruled out. Then, to make sure it wasn't the hardware, we installed the JS23 with the exact same level as the JS22 and the problem disappeared, so it wasn't the hardware. Then we varied the test to rule out the individual processes involved, and it turned out that the delay is caused by 'ls -l'. Plain 'ls' reached the same performance on both servers.
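For anyone who wants to repeat that narrowing-down, here is a rough sketch of timing each stage on its own. The helper names elapsed and compare_stages are made up for this post, and it assumes your date supports +%s; resolution is whole seconds, so point it at a tree big enough to take a while.

```shell
#!/bin/sh
# Hypothetical helper (not from the original test): run a command, discard
# its output, and print the elapsed wall-clock time in whole seconds.
# Assumes date(1) supports the +%s format.
elapsed() {
    start=$(date +%s)
    "$@" > /dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Time each stage of the test separately against one directory tree.
compare_stages() {
    dir=${1:-.}
    echo "find only:    $(elapsed find "$dir" -type f)s"
    echo "find + ls:    $(elapsed find "$dir" -type f -exec ls {} \;)s"
    echo "find + ls -l: $(elapsed find "$dir" -type f -exec ls -l {} \;)s"
}
```

Run it as compare_stages /some/large/tree on both servers. If only the last line differs between the two machines, the extra time is spent in what 'ls -l' does beyond plain 'ls'.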

Does anyone know what is going on here? I couldn't find anything from IBM (there were some ls -l problems before TL1 SP2, but we're running TL4), and although we can probably live with this 'performance issue', I really want to know what's going on.

Michael

  • Administrator
  • Hero Member
  • *****
  • Posts: 1188
  • Karma: +0/-0
Re: ls -l terribly slow
« Reply #1 on: March 23, 2010, 07:08:51 PM »
It sounds like a curious puzzle. And depending on the issue at hand, could be either exciting and fun to research, or just plain boring.

My first approach would be to run a trace - as low level as possible (that is - all events) on both systems and see what I could see.

What I presume the command ls does is read the directory to get each inode number and the filename associated with it. A simple test of reading directories would be to just use dd if=directory of=/dev/null and compare the results in terms of time, and any other statistic you happen to find. I would hope they are the same. The real work performed by ls -l is using the inode number to gather all kinds of facts from the inode - not the directory.

My guess is that there are some "performance" changes specifically related to file systems - maybe caused by extra system checks and/or procedures used to support RBAC, EFS, and Trusted Execution - and these will cause ls to "behave" poorly compared to AIX 5.3. Note: I have no insight into the AIX code - 25 years ago I debugged the BSD code used on PDP and VAX systems, and although many specifics will have changed, the basic flow will remain.

Getting back to trace: if you can find a trace hook that is called often, and is "slow" - that may make an excellent PMR back to IBM for them to come up with a solution to alleviate the issue.

John R Peck

  • Administrator
  • Senior Member
  • *****
  • Posts: 134
  • Karma: +0/-0
Re: ls -l terribly slow
« Reply #2 on: March 25, 2010, 08:45:41 PM »

The difference with the "ls -l" flag is that it will then also be looking up the UID and GID in /etc/passwd and /etc/group, so I wonder whether you had password hashing set on the older system, or whether there were any significant differences in those files.
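
One way to check this guess is to repeat the test with "ls -n", which prints a long listing with numeric UID and GID and so should not need to translate them to names. A minimal sketch, with made-up helper names (time_find_ls, compare_name_lookup) and assuming date supports +%s:

```shell
#!/bin/sh
# Hypothetical helper: time the per-file ls test with a given flag.
# $1 is the ls flag (-l or -n), $2 is the directory tree to walk.
time_find_ls() {
    flag=$1
    dir=$2
    start=$(date +%s)
    find "$dir" -type f -exec ls "$flag" {} \; > /dev/null 2>&1
    end=$(date +%s)
    echo "ls $flag: $((end - start))s"
}

# Run the same walk twice: once with names, once with numeric IDs.
compare_name_lookup() {
    dir=${1:-.}
    time_find_ls -l "$dir"   # owner and group printed as names
    time_find_ls -n "$dir"   # owner and group printed as numbers
}
```

If compare_name_lookup on the slow server shows "ls -n" running about as fast as plain ls, the user and group lookup is the likely culprit. If both are equally slow, the cause is probably elsewhere.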