
Author Topic: issue in "smit jfs2" - "change / show characteristics of an enhanced JFS"  (Read 7558 times)


fbergenh

  • Senior Member
  • ****
  • Posts: 40
  • Karma: +0/-0
The problem I was trying to post here (  ;) ) was:

I ran into some problems with a JFS2 filesystem on AIX 5.3 TL 12.
When I tried to look at the characteristics of this filesystem via "smit jfs2" -> "Change / Show Characteristics of an Enhanced Journaled File System", I received the following message:

"1820-037 An internal error or system error has occurred. See the log file for further information".

A look in the smit.log gave the following message:

"1800-083 Data error: could not find matching ring field value for corresponding Command_to_Discover value. sm_cmd_opt[14].disc_field_name: "Quota". The current actual value will be used as the default value for the following dialog. Use local problem reporting procedures."

I spent some time checking the ODM and other things (and that's why I wanted a second opinion, to see if somebody had run into this situation before).

But then I looked at the filesystem itself and did an fsck (sadly on a mounted filesystem, I cannot unmount it right now), and the result of that was:

# fsck /home1

The current volume is: /dev/fslv00
File system is currently mounted.
fsck: 0507-020 Invalid magic number in the primary superblock
Secondary superblock is valid
Primary superblock is corrupt.
 (NOT FIXED)
fsck: 0507-278 Cannot continue.


So, I guess the filesystem corruption is the cause of the smit jfs2 problems. I know the results of an fsck on a mounted filesystem are not reliable, so I guess I have to perform an fsck on the unmounted filesystem in a maintenance window.

I even found some info on the internet saying that fsck may not be able to fix this kind of problem, and that I would then need to copy the secondary superblock to the primary superblock via "dd count=1 bs=4k skip=15 seek=8 if=/dev/fslv00 of=/dev/fslv00" and rerun fsck.

fbergenh

  • Senior Member
  • ****
  • Posts: 40
  • Karma: +0/-0
# lquerypv -h /dev/fslv00 8000 100
00008000   00000000 00000001 00000000 492FB972  |............I/.r|
00008010   00000000 00000000 00000000 00006673  |..............fs|
00008020   6C763030 00000000 00000000 00000000  |lv00............|
00008030   00000000 00000000 00000000 38CB8CD0  |............8...|
00008040   00001000 0000000C 00000200 00000009  |................|
00008050   00000003 00100000 00000100 00000001  |................|
00008060   00000400 0000000B 00000400 0000096F  |...............o|
00008070   00000000 00000000 00000000 00000000  |................|
00008080   80000023 00000002 00000000 00000000  |...#............|
00008090   000E6600 0719719A 0000001F 00000032  |..f...q........2|
000080A0   00000001 00000000 00000000 00000000  |................|
000080B0   000E6600 0719719A 00000000 00000000  |..f...q.........|
000080C0   00000000 00000000 00000000 00000000  |................|
000080D0   00000000 00000000 00000000 00000000  |................|
000080E0   00000000 00000000 00000000 00000000  |................|
000080F0   00000000 00000000 00000000 00000000  |................|

# lquerypv -h /dev/fslv00 F000 100
0000F000   4A324653 00000001 00000000 492FB972  |J2FS........I/.r|
0000F010   00000000 00000000 00000000 00006673  |..............fs|
0000F020   6C763030 00000000 00000000 00000000  |lv00............|
0000F030   00000000 00000000 00000000 38CB8CD0  |............8...|
0000F040   00001000 0000000C 00000200 00000009  |................|
0000F050   00000003 00100000 00000100 00000001  |................|
0000F060   00000400 0000000B 00000400 0000096F  |...............o|
0000F070   00000000 00000000 00000000 00000000  |................|
0000F080   80000023 00000002 00000000 00000000  |...#............|
0000F090   000E6600 0719719A 0000001F 00000032  |..f...q........2|
0000F0A0   00000001 00000000 00000000 00000000  |................|
0000F0B0   000E6600 0719719A 00000000 00000000  |..f...q.........|
0000F0C0   00000000 00000000 00000000 00000000  |................|
0000F0D0   00000000 00000000 00000000 00000000  |................|
0000F0E0   00000000 00000000 00000000 00000000  |................|
0000F0F0   00000000 00000000 00000000 00000000  |................|
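
Comparing the two dumps, only the first four bytes differ: 00000000 at offset 8000 versus 4A324653 (ASCII "J2FS") at offset F000. A minimal sketch for checking just that magic value, assuming 4 KiB blocks as in the dd command quoted above; read_magic and has_jfs2_magic are hypothetical helper names, not AIX tools, and they only read from the device:

```shell
# Print the first 4 bytes of one 4 KiB block of a device or image file.
# Hypothetical helper for illustration; NUL bytes are dropped from the output.
read_magic() {
    dev=$1      # e.g. /dev/fslv00 (read access is enough)
    block=$2    # 4 KiB block number: 8 = offset 0x8000, 15 = offset 0xF000
    # Read one whole block, keep its first 4 bytes, strip NULs for the shell.
    dd if="$dev" bs=4096 skip="$block" count=1 2>/dev/null |
        dd bs=4 count=1 2>/dev/null |
        tr -d '\000'
}

# Succeed only if the block starts with the bytes "J2FS" seen in the dump.
has_jfs2_magic() {
    [ "$(read_magic "$1" "$2")" = "J2FS" ]
}
```

For example, "has_jfs2_magic /dev/fslv00 8 && echo primary magic present" would report on the primary copy without writing anything.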

Michael

  • Administrator
  • Hero Member
  • *****
  • Posts: 1139
  • Karma: +0/-0
I had to read the man page to be sure I was reading everything correctly:

dd # command (duh!)
bs=4k # io unit (block) is 4k
skip=15 # skip over 15 input blocks before reading (basically a seek() before read())
seek=8 # seek to output block 8 before writing (counting starts at 0, so skip/seek N starts at block number N; basically a seek() before write())
if=/dev/fslv00 # input - note: do not use /dev/rfslv00 because character devices (officially) do not seek()
of=/dev/fslv00 # output, etc..
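
As a cross-check on the block numbers, the byte offset is simply block number times block size, which should line up with the offsets used in the lquerypv dumps. A small sketch (block_offset_hex is a hypothetical helper name):

```shell
# Byte offset of a dd block, printed in hex: block number * block size.
block_offset_hex() {
    printf '%X\n' $(( $1 * $2 ))
}

block_offset_hex 8 4096     # seek=8 with bs=4k  -> prints 8000
block_offset_hex 15 4096    # skip=15 with bs=4k -> prints F000
```

So seek=8 writes at the offset of the first dump (8000) and skip=15 reads from the offset of the second (F000).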

1) I would first read both superblocks into files and compare them - that is non-destructive, AND even after an update I would still have the old data to put back
2) is this a "normal", big, or scalable volume group? - Curious - also wondering whether the LVCB is in front of the primary superblock.
3) since the primary superblock is not at position 0000 and only the first 4 bytes (magic number "J2FS") seem to be different, it is very, very strange how this might have occurred.
4) any messages (e.g., bad block references) in errpt?

And, of course, did it fix it?
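
Point 1 above could be sketched roughly like this, assuming the same 4 KiB blocks 8 and 15 as in the quoted dd command. save_superblocks and the file names are hypothetical, and the function only reads from the device:

```shell
# Save both superblock copies to files and compare them (read-only on the device).
# Hypothetical helper; returns 0 if identical, 1 if they differ, 2 on a read error.
save_superblocks() {
    dev=$1       # e.g. /dev/fslv00
    outdir=$2    # existing directory for the saved copies
    dd if="$dev" of="$outdir/primary.sb"   bs=4096 skip=8  count=1 2>/dev/null || return 2
    dd if="$dev" of="$outdir/secondary.sb" bs=4096 skip=15 count=1 2>/dev/null || return 2
    # cmp -s is silent: exit status 0 means identical, 1 means different.
    cmp -s "$outdir/primary.sb" "$outdir/secondary.sb"
}
```

Keeping primary.sb around means the old block can be inspected, or put back, if a later repair step goes wrong.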

fbergenh

  • Senior Member
  • ****
  • Posts: 40
  • Karma: +0/-0
I haven't done anything yet. The maintenance window is Friday at 18:00.

It is a normal VG (not sure how to check if the LVCB is in front of the primary superblock). I have no idea at all how this happened, and the errpt is totally empty.

Once again, because I ran the fsck on a mounted filesystem, I am still not sure that there is filesystem corruption.
 

Michael

  • Administrator
  • Hero Member
  • *****
  • Posts: 1139
  • Karma: +0/-0
If it is a normal volume group, then the LVCB is at the front of the logical volume - by definition.

And I would try and get a backup done - if you have not already :)

Keep us posted!

fbergenh

  • Senior Member
  • ****
  • Posts: 40
  • Karma: +0/-0
You mean to make sure we have a backup of the data on /home1?

Of course we do have a backup, I only hope we don't need it. Filesystem size is 450 GB, with 362 GB data.
It will take some time to restore the data from backup :-)

I'll keep you posted ...
« Last Edit: October 31, 2013, 02:22:41 PM by fbergenh »

fbergenh

  • Senior Member
  • ****
  • Posts: 40
  • Karma: +0/-0
Re: issue in "smit jfs2" - "change / show characteristics of an enhanced JFS"
« Reply #6 on: November 02, 2013, 07:45:54 AM »
I have absolutely no idea what has happened on this system

My colleague unmounted the filesystem last evening, did an fsck and the result was:

The current volume is: /dev/fslv00
Primary superblock is valid.
J2_LOGREDO:log redo processing for /dev/fslv00
Primary superblock is valid.
*** Phase 1 - Initial inode scan
*** Phase 2 - Process remaining directories
*** Phase 3 - Process remaining files
*** Phase 4 - Check and repair inode allocation map
*** Phase 5 - Check and repair block allocation map
File system is clean.

So, this shows that it is true. A fsck on a mounted filesystem has no meaning at all...  ;)

But after the fsck, the problems are gone. The output of the two lquerypv commands is now identical, and smit jfs2 - change/show characteristics shows normal output for the /home1 filesystem.

I have no clue at all what caused the problem, but it is solved.... 

Michael

  • Administrator
  • Hero Member
  • *****
  • Posts: 1139
  • Karma: +0/-0
Re: issue in "smit jfs2" - "change / show characteristics of an enhanced JFS"
« Reply #7 on: November 03, 2013, 04:16:07 PM »
My best guess is that when the unmount was performed AIX LVM flushed all data (updating the superblock as needed).

I would still check errpt for any badblock announcements and/or ask SAN administrators if they have any messages of something that could appear to be a bad-block.

You could still open a PMR - as a question - to see if IBM support has any official explanation, or if they restate the position that fsck (-n) on an open filesystem may report "stale" data.

The lquerypv output (repeated) could be interesting now though. Is the magic number now "J2FS" on both, or not?

Interesting still I say. And glad all is well, and apparently, in memory, always was!