Solaris 11: Disk Problem in ZFS

After a Solaris 11 server was powered off abruptly several times, without a clean shutdown, one of the disks in a mirrored ZFS pool ended up corrupted.
Because the pool was mirrored, the data remained accessible.
The zpool list command reported the pool's health as DEGRADED.

login as: ricardo
 Using keyboard-interactive authentication.
 Password:
 Last login: Wed May  9 08:09:44 2012 from 192.168.56.1
 Oracle Corporation      SunOS 5.11      11.0    November 2011
 ricardo@solaris:~$ df -h /test
 Filesystem             Size   Used  Available Capacity  Mounted on
 test                    14G   5.2G       8.4G    39%    /test
 ricardo@solaris:~$ ls -lh /test/
 total 12
 drwxr-xr-x   3 ricardo  staff          3 May  7 06:13 Docs01
 drwxr-xr-x   3 ricardo  staff          3 May  7 06:16 Docs02
 drwxr-xr-x   3 ricardo  staff          3 May  7 06:22 Docs03
 drwxr-xr-x   3 ricardo  staff          3 May  7 06:38 Docs04
 ricardo@solaris:~$ sudo zpool list test
 NAME   SIZE  ALLOC   FREE  CAP  DEDUP    HEALTH  ALTROOT
 test  9.94G  1.28G  8.66G  12%  4.18x  DEGRADED  -
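
The same check can be scripted. This is a minimal sketch, assuming the pool is named "test" as in the session above. health_summary is a hypothetical helper of my own, not a ZFS command; it only classifies the HEALTH value that zpool list prints.

```shell
# Hypothetical helper: turn a pool name and its HEALTH value into a one-line summary.
# $1 = pool name, $2 = health string (e.g. ONLINE, DEGRADED, FAULTED)
health_summary() {
    case "$2" in
        ONLINE) echo "$1: ok" ;;
        *)      echo "$1: needs attention ($2)" ;;
    esac
}

# On a real system (assumes a pool named "test"; -H drops the header line,
# -o health prints only the HEALTH column):
#   health_summary test "$(zpool list -H -o health test)"
```

With the DEGRADED value shown above, the helper prints "test: needs attention (DEGRADED)".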

The zpool scrub command examines all the data in a pool and, where possible, repairs any problems it finds.
The scrub runs in the background, and its progress (which depends on the size of the pool) can be checked with the zpool status command.

ricardo@solaris:~$ sudo zpool scrub test
 ricardo@solaris:~$ sudo zpool list test
 NAME   SIZE  ALLOC   FREE  CAP  DEDUP    HEALTH  ALTROOT
 test  9.94G  1.28G  8.66G  12%  4.18x  DEGRADED  -
 ricardo@solaris:~$ sudo zpool status test
   pool: test
  state: DEGRADED
 status: One or more devices has been removed by the administrator.
         Sufficient replicas exist for the pool to continue functioning in a
         degraded state.
 action: Online the device using 'zpool online' or replace the device with
         'zpool replace'.
   scan: scrub in progress since Wed May  9 08:16:44 2012
     357M scanned out of 1.28G at 11.9M/s, 0h1m to go
     0 repaired, 27.33% done
 config:
 
         NAME        STATE     READ WRITE CKSUM
         test        DEGRADED     0     0     0
           mirror-0  DEGRADED     0     0     0
             c3t2d0  REMOVED      0     0     0
             c3t3d0  ONLINE       0     0     0
 
 errors: No known data errors
 ricardo@solaris:~$ sudo zpool status test
   pool: test
  state: DEGRADED
 status: One or more devices has been removed by the administrator.
         Sufficient replicas exist for the pool to continue functioning in a
         degraded state.
 action: Online the device using 'zpool online' or replace the device with
         'zpool replace'.
   scan: scrub in progress since Wed May  9 08:16:44 2012
     481M scanned out of 1.28G at 10.0M/s, 0h1m to go
     0 repaired, 36.77% done
 config:
 
         NAME        STATE     READ WRITE CKSUM
         test        DEGRADED     0     0     0
           mirror-0  DEGRADED     0     0     0
             c3t2d0  REMOVED      0     0     0
             c3t3d0  ONLINE       0     0     0
 
 errors: No known data errors
 ricardo@solaris:~$ sudo zpool status test
   pool: test
  state: DEGRADED
 status: One or more devices has been removed by the administrator.
         Sufficient replicas exist for the pool to continue functioning in a
         degraded state.
 action: Online the device using 'zpool online' or replace the device with
         'zpool replace'.
   scan: scrub in progress since Wed May  9 08:16:44 2012
     1.27G scanned out of 1.28G at 7.14M/s, 0h0m to go
     0 repaired, 99.33% done
 config:
 
         NAME        STATE     READ WRITE CKSUM
         test        DEGRADED     0     0     0
           mirror-0  DEGRADED     0     0     0
             c3t2d0  REMOVED      0     0     0
             c3t3d0  ONLINE       0     0     0
 
 errors: No known data errors
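
Instead of rerunning zpool status by hand, the progress can be polled. This is a rough sketch that relies on the "N repaired, P% done" and "scrub in progress" lines having the format shown above. scrub_percent and wait_for_scrub are hypothetical helper names, and the 10-second interval is an arbitrary choice.

```shell
# Hypothetical helper: read `zpool status` output on stdin and print the
# percentage from the "N repaired, P% done" line, if there is one.
scrub_percent() {
    sed -n 's/.* repaired, \([0-9.]*\)% done.*/\1/p'
}

# Hypothetical helper: poll until the scrub of pool $1 is no longer in progress.
# Requires the same privileges used for `zpool status` in the session above.
wait_for_scrub() {
    while zpool status "$1" | grep 'scrub in progress' >/dev/null; do
        echo "scrub of $1: $(zpool status "$1" | scrub_percent)% done"
        sleep 10
    done
    echo "scrub of $1 finished"
}

# Usage (assumes a pool named "test"):
#   wait_for_scrub test
```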

After the scrub finished, no errors had been found in the data, but the pool was still running without its mirror, and zpool status itself recommended replacing the disk.

ricardo@solaris:~$ sudo zpool status test
   pool: test
  state: DEGRADED
 status: One or more devices has been removed by the administrator.
         Sufficient replicas exist for the pool to continue functioning in a
         degraded state.
 action: Online the device using 'zpool online' or replace the device with
         'zpool replace'.
   scan: scrub repaired 0 in 0h3m with 0 errors on Wed May  9 08:20:11 2012
 config:
 
         NAME        STATE     READ WRITE CKSUM
         test        DEGRADED     0     0     0
           mirror-0  DEGRADED     0     0     0
             c3t2d0  REMOVED      0     0     0
             c3t3d0  ONLINE       0     0     0
 
 errors: No known data errors

I tried to bring the disk back online, but it remained faulted.

ricardo@solaris:~$ sudo zpool online test c3t2d0
 warning: device 'c3t2d0' onlined, but remains in faulted state
 use 'zpool replace' to replace devices that are no longer present
 ricardo@solaris:~$ sudo zpool status test
   pool: test
  state: DEGRADED
 status: One or more devices has been removed by the administrator.
         Sufficient replicas exist for the pool to continue functioning in a
         degraded state.
 action: Online the device using 'zpool online' or replace the device with
         'zpool replace'.
   scan: scrub repaired 0 in 0h3m with 0 errors on Wed May  9 08:20:11 2012
 config:
 
         NAME        STATE     READ WRITE CKSUM
         test        DEGRADED     0     0     0
           mirror-0  DEGRADED     0     0     0
             c3t2d0  REMOVED      0     0     0
             c3t3d0  ONLINE       0     0     0
 
 errors: No known data errors
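
To pick out which entries are not healthy without reading the whole report, the config table can be filtered. This is a sketch that assumes the table layout shown above (a NAME/STATE header followed by rows, ending at a blank line). unhealthy_devices is a hypothetical helper, and it assumes a POSIX-style awk (on Solaris, nawk may be the safer choice).

```shell
# Hypothetical helper: read `zpool status` output on stdin and print the name
# and state of every row in the config table whose state is not ONLINE.
# The pool and mirror rows are included, since they carry a state as well.
unhealthy_devices() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }  # table header
        in_config && NF == 0          { in_config = 0 }        # blank line ends the table
        in_config && $2 != "ONLINE"   { print $1, $2 }
    '
}

# Usage (assumes a pool named "test"):
#   zpool status test | unhealthy_devices
```

For the report above this prints the test and mirror-0 rows as DEGRADED and c3t2d0 as REMOVED.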

I then replaced the faulty disk and rebuilt the mirror. First, zpool detach removes the failed device from the pool. Then zpool attach adds the second disk given as an argument as a mirror of the first, which already exists and holds the data.

ricardo@solaris:~$ sudo zpool detach test c3t2d0
 ricardo@solaris:~$ sudo zpool status test
   pool: test
  state: ONLINE
   scan: scrub repaired 0 in 0h2m with 0 errors on Wed May  9 08:25:39 2012
 config:
 
         NAME      STATE     READ WRITE CKSUM
         test      ONLINE       0     0     0
           c3t3d0  ONLINE       0     0     0
 
 errors: No known data errors
 ricardo@solaris:~$ sudo zpool attach test c3t3d0 c3t2d0
 ricardo@solaris:~$ sudo zpool status test
 Password:
   pool: test
  state: ONLINE
   scan: resilvered 1.28G in 0h9m with 0 errors on Wed May  9 09:12:50 2012
 config:
 
         NAME        STATE     READ WRITE CKSUM
         test        ONLINE       0     0     0
           mirror-0  ONLINE       0     0     0
             c3t3d0  ONLINE       0     0     0
             c3t2d0  ONLINE       0     0     0
 
 errors: No known data errors
 ricardo@solaris:~$ ls -lh /test
 total 12
 drwxr-xr-x   3 ricardo  staff          3 May  7 06:13 Docs01
 drwxr-xr-x   3 ricardo  staff          3 May  7 06:16 Docs02
 drwxr-xr-x   3 ricardo  staff          3 May  7 06:22 Docs03
 drwxr-xr-x   3 ricardo  staff          3 May  7 06:38 Docs04
 ricardo@solaris:~$
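
Since both commands change the pool layout, it can help to print them for review before running anything. This is a small dry-run sketch. remirror_plan is a hypothetical helper that only echoes the detach and attach commands used above and never calls zpool itself. The action message in zpool status also mentions zpool replace as a route; that is not covered here.

```shell
# Hypothetical helper: print (without running) the two commands used above.
# $1 = pool, $2 = surviving device that holds the data, $3 = device being re-added
remirror_plan() {
    echo "zpool detach $1 $3"      # drop the failed device from the mirror
    echo "zpool attach $1 $2 $3"   # attach the new disk as a mirror of the survivor
}

# Usage with the names from this post:
#   remirror_plan test c3t3d0 c3t2d0
```

Once the attach has run, zpool status shows the resilver and, when it completes, both disks ONLINE under mirror-0, as in the session above.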
Ricardo Portilho Proni

With 20 years of professional experience, Oracle ACE Member, named by Oracle Corporation as one of the world's leading Oracle Database specialists. Has worked on many of the largest Oracle databases in Brazil. Certified in Oracle, SQL Server, DB2, MySQL, Sybase and Websphere. Board member of GPO and GUOB, speaker at ENPO, GUOB Tech Day and Oracle Open World, writer for SQL Magazine and instructor at Nerv.
