Solaris 11: Failed disk in a ZFS pool
After a Solaris 11 server was repeatedly powered off by hand, without a clean shutdown, one of the disks in a mirrored ZFS pool became corrupted.
Because the pool was mirrored, the data remained accessible.
The zpool list command showed the problem: the pool's HEALTH column read DEGRADED.
login as: ricardo
Using keyboard-interactive authentication.
Password:
Last login: Wed May 9 08:09:44 2012 from 192.168.56.1
Oracle Corporation SunOS 5.11 11.0 November 2011
ricardo@solaris:~$ df -h /test
Filesystem  Size  Used  Available  Capacity  Mounted on
test         14G  5.2G       8.4G       39%  /test
ricardo@solaris:~$ ls -lh /test/
total 12
drwxr-xr-x 3 ricardo staff 3 May 7 06:13 Docs01
drwxr-xr-x 3 ricardo staff 3 May 7 06:16 Docs02
drwxr-xr-x 3 ricardo staff 3 May 7 06:22 Docs03
drwxr-xr-x 3 ricardo staff 3 May 7 06:38 Docs04
ricardo@solaris:~$ sudo zpool list test
NAME  SIZE   ALLOC  FREE   CAP  DEDUP  HEALTH    ALTROOT
test  9.94G  1.28G  8.66G  12%  4.18x  DEGRADED  -
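To catch this kind of problem without reading the table by eye, the zpool list output can be checked by a small script. The sketch below is only an illustration: the function names parse_zpool_list and unhealthy_pools are mine, and it assumes the output has one header row and whitespace-separated columns, as in the listing above.

```python
from __future__ import annotations


def parse_zpool_list(output: str) -> list[dict[str, str]]:
    """Turn `zpool list` text into one dict per pool, keyed by column name."""
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def unhealthy_pools(output: str) -> list[str]:
    """Return the names of pools whose HEALTH column is not ONLINE."""
    return [
        row["NAME"]
        for row in parse_zpool_list(output)
        if row.get("HEALTH") != "ONLINE"
    ]


# Sample text copied from the session above.
SAMPLE_LIST = """\
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
test 9.94G 1.28G 8.66G 12% 4.18x DEGRADED -
"""

print(unhealthy_pools(SAMPLE_LIST))
```

In real use the text would come from running zpool list on the server; here it is pasted in so the sketch runs anywhere.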
The zpool scrub command examines all the data in a pool and, where possible, repairs any problems it finds.
The scrub runs in the background, and its progress (how long it takes depends on the size of the pool) can be checked with the zpool status command.
ricardo@solaris:~$ sudo zpool scrub test
ricardo@solaris:~$ sudo zpool list test
NAME  SIZE   ALLOC  FREE   CAP  DEDUP  HEALTH    ALTROOT
test  9.94G  1.28G  8.66G  12%  4.18x  DEGRADED  -
ricardo@solaris:~$ sudo zpool status test
pool: test
state: DEGRADED
status: One or more devices has been removed by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scan: scrub in progress since Wed May 9 08:16:44 2012
357M scanned out of 1.28G at 11.9M/s, 0h1m to go
0 repaired, 27.33% done
config:
NAME          STATE     READ WRITE CKSUM
test          DEGRADED     0     0     0
  mirror-0    DEGRADED     0     0     0
    c3t2d0    REMOVED      0     0     0
    c3t3d0    ONLINE       0     0     0
errors: No known data errors
ricardo@solaris:~$ sudo zpool status test
pool: test
state: DEGRADED
status: One or more devices has been removed by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scan: scrub in progress since Wed May 9 08:16:44 2012
481M scanned out of 1.28G at 10.0M/s, 0h1m to go
0 repaired, 36.77% done
config:
NAME          STATE     READ WRITE CKSUM
test          DEGRADED     0     0     0
  mirror-0    DEGRADED     0     0     0
    c3t2d0    REMOVED      0     0     0
    c3t3d0    ONLINE       0     0     0
errors: No known data errors
ricardo@solaris:~$ sudo zpool status test
pool: test
state: DEGRADED
status: One or more devices has been removed by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scan: scrub in progress since Wed May 9 08:16:44 2012
1.27G scanned out of 1.28G at 7.14M/s, 0h0m to go
0 repaired, 99.33% done
config:
NAME          STATE     READ WRITE CKSUM
test          DEGRADED     0     0     0
  mirror-0    DEGRADED     0     0     0
    c3t2d0    REMOVED      0     0     0
    c3t3d0    ONLINE       0     0     0
errors: No known data errors
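Instead of rerunning zpool status and reading the scan lines each time, the progress figures can be pulled out with a couple of regular expressions. This is a rough sketch, assuming the wording of the scan lines matches what this Solaris 11 release printed above; the function name scrub_progress and the dictionary keys are my own choices.

```python
import re
from typing import Optional

# Patterns for the two progress lines that follow "scan: scrub in progress".
PROGRESS_RE = re.compile(
    r"(?P<scanned>\S+) scanned out of (?P<total>\S+) "
    r"at (?P<rate>\S+), (?P<eta>\S+) to go"
)
DONE_RE = re.compile(r"(?P<repaired>\S+) repaired, (?P<percent>[\d.]+)% done")


def scrub_progress(status_output: str) -> Optional[dict]:
    """Return scrub progress details, or None if no scrub is running."""
    if "scrub in progress" not in status_output:
        return None
    progress = PROGRESS_RE.search(status_output)
    done = DONE_RE.search(status_output)
    if progress is None or done is None:
        return None
    info = progress.groupdict()
    info["repaired"] = done.group("repaired")
    info["percent"] = float(done.group("percent"))
    return info


# Scan section copied from the first status check above.
SAMPLE_SCAN = """\
scan: scrub in progress since Wed May 9 08:16:44 2012
357M scanned out of 1.28G at 11.9M/s, 0h1m to go
0 repaired, 27.33% done
"""

print(scrub_progress(SAMPLE_SCAN))
```

Once the scrub finishes, the scan line changes to "scrub repaired ...", so the function returns None and a polling loop built on it would know to stop.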
After the scrub finished, no errors had been found in the data, but the pool was still running without a mirror, and zpool status itself recommended bringing the device online or replacing it.
ricardo@solaris:~$ sudo zpool status test
pool: test
state: DEGRADED
status: One or more devices has been removed by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scan: scrub repaired 0 in 0h3m with 0 errors on Wed May 9 08:20:11 2012
config:
NAME          STATE     READ WRITE CKSUM
test          DEGRADED     0     0     0
  mirror-0    DEGRADED     0     0     0
    c3t2d0    REMOVED      0     0     0
    c3t3d0    ONLINE       0     0     0
errors: No known data errors
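The config section of zpool status is what identifies the disk at fault. As a minimal sketch, the snippet below reads that section and lists every entry that is not ONLINE. The function names device_states and not_online are hypothetical, and the code assumes the section sits between the "config:" and "errors:" lines, as in the output above.

```python
def device_states(status_output: str) -> dict:
    """Map each name in the config section of `zpool status` to its STATE."""
    states = {}
    in_config = False
    for line in status_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "config:":
            in_config = True
            continue
        if fields[0] == "errors:":
            break
        # Skip the column header; every other row is "<name> <state> ...".
        if in_config and len(fields) >= 2 and fields[0] != "NAME":
            states[fields[0]] = fields[1]
    return states


def not_online(status_output: str) -> list:
    """Return pool, vdev and disk names whose state is not ONLINE."""
    return [
        name
        for name, state in device_states(status_output).items()
        if state != "ONLINE"
    ]


# Config section copied from the status output above.
SAMPLE_STATUS = """\
pool: test
state: DEGRADED
config:
NAME STATE READ WRITE CKSUM
test DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
c3t2d0 REMOVED 0 0 0
c3t3d0 ONLINE 0 0 0
errors: No known data errors
"""

print(not_online(SAMPLE_STATUS))
```

The pool and the mirror show up in the result as well, because a degraded disk degrades everything above it; the disk itself is the entry marked REMOVED.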
I tried to bring the disk online, but it remained in a faulted state.
ricardo@solaris:~$ sudo zpool online test c3t2d0
warning: device 'c3t2d0' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present
ricardo@solaris:~$ sudo zpool status test
pool: test
state: DEGRADED
status: One or more devices has been removed by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scan: scrub repaired 0 in 0h3m with 0 errors on Wed May 9 08:20:11 2012
config:
NAME          STATE     READ WRITE CKSUM
test          DEGRADED     0     0     0
  mirror-0    DEGRADED     0     0     0
    c3t2d0    REMOVED      0     0     0
    c3t3d0    ONLINE       0     0     0
errors: No known data errors
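I then replaced the faulty disk and rebuilt the mirror. First, zpool detach removed the failed device from the mirror, leaving a healthy single-disk pool. Then zpool attach added the second disk named on the command line as a mirror of the first, which was already in the pool and held the data. ZFS copied the data to the new disk (resilvering), which here took 9 minutes for 1.28G.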
ricardo@solaris:~$ sudo zpool detach test c3t2d0
ricardo@solaris:~$ sudo zpool status test
pool: test
state: ONLINE
scan: scrub repaired 0 in 0h2m with 0 errors on Wed May 9 08:25:39 2012
config:
NAME        STATE     READ WRITE CKSUM
test        ONLINE       0     0     0
  c3t3d0    ONLINE       0     0     0
errors: No known data errors
ricardo@solaris:~$ sudo zpool attach test c3t3d0 c3t2d0
ricardo@solaris:~$ sudo zpool status test
Password:
pool: test
state: ONLINE
scan: resilvered 1.28G in 0h9m with 0 errors on Wed May 9 09:12:50 2012
config:
NAME          STATE     READ WRITE CKSUM
test          ONLINE       0     0     0
  mirror-0    ONLINE       0     0     0
    c3t3d0    ONLINE       0     0     0
    c3t2d0    ONLINE       0     0     0
errors: No known data errors
ricardo@solaris:~$ ls -lh /test
total 12
drwxr-xr-x 3 ricardo staff 3 May 7 06:13 Docs01
drwxr-xr-x 3 ricardo staff 3 May 7 06:16 Docs02
drwxr-xr-x 3 ricardo staff 3 May 7 06:22 Docs03
drwxr-xr-x 3 ricardo staff 3 May 7 06:38 Docs04
ricardo@solaris:~$
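For reference, the detach and attach steps used above can be written down as a small helper that builds the same commands. This is a sketch of the sequence from this session, not a general recovery tool: the names remirror_commands and run are mine, and the dry-run default only prints the commands so nothing touches a pool by accident.

```python
import subprocess


def remirror_commands(pool, healthy_disk, failed_disk, new_disk=None):
    """Build the zpool commands that rebuild a two-way mirror.

    If new_disk is omitted, the replacement is assumed to reuse the
    failed disk's device name, as happened in this session.
    """
    if new_disk is None:
        new_disk = failed_disk
    return [
        ["zpool", "detach", pool, failed_disk],
        ["zpool", "attach", pool, healthy_disk, new_disk],
        ["zpool", "status", pool],
    ]


def run(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True stops the sequence at the first failing command.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(remirror_commands("test", "c3t3d0", "c3t2d0"))
```

Running it with dry_run=False would need the same privileges as the sudo commands in the session, and the final zpool status would have to be repeated until the resilver is reported as finished.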