Archive

Posts Tagged ‘solaris’

Virtual Dedupe Requires Workaround: OpenSolaris build 128a Kernel Panics on Boot in VMware

December 4th, 2009

I am super excited that you can now download a compiled version of OpenSolaris that includes the new ZFS Dedupe support! However, when I went to try installing build 128a from GENUNIX, I found it kernel panic’d on boot in both VMware Workstation and ESXi 4.0.

A little googling and I found this bug and workaround:

6820576 Kernel panic when booting Nevada and OpenSolaris
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6820576

 	When booting build 121 on a VMware guest instance, the system
 	may panic with the following function listed in the kernel
 	stack trace

 		pcplusmp`ioapic_read

 	Work-around: Boot with the "pcieb" driver disabled by editing
 	the GRUB "kernel$" entry.  This can be done interactively by
 	typing the character "e" when the GRUB menu appears and using
 	the arrows key to navigate to the "kernel$" entry.  Entering a
 	second "e" will allow one to append to the end of the line the
 	string " -B disable-pcieb=true".

 	To complete the boot, enter a carriage return followed by the
 	"b" character.

 	To make this change persistent, edit the file
 	/rpool/boot/grub/menu.lst and add the same string to the
 	appropriate "kernel$" entries.

Thankfully, that fixed the problem right up!
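For the persistent version of the workaround, here is a rough sketch of how the edit to menu.lst could be scripted instead of done by hand. The function name is made up for illustration, and it assumes a POSIX awk (on Solaris that likely means nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk). It only prints the modified file, so nothing is changed until you review the result and copy it into place yourself.

```shell
#!/bin/sh
# Hypothetical helper: print a copy of a GRUB menu.lst with the workaround
# string from bug 6820576 appended to every "kernel$" line that does not
# already carry it. The input file itself is never modified.
add_pcieb_workaround() {
    awk '
        /^[ \t]*kernel\$/ && !/disable-pcieb=true/ {
            $0 = $0 " -B disable-pcieb=true"
        }
        { print }
    ' "$1"
}
```

Something like `add_pcieb_workaround /rpool/boot/grub/menu.lst > /tmp/menu.lst.new` followed by a diff against the original would let you check every changed line before replacing the real file.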


Suddenly can’t log in to OpenSolaris 2009.06 CIFS share

June 16th, 2009

I woke up this morning and couldn’t get onto my CIFS share. A quick look at /var/adm/messages and I saw this problem:

Jun 15 23:10:10 zed idmap[346]: [ID 702911 auth.notice] GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Clock skew too great)

Ok so this is because the clock on this machine is not close enough to the clock on my domain controller. I’ll just do a ‘crontab -e’ and plug this in:

# Sync date/time with my domain controller
15 * * * * /usr/sbin/ntpdate your.domain.controller.com
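As background on the error above: Kerberos rejects authentication when the two clocks differ by more than a configured tolerance, which defaults to 5 minutes (300 seconds). The sketch below only illustrates that check. The function name and the timestamps are made up, and how you obtain the domain controller's time is left out.

```shell
#!/bin/sh
# Hypothetical helper: succeed if two epoch timestamps (in seconds) are
# within max_skew seconds of each other. max_skew defaults to 300, the
# usual Kerberos tolerance.
skew_ok() {
    local_ts=$1
    remote_ts=$2
    max_skew=${3:-300}
    diff=$((local_ts - remote_ts))
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    [ "$diff" -le "$max_skew" ]
}

# Made-up timestamps that are 390 seconds apart.
if skew_ok 1245132610 1245133000; then
    echo "clock skew within tolerance"
else
    echo "clock skew too great"   # this branch runs: 390 > 300
fi
```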

Now it should stay synchronized. But wait, I still can’t access my shares.

# svcadm disable idmap
# svcadm disable smb/server
# svcadm enable -r idmap
# svcadm enable -r smb/server

That didn’t do it.

# smbadm list
[*] [MYDOMAIN]
[*] [mydomain.com]

…and proceeds to hang.

# smbadm join -w WORKGROUP
hangs.

# smbadm join -u domainuser mydomain
hangs.

/var/adm/messages shows: svc.startd[7]: [ID 122153 daemon.warning] svc:/network/smb/server:default: Method or service exit timed out. Killing contract 70.

I also noticed that despite disabling smb/server, the process still appears to be running. kill -9 does nothing.

I had experienced a similar issue earlier during setup and had written it off. It’s looking like CIFS isn’t so rock solid after all. This post on the cifs-discuss list definitely shows I’m not the only one having issues.

I’m tempted to use VirtualBox and run a virtual Win2k3 server on top of OpenSolaris. I would create an iSCSI target in my zpool and point the Win2k3 box at that. Let Windows seamlessly share files, which it is good at, and let OpenSolaris manage the storage, which it is good at. It’s an interesting thought, but I’m going to see if the latest SXCE fixes my CIFS woes first.


Migrating to an OpenSolaris Fileserver

June 14th, 2009

After getting replacements for my failed drives, I tackled migrating data off my old Windows 2003 fileserver onto my fancy OpenSolaris ZFS fileserver.

My Windows server decided this was the time it was going to become corrupt too. I was using an nvraid mirror and it went out of sync. I wasn’t able to recover it. My skepticism about cheap built-in IDE/SATA RAID has been confirmed.

All my data was still available on other drives though. I tried attaching them to the OpenSolaris box and using the read-only NTFS support to copy my data to my big ZFS raidz. The copy speed was agonizingly slow. I had over 1 TB to copy and I think it would have taken over 48 hours to copy it all. I ended up putting the drives in a USB enclosure, attaching it to my Windows laptop and copying the data over the GigE NICs. Surprisingly, that was faster.

This also gave me an opportunity to try out RichCopy as an alternative to robocopy. As a sidebar, I use robocopy almost every day. RichCopy includes a GUI, which I assumed would put behind me poring over robocopy /? | more whenever I need an option I don’t commonly use. Unfortunately the interface only emphasizes that this was a Microsoft internal tool. Which is to say, it’s not much better than the command line help. The feature I’m most excited about is multithreaded copy. With just a couple of threads I have to believe more bandwidth can be utilized.

To do all that copying, I had to set up the OpenSolaris CIFS service. Tim Thomas’ post is a good first read. I did run into a snag with having one of my domain controllers be a Windows Server 2008 machine. Justin Thomas’ experience makes me wonder if a bleeding-edge Solaris version is in my future. For now I opted to just demote the server, as it was just for testing anyway.

To get the files onto the server, I just followed Tim’s instructions and had a wide open share. Now that the files are there, I wanted to dial in the permissions. I liked Steve Radish’s instructions. I’m used to the old Unix chmod, and I found the giant new strings of alphanumeric characters used to set ACL permissions a bit daunting. Steve made me realize you can just use the Windows side to set the permissions, and then use ls -V to see what the effective permissions are. It really helped ease me into it.

I forget at what point, but I ran into an issue where my domain credentials wouldn’t let me see the share. I was seeing this in my /var/adm/messages:

Jun 13 11:01:15 solarbox smbd[2132]: [ID 266262 daemon.error] MYDOMAIN\myusername: idmap failed

The following commands resolved it for me:

svccfg -s idmap setprop config/unresolvable_sid_mapping = boolean: true
svcadm refresh idmap


OpenSolaris on Gigabyte GA-P965-S3

June 7th, 2009

Now that I have a new box to run ESXi, I’m repurposing my GA-P965-S3 based system for OpenSolaris. I had a lot of trouble getting this to work. I was initially using OpenSolaris 2008.11. I could get it installed. Reboot, login screen comes up. I plug in my credentials and as soon as the password entry box disappeared, lockup. Mouse stops responding, keyboard stops responding. I tried every BIOS setting, disabling everything, etc. Tried different drives, a different video card. Even tried my LSI SAS card instead of the onboard SATA. Finally I recalled reading a post somewhere from someone who was having issues with 4 GB of RAM. So I brought the system down to 2 GB and BAM, it worked. Soon after all this, 2009.06 came out. I installed that and it worked fine with 4 GB of memory. All 6 onboard SATA ports worked.

For drives, I have two 750 GB drives from my old Win2k3-based fileserver. I also had the four 1.5 TB Seagate drives that came with my Opteron box. I am allocating two 50 GB partitions on the 750 GB drives for the OS, and carving the rest out for a mirrored data partition. The four 1.5 TB drives are going to be in a raidz.

The OS installer doesn’t allow you to create a mirror to start with. I followed Darkstar’s post on creating a bootable root mirror and it worked great. You can only do this with slices, not entire disks. The OpenSolaris installer gives you the option of creating slices or using the entire disk, so remember to use slices if you want to create a mirror.

Creating the raidz is very simple. In one command I had 4 TB of usable storage with all the awesomeness of ZFS and RAIDZ. I ran some simple benchmarks on a single 500 GB drive (non-mirrored) and my new 4 TB RAIDZ using FileBench. The results of the benchmark are below. They confirm my RAIDZ is quite a bit faster than the single disk.
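As a rough sanity check on the 4 TB figure: a single-parity raidz gives up about one disk's worth of space to parity, so four 1.5 TB drives leave roughly three drives of capacity. The sketch below does that arithmetic and converts from the decimal terabytes on the drive label to binary TiB. The function name is made up, the pool and device names in the comment are hypothetical, and the estimate ignores ZFS metadata overhead.

```shell
#!/bin/sh
# For reference, the one-command pool creation has the general form
# (pool name and device names below are hypothetical):
#   zpool create tank raidz c7t0d0 c7t1d0 c7t2d0 c7t3d0

# Hypothetical helper: approximate usable space of a single-parity raidz,
# in TiB. $1 = number of disks, $2 = size of each disk in decimal TB.
raidz_usable_tib() {
    awk -v n="$1" -v tb="$2" \
        'BEGIN { printf "%.2f\n", (n - 1) * tb * 1e12 / 2^40 }'
}

raidz_usable_tib 4 1.5   # prints 4.09
```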

I started to offload data from my old Win2k3 fileserver onto the new RAIDZ. I added OpenSolaris to the domain and created a CIFS (Windows-friendly) share. Tim Thomas’ blog has a good post on how to do this. I did find that, out of the box, it didn’t like my Win2k8 domain controller. I decided to just remove that machine from my domain while I work out the initial setup. I’ll probably revisit this later. Permissions appear to be another tricky part of CIFS I’m going to come back to.

Unfortunately, after a few hundred gigs of transfer, one of the 1.5 TB drives failed. The RAIDZ kept on going, but soon after the first drive failure, a second drive started showing errors. I used the Ultimate Boot CD and confirmed both drives are indeed failing. One is making click-of-death noises; the other appears to be on its way to failure. I opted to go with Seagate’s Advanced Replacement and pay $20 per drive so I could get everything back up and running quickly. There should be a discount for multiple drives. Also, paying for this at all on a drive that is a few months old kind of stinks.

Here are the benchmark results:

Throughput breakdown (ops per second)

Workload                     fileio raidz (4 x 1.5 TB)   fileio single disk (1 x 500 GB)
multistreamread1m            208                         69
multistreamreaddirect1m      204                         70
multistreamwrite1m           113                         65
multistreamwritedirect1m     105                         67
randomread1m                 70                          21
randomread2k                 196                         167
randomread8k                 202                         173
randomwrite1m                108                         55
randomwrite2k                163                         128
randomwrite8k                160                         127
singlestreamread1m           79                          39
singlestreamreaddirect1m     76                          39
singlestreamwrite1m          119                         73
singlestreamwritedirect1m    121                         73

Bandwidth breakdown (MB/s)

Workload                     fileio raidz (4 x 1.5 TB)   fileio single disk (1 x 500 GB)
multistreamread1m            208                         69
multistreamreaddirect1m      204                         70
multistreamwrite1m           113                         65
multistreamwritedirect1m     105                         67
randomread1m                 70                          21
randomread2k                 0                           0
randomread8k                 1                           1
randomwrite1m                108                         55
randomwrite2k                0                           0
randomwrite8k                1                           1
singlestreamread1m           79                          39
singlestreamreaddirect1m     76                          39
singlestreamwrite1m          119                         73
singlestreamwritedirect1m    121                         73
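To put a number on "quite a bit faster", the ratio of the RAIDZ result to the single-disk result can be computed per workload. This is just a small sketch over a handful of the throughput rows above; the function name is made up, and it assumes a POSIX awk.

```shell
#!/bin/sh
# Hypothetical helper: read "workload raidz single" rows on stdin and
# print the RAIDZ-over-single-disk ratio for each workload.
speedups() {
    awk '{ printf "%-20s %.2fx\n", $1, $2 / $3 }'
}

# A subset of the ops-per-second results from the benchmark above.
speedups <<'EOF'
multistreamread1m 208 69
multistreamwrite1m 113 65
randomread1m 70 21
randomread2k 196 167
randomwrite1m 108 55
singlestreamread1m 79 39
singlestreamwrite1m 119 73
EOF
```

On these rows the large 1m reads come out around 2x to 3.3x, the 1m writes around 1.6x to 2x, and the small 2k random reads only about 1.2x.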


VMware ESXi: GA-P965-S3 and Supermicro AS-1021M-T2+B

May 31st, 2009

I have now built a couple of ESXi machines at home, and it can be tough finding hardware that you know is going to work with ESXi. I thought I would contribute a couple of working configurations.

Motherboard: Gigabyte GA-P965-S3 rev 1.0

When I built this, the onboard SATA ports and NIC wouldn’t work with ESX 3. I couldn’t even get the IDE channel to work when I bought a SAS card to use. It would boot off the IDE CD-ROM, get to a certain point and die. I ended up having to buy a SATA CD-ROM drive. One of the reasons I bought this board is that it had 4 PCIe slots, which would be helpful when none of the onboard items worked.

Storage Controller: LSI SAS3442E-R PCIe

I got a pretty good deal on one of these hunting eBay. It has an internal and an external port. To use 4 internal SATA disks, you’ll need an SFF-8484 to 4 SATA SFF-8448 cable.

Video: PCI Radeon 7000 card (Important since the LSI card takes up the 1 – 16x PCIe slot)

NIC: Intel Pro/1000 PT Desktop Adapter

When you manage to put together a supported config, ESXi is a very simple install. If you don’t have supported hardware, it fails and tells you pretty quickly. I ran both ESX 3.0 and ESXi 4.0 on the above hardware.

Once I got it up and running, it’s been a good system.  I recently caught the ZFS bug though and I needed a new system to allow me to continue running ESXi and another system to start using OpenSolaris.

After scouring Craigslist, I found a used Supermicro AS-1021M-T2+B system. It has an H8DME-2 motherboard. I was a little concerned about whether or not I was going to have to jump through hoops to get this to work. I searched a lot about the NVidia MCP55 chipset. It seemed like I would have to do some work and maybe buy either a new NIC or a storage card. Turns out ESXi 4.0 installs without a hitch. Both GigE NICs are supported, as well as the onboard SATA controller. I even did an informal IOMeter test and got better IOPS on this than on my other machine with the SAS card.

Now I’m repurposing the Gigabyte machine to be my OpenSolaris machine.  As I’ve come to expect, that’s not going as smoothly as I hoped.  But that’s another post.
