Archive | Cluster

No Work Day – 3/22/09

Posted on 21 March 2009 by Shawn

I am headed down to Phoenix for the Air Show. See you all on 03/28/09.

Comments (0)


x264-MPI Rough Demo Code

Posted on 17 March 2009 by Shawn

I posted this message to the x264 list this evening (see below).

I just had the thought that there were a couple of folks in
ua-developers interested in cluster programming techniques. For all its
faults, MPI (with C, C++ or Fortran) is still the 'gold standard' in
academia.

While my code is pretty bad, it just may provide a simple introduction
to message passing with MPI for ridiculously parallel jobs (which I made
video encoding into by arbitrarily chunking the input).

It's not much, but maybe someone wants to pick this project up on their
shoulders? Internet fame and glory await (distributed encoding is always
a popular topic on the internet, piracy being what it is these days).

The only thing this really needs is a frame server (define an MPI message
that properly represents a frame, then some simple logic to send them)
and then work to make the frame server intelligent (i.e. work closely
with the deeper logic bits of x264). [Well, maybe also some optimization
to make sure that the frame server overhead doesn't clobber the
parallelization gains.]

I am too busy at work to make this a priority, but I'd be willing to
help out where I can. Hopefully, soon enough you guys will have a proper
cluster at HACKS on which to try this stuff out.

Peace,
Shawn
Subject: Re: [x264-devel] implementing Cluster farming
From: Shawn Nock <nock@nocko.net>
Date: Tue, 17 Mar 2009 23:06:42 -0700
To: Mailing list for x264 developers <x264-devel@videolan.org>

I actually worked on this for a bit as a weekend hack / POC. I implemented a *very basic* [read: not looking for criticism, I know it is terrible and borderline worthless] x264-mpi workflow last year. If anyone cares, I put it on my gitweb:

http://git.nocko.net/?p=x264-mpi

The primary commitdiff of interest is here:

http://git.nocko.net/?p=x264-mpi;a=commitdiff;h=2200ac1a260a20085b2df588936911338f486f95

although there are useful patches (multipass support) later on (look for commits by 'Shawn Nock').

There is no frame server (so it relies on shared storage). I am sure that it breaks a lot of optimizations (specifically scene detection, which is right out).

In a nutshell, it counts the frames and splits them into as many groups as there are requested MPI processes. Then each process encodes a frame group separately, outputting to a file with a sequential extension. Concatenating these files (I don't think I implemented concatenation in the primary process... memory fades) produces coherent output, but nothing resembling the baseline output of a normal x264 run.

Caveats aside, it compiles on x86_64 (mpich2) and ia64 (SGI proprietary MPI for its NUMAlink technology). If you don't care that it butchers encoding efficiency (and ultimately the output file), the raw fps numbers are encouraging.

I'd love to see proper MPI support, and I could provide several testing platforms if someone was seriously interested in doing this.

Peace,
Shawn
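
For anyone who wants to see the chunking idea from the email without reading the commitdiff, here is a minimal sketch in Python. It is not the x264-mpi code: the function names (`frame_group`, `chunk_filename`), the way leftover frames are handed out, and the zero-padded extension are all assumptions made for illustration. In a real MPI program, `rank` and `nprocs` would come from `MPI_Comm_rank` and `MPI_Comm_size`; here they are plain arguments so the arithmetic can be run anywhere.

```python
from typing import Tuple


def frame_group(total_frames: int, nprocs: int, rank: int) -> Tuple[int, int]:
    """Return the half-open frame range [start, end) for one process.

    Hypothetical helper: frames are split into nprocs contiguous groups,
    and any remainder is spread over the lowest-numbered ranks.
    """
    if total_frames < 0:
        raise ValueError("total_frames must not be negative")
    if nprocs <= 0 or not 0 <= rank < nprocs:
        raise ValueError("rank must satisfy 0 <= rank < nprocs")
    base, extra = divmod(total_frames, nprocs)
    # Ranks below `extra` each take one additional frame.
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


def chunk_filename(output: str, rank: int) -> str:
    """Name the per-process output with a sequential extension (assumed format)."""
    return f"{output}.{rank:03d}"


if __name__ == "__main__":
    # Pretend we asked for 4 processes on a 1000-frame clip.
    for rank in range(4):
        start, end = frame_group(1000, 4, rank)
        print(rank, start, end, chunk_filename("out.264", rank))
```

Because every group is encoded with no knowledge of its neighbours, decisions that span a group boundary (scene detection, for one) are lost, which is the efficiency hit the email describes.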

Comments (0)


03/08/2009 Work Day Post-game!

Posted on 08 March 2009 by Shawn

This is a basic summary; a more thorough post is available here.

The head node now has a POC nfsroot that boots into init (which it pulls via NFS). We are having some reliability problems after that (NFS time-out errors at different places in the init script). However, we learned a lot about how disk-less Linux works and we have a plan to fix it in the near future! The boot server can now boot the nodes, and give us BIOS, kernel, and (if we can sort out the NFS time-out) login on the serial server.

We also did quite a bit of work on the EVA5000 and came up empty-handed. I tried to access the serial interface, only to learn that it was diagnostics only (no documented management interface), and Jason stalled on the Windows 2000 management utilities. We’ll try again soon. I suppose that in the worst case we could use the one giant LUN as-is and not break it up… but let’s hope we can use the space a little more intelligently.

I also fixed the boot-time networking config on the server we made for the ua-developers club. I fat-fingered a conf file entry and the network didn’t come back after Jason power cycled the rack (QA testing I am sure :) ).

Stay tuned, the fun stuff is on the horizon.

Comments (1)

Hacks Cluster Work Day Mar. 8th, 2009

Posted on 07 March 2009 by Shawn

ECE 232a @ 12:30p (until we get tired).

Things that I’d like to happen (in no particular order):

  • Set up NAT on head node (so the nodes can talk to LDAP)
  • Get the EVA1000 carved up and storage presented to the head node
  • Develop a (rough) working compute node nfsroot
  • Boot several of the nodes
  • Test Myrinet connectivity, troubleshoot switch

I am not sure how much of this we’ll get done… but we’ll give it a go.

Comments (3)

3/1/2009 Work Day Post-game

Posted on 01 March 2009 by Shawn

Head-node is up and serving dhcp, tftp and nfs. It took a while to work around a very broken old PXE stack on the cluster nodes (Tyan MPX). The nodes can boot, but not into anything really useful… memtest. I need to set up some NFS root fs areas for the nodes so that we can boot CentOS on them.

In other news, I got GM2 (Myrinet) compiled for 2.6.18. The kernel module loads and Ethernet emulation seems to work. As promised, I didn’t do any of the cluster integration with Myrinet, just got the kernel module compiled. It looks like OpenMPI supports GM; there is also MPICH-GM. Lustre has a custom driver for MX systems, but as we only have lowly GM boards… we’ll have to operate Lustre over IP (over GM) if we want to implement it.

Comments (0)

HACKS Work Day, March 1st 2009

Posted on 28 February 2009 by Shawn

Meeting in ECE 232a to start preliminary ground work for reviving the cluster computer formerly known as Neutron. I’ll be there around 12:30p until we get tired.

Comments (0)


Fast Times @ HACKS

Posted on 09 June 2007 by Shawn

A lot of work has been going into the room lately… Two new four-post racks hold 0.75T of fast FC storage, 11 new servers (4-way and 8-way), and fibre channel switches.

The Xen virtualization environment is nearing completion; several VMs are already in service: norbert.hacks (MySQL) & prothero.hacks (LDAP Auth Server). Creating a new virtual server is as easy as cp(1) and editing the conf file.

Work has begun on the new clustered computing environment. Currently 8x 4-way 700MHz Xeon boxes. The head node has 250GB of Dual Channel SCSI RAID5 storage. It will be sharing this with the other nodes via NFS. Not too far along on this project yet… but good things seem to be just around the corner.

Comments (0)


Cluster Primer

Posted on 31 December 2006 by Shawn

Hi all,

This is a post meant to give everyone an idea about how the cluster works, how to access nodes if necessary, and what kind of software makes the cluster a cluster (as soon as we get it working).

The nodes are arranged in their own Class C network space. The hosts file on each machine has the IP addresses of the other machines aliased, so that users can access each of the individual nodes simply by typing its name. The names are:

godzilla #for the head node
godzilla1 #node 1
godzilla2 #node 2
godzilla3 #node 3

All of the cluster nodes have the same root account, and users’ home directories will be mounted via NFS.
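
To make the hosts-file naming scheme concrete, here is a small Python sketch that generates the entries. The post does not give the actual addresses, so the `192.168.0` prefix, the head node sitting at `.1`, and the helper name `hosts_entries` are all assumptions for illustration only.

```python
from typing import List


def hosts_entries(prefix: str, head_name: str, node_count: int) -> List[str]:
    """Build hosts-file lines for a head node plus numbered compute nodes.

    Hypothetical layout: the head node takes .1 and node N takes .(N + 1)
    inside a single Class C (/24) network.
    """
    if not 0 <= node_count <= 253:
        raise ValueError("a /24 leaves room for at most 253 nodes after the head")
    entries = [f"{prefix}.1\t{head_name}"]
    for n in range(1, node_count + 1):
        # Compute nodes reuse the head node's name with a numeric suffix.
        entries.append(f"{prefix}.{n + 1}\t{head_name}{n}")
    return entries


if __name__ == "__main__":
    # Assumed example network; substitute the cluster's real prefix.
    print("\n".join(hosts_entries("192.168.0", "godzilla", 3)))
```

Copying the same generated file to every machine keeps the aliases consistent, so a user can reach any node by name from any other node.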

By aws4y

Comments (0)


Clusterf*@k

Posted on 19 December 2006 by Shawn

So the nodes of the cluster are all running, that’s the good news. They all have the serv_p4 MPI servers running; that too would be good news. One problem…

I CAN’T GET ANY OF THE BLOODY EXAMPLE PROGRAMS TO RUN!!!!!!!!!111

they all give the same error:

rm_5138: p4_error: rm_start: net_conn_to_listener failed: 51539
p0_24034: p4_error: Child process exited while making connection to remote process on godzilla1: 0
p0_24034: (18.819009) net_send: could not write to fd=4, errno = 32

So as it stands right now, I have to have the root directories mounted by NFS, and the portage tree too, since we’re going to be using NFS trees. In the meantime, I am going to track down these errors, unless someone has some experience with MPI and can diagnose my problem automatically. Right now I am using password authentication for SSH since I am root.

Comments (0)


Happy Thanksgiving: Meeting Announcement

Posted on 23 November 2006 by Shawn

Happy Thanksgiving all!

Don’t forget our second-to-last meeting of the year!

Wed. December 7th, 2006
ECE Building Rm. 208a
5:30pm

Topics of discussion:

  • LDAP installation and account management
  • Assign webmaster duties
  • Cluster update
  • HACKS DNS Overview
  • Re-applying for club status/discuss elections
  • Much, much more!

Comments (0)