Posted on 21 March 2009 by Shawn
I am headed down to Phoenix for the Air Show. See you all on 03/28/09.
Posted on 17 March 2009 by Shawn
I posted this message to the x264 list this evening (see below).
I just had the thought that there were a couple of folks in
ua-developers interested in cluster programming techniques. For all its
faults, MPI (with C, C++ or Fortran) is still the 'gold standard' in
academia.
While my code is pretty bad, it just may provide a simple introduction
to message passing with MPI for ridiculously parallel jobs (which I made
video encoding into by arbitrarily chunking the input).
It's not much, but maybe someone wants to pick this project up on their
shoulders? Internet fame and glory await (distributed encoding is always
a popular topic on the internet, piracy being what it is these days).
The only thing this really needs is a frame server (define MPI message
that properly represents a frame, then some simple logic to send them)
and then working to make the frame server intelligent (i.e. work closely
with the deeper logic bits of x264) [Well, maybe some optimization to
make sure that the frame server overhead doesn't clobber the
parallelization gains].
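As a rough illustration of what "an MPI message that properly represents a frame" could look like, here is a minimal Python sketch of a fixed header followed by raw pixel data. The layout, the field choice and the names `pack_frame` / `unpack_frame` are all assumptions made for this example; nothing here comes from x264 or from the x264-mpi patch. In a real MPI program the same idea would more likely be expressed as a byte buffer or derived datatype handed to MPI_Send.

```python
import struct

# Hypothetical wire layout (not from x264-mpi): frame index, width, height,
# and payload length as unsigned 32-bit integers in network byte order,
# followed by the raw pixel bytes.
HEADER = struct.Struct("!IIII")


def pack_frame(index, width, height, pixels):
    """Build one message: fixed-size header plus raw pixel payload."""
    return HEADER.pack(index, width, height, len(pixels)) + pixels


def unpack_frame(message):
    """Split a message back into (index, width, height, pixels)."""
    index, width, height, length = HEADER.unpack_from(message)
    pixels = message[HEADER.size:HEADER.size + length]
    if len(pixels) != length:
        raise ValueError("truncated frame message")
    return index, width, height, pixels


# A 4x2 frame in YUV 4:2:0 needs width * height * 3 / 2 = 12 bytes.
message = pack_frame(7, 4, 2, bytes(12))
print(unpack_frame(message)[:3])
```

Carrying the frame index in the header would let a receiver put frames back in order regardless of arrival order, which is the sort of "simple logic" the frame server would need first.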
I am too busy at work to make this a priority, but I'd be willing to
help out where I can. Hopefully, soon enough you guys will have a proper
cluster at HACKS on which to try this stuff out.
Peace,
Shawn
Subject: Re: [x264-devel] implementing Cluster farming
From: Shawn Nock <nock@nocko.net>
Date: Tue, 17 Mar 2009 23:06:42 -0700
To: Mailing list for x264 developers <x264-devel@videolan.org>
I actually worked on this for a bit as a weekend hack / POC. I
implemented a *very basic* [read: not looking for criticism, I know it
is terrible and borderline worthless] x264-mpi workflow last year.
If anyone cares I put it on my gitweb:
http://git.nocko.net/?p=x264-mpi
The primary commitdiff of interest is here:
http://git.nocko.net/?p=x264-mpi;a=commitdiff;h=2200ac1a260a20085b2df588936911338f486f95
although there are useful patches (multipass support) later on (look for
commits by 'Shawn Nock').
There is no frame server (so it relies on shared storage). I am sure
that it breaks a lot of optimizations (specifically scene detection,
which is right out).
In a nutshell, it counts the frames and splits them into as many groups
as there are requested MPI processes. Then each process
encodes a frame group separately, outputting to a file with a sequential
extension. Concatenating these files (I don't think I implemented
concatenation in the primary process... memory fades) produces coherent
output, but nothing resembling the baseline output of a normal x264 run.
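The chunking step can be sketched in a few lines of Python. This is an illustration only, not code from the x264-mpi patch (which is C): the function names, the zero-padded extension, and the choice to spread leftover frames over the lowest ranks are all assumptions.

```python
def frame_ranges(total_frames, num_procs):
    """Split frames [0, total_frames) into num_procs contiguous groups.

    Returns one (first_frame, frame_count) pair per MPI rank. Leftover
    frames go to the lowest ranks, so group sizes differ by at most one.
    """
    if num_procs <= 0:
        raise ValueError("num_procs must be positive")
    base, extra = divmod(total_frames, num_procs)
    ranges = []
    start = 0
    for rank in range(num_procs):
        count = base + (1 if rank < extra else 0)
        ranges.append((start, count))
        start += count
    return ranges


def chunk_name(output, rank):
    # Hypothetical naming scheme: one sequentially numbered file per rank.
    return f"{output}.{rank:03d}"


for rank, (first, count) in enumerate(frame_ranges(10, 3)):
    print(rank, first, count, chunk_name("out.264", rank))
```

Because every rank starts its encode cold at an arbitrary frame, nothing carries over between groups, which is consistent with the efficiency loss described above.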
Caveats aside, it compiles on x86_64 (mpich2) and ia64 (SGI proprietary
MPI for its NUMAlink technology). If you don't care that it butchers
encoding efficiency (and ultimately the output file), the raw fps
numbers are encouraging.
I'd love to see proper MPI support and I could provide several testing
platforms if someone was seriously interested in doing this.
Peace,
Shawn
Posted on 08 March 2009 by Shawn
This is a basic summary; a more thorough post is available here.
The head node now has a POC nfsroot that boots into init (which it pulls via NFS). We are having some reliability problems after that (NFS time-out errors at different places in the init script). However, we learned a lot about how disk-less Linux works and we have a plan to fix it in the near future! The boot server can now boot the nodes, and give us BIOS, kernel, and (if we can sort out the NFS time-out) login on the serial server.
We also did quite a bit of work on the EVA5000 and came up empty-handed. I tried to access the serial interface, only to learn that it was diagnostics only (no documented management interface), and Jason stalled on the Windows 2000 management utilities. We’ll try again soon. I suppose that in the worst case we could use the one giant LUN as-is and not break it up… but let’s hope we can use the space a little more intelligently.
I also fixed the boot-time networking config on the server we made for the ua-developers club. I fat-fingered a conf file entry and the network didn’t come back after Jason power cycled the rack (QA testing, I am sure).
Stay tuned, the fun stuff is on the horizon.
Posted on 07 March 2009 by Shawn
ECE 232a @ 12:30p (until we get tired).
Things that I’d like to happen (in no particular order):
- Setup NAT on head node (so the nodes can talk to LDAP)
- Get the EVA1000 carved up and storage presented to the head node
- Develop a (rough) working compute node nfsroot
- Boot several of the nodes
- Test Myrinet connectivity, troubleshoot switch
I am not sure how much of this we’ll get done… but we’ll give it a go.
Posted on 01 March 2009 by Shawn
Head-node is up and serving dhcp, tftp and nfs. It took a while to work around a very broken old PXE stack on the cluster nodes (Tyan MPX). The nodes can boot, but not into anything really useful… memtest. I need to set up some NFS root fs areas for the nodes so that we can boot CentOS on them.
In other news, I got GM2 (Myrinet) compiled for 2.6.18. The kernel module loads and Ethernet emulation seems to work. As promised, I didn’t do any of the cluster integration with Myrinet, just got the kernel module compiled. It looks like OpenMPI supports GM; there is also MPICH-GM. Lustre has a custom driver for MX systems, but as we only have lowly GM boards… we’ll have to operate Lustre over IP (over GM) if we want to implement it.
Posted on 28 February 2009 by Shawn
Meeting in ECE 232a to start preliminary ground work for reviving the cluster computer formerly known as Neutron. I’ll be there around 12:30a until we get tired.
Posted on 09 June 2007 by Shawn
A lot of work has been going into the room lately… Two new four post racks hold 0.75T of Fast FC storage, 11 new servers (4-way and 8-way), fibre channel switches.
The Xen Virtualization environment is nearing completion, and several VMs are already in service: norbert.hacks (MySQL) & prothero.hacks (LDAP Auth Server). Creating a new virtual server is as easy as cp (1) and editing the conf file.
Work has begun on the new clustered computing environment. Currently 8x 4-way 700MHz Xeon boxes. The Head node has 250GB of Dual Channel SCSI RAID5 storage. It will be sharing this with the other nodes via NFS. Not too far along on this project yet… but good things seem to be just around the corner.
Posted on 31 December 2006 by Shawn
Hi all,
This is a post meant to give everyone an idea about how the cluster works, how to access nodes if necessary, and what kind of software makes the cluster a cluster (as soon as we get it working).
The nodes are arranged in their own Class C network space. The hosts file on each machine has the IP addresses of the other machines aliased, so that users can access each of the individual nodes simply by typing its name. The names are:
godzilla #for the head node
godzilla1 #node 1
godzilla2 #node 2
godzilla3 #node 3
All of the cluster nodes have the same root account, and users’ home directories will be mounted via NFS.
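For illustration, the hosts-file aliasing could be generated with a short Python sketch like the one below. The 192.168.1.0/24 network is a placeholder, since the post does not give the cluster's real addresses, and the helper name `hosts_entries` is made up for this example. It also assumes node 3 is named godzilla3.

```python
import ipaddress


def hosts_entries(network, names):
    """Pair each node name with consecutive host addresses in the network."""
    hosts = ipaddress.ip_network(network).hosts()
    return [f"{addr}\t{name}" for addr, name in zip(hosts, names)]


# Placeholder Class C (/24) network; the real cluster addresses are not known.
names = ["godzilla", "godzilla1", "godzilla2", "godzilla3"]
for line in hosts_entries("192.168.1.0/24", names):
    print(line)
```

Each printed line has the same address-then-name shape as an /etc/hosts entry, so the same list could be copied to every node.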
By aws4y
Posted on 19 December 2006 by Shawn
So the nodes of the cluster are all running, that’s the good news. They all have the serv_p4 MPI servers running, and that too would be good news. One problem…
I CAN’T GET ANY OF THE BLOODY EXAMPLE PROGRAMS TO RUN!!!!!!!!!111
they all give the same error:
rm_5138: p4_error: rm_start: net_conn_to_listener failed: 51539
p0_24034: p4_error: Child process exited while making connection to remote process on godzilla1: 0
p0_24034: (18.819009) net_send: could not write to fd=4, errno = 32
So as it stands right now, I have to have the root directories and the portage tree mounted by NFS, since we’re going to be using NFS trees. In the meantime, I am going to track down these errors, unless someone has some experience with MPI and can diagnose my problem right away. Right now I am using password authentication for SSH since I am root.
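One small clue in the log above is the errno value on the last line. A quick Python check (a sketch for decoding the number only; it says nothing about the p4 internals) shows what errno 32 means on the machine it runs on:

```python
import errno
import os

# From the log line: "net_send: could not write to fd=4, errno = 32"
code = 32
print(errno.errorcode[code], "-", os.strerror(code))
```

On Linux, errno 32 is EPIPE (broken pipe), meaning the write failed because the other end of the connection had already gone away. That suggests the net_send line is probably a symptom of the earlier net_conn_to_listener failure rather than a separate problem.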
By aws4y
Posted on 23 November 2006 by Shawn
Happy Thanksgiving all!
Don’t forget our second to last meeting of the year!
Wed. December 7th. 2006
ECE Building Rm. 208a
5:30pm
Topics of discussion:
- LDAP installation and account management
- Assign webmaster duties
- Cluster update
- HACKS DNS Overview
- Re-applying for club status/discuss elections
- Much, much more!