I recently had the chance to help a friend with some rendering issues, and I thought what happened would make an interesting case study. This is NOT a how-to on actually setting up and using Distributed Bucket Rendering (henceforth referred to as “DBR”). It is an exploration of one instance where DBR was a critical consideration in a very large render job. If you want the “how-to” of DBR, you can find it in the 3dsMax Help file here: http://docs.autodesk.com/3DSMAX/15/ENU/3ds-Max-Help/files/GUID-81270C05-C3D7-42A9-A129-459389FED064.htm
The Scenario (drastically summarized)
It was a huge rendering task for a “ride film”. If you’ve ever done anything like this, you know the image rendering and post processing is as big a task as prepro and production. Rendering immersive environments frequently means rendering images to the specification of the projection system. In the case of my friend (and other ride films I’ve worked on) it is usually a square image cropped from a larger one (1080x1080 cropped from 1920x1080, or some permutation thereof). OK… that’s different. Simply an extra step, right? Well, ride films are part of 360 workflows (a whole other blog post), which almost always use multi-camera rigs: front, left, right, top, and back. You don’t usually need to render the bottom view.
Again, in the case of my friend, not only did he have to render 5 camera views, the sequence length was 2800 frames (note I did NOT say the film length). So quick math puts us at 14000 frames, about 9 minutes if this were a linear piece, even though the segment is really only about 90 seconds. Because the design requirements call for exorbitant detail and realism, the frames are rendering at between 30 and 60 minutes per frame. At this point the raging hordes jump in with better ideas and scene optimizations. But hold your horses! When you consider 5 camera setups, just rendering the elements amps up the complexity almost exponentially. Sure, I concede that studios that do ride films full time have an optimized system in place. So consider this a learning curve for those who don’t.
So while many of us are willing to tackle a 90 second piece without much concern, these 5 camera setups require lots of planning, especially when it comes to rendering resources. Rendering time can frequently equal or surpass the entire production time due to the sheer volume of rendering. Heaven forbid there are changes, which there always are.
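To put the volume in perspective, here is a back-of-the-envelope sketch using the figures above (5 cameras, 2800 frames each, 30-60 minutes per frame, 5 machines). It assumes every machine renders one frame at a time around the clock with no failed or repeated frames, which is my simplification and not something measured on this job.

```python
# Rough render-budget estimate for the job described above.
# Figures from the post: 5 cameras, 2800 frames each, 30-60 min/frame, 5 machines.
# Assumption (not from the post): each machine renders one frame at a time,
# 24 hours a day, with no failed or re-rendered frames.

CAMERAS = 5
FRAMES_PER_CAMERA = 2800
MACHINES = 5


def render_budget(minutes_per_frame):
    """Return (total_frames, machine_hours, wall_clock_days)."""
    total_frames = CAMERAS * FRAMES_PER_CAMERA
    machine_hours = total_frames * minutes_per_frame / 60
    wall_clock_days = machine_hours / MACHINES / 24
    return total_frames, machine_hours, wall_clock_days


for minutes in (30, 60):
    frames, hours, days = render_budget(minutes)
    print(f"{minutes} min/frame: {frames} frames, "
          f"{hours:,.0f} machine-hours, about {days:.0f} days on {MACHINES} machines")
```

Under those assumptions the job lands somewhere between roughly 58 and 117 days of continuous rendering, which is why the planning matters so much.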
Finally, my buddy’s machines had limited RAM. The scene file took 16 GB of RAM just to load. It was swapping like crazy. As you will see below, DBR uses significantly LESS RAM while still giving you access to the same amount of processing power. This is a critical consideration on a deadline.
Before we can dig into tackling the rendering task, let’s get some assumptions out of the way that are specific to this case study.
1.) Resolutions are fixed. We can’t really cheat by upscaling lower res images because of special post processing required by the projection system.
2.) The quantity of machines is more or less fixed at 5. Unless you’re a huge shop with unlimited (relatively speaking) rendering resources, you will most likely be rendering with what you have. Assume you can upgrade RAM and drive space. We’ll talk about that later.
3.) Mental Ray is a requirement. Some of this may apply to VRAY, but that is outside the scope of this discussion. Since we’re talking specifically about DBR, you can assume MR is a requirement. DBR is not available in the other renderers that ship with 3dsMax.
4.) Render times are more or less consistently high: about 30-60 min/frame.
5.) You know Backburner and have used it successfully in some way. Without this context, this post may seem somewhat academic. Having firsthand knowledge of the issues surrounding network rendering will greatly assist you in trying some of these new things out.
OK here we go. This is why you are here. Why even consider anything but network rendering?
The answer to this question correlates directly with assumption #2: machines are fixed.
Typically we’d just throw the frames at the farm and go home for the weekend. But rendering 14000 frames requires a little more attention to planning. Saving as little as 2 minutes per frame adds up: over 14000 frames that is roughly 467 machine-hours, or about 93 hours of wall-clock time spread across 5 machines. Every minute counts.
But if render times are high (assumption #4) what options do I have? This is where DBR comes in.
Typically, DBR is a MR feature that is associated with rendering large single images. If you’re not familiar with what DBR is, here’s a high-level explanation. Like net rendering, it leverages machines on your network to render images. But instead of sending max files and scene data over and having each machine render one whole image, DBR leverages just the processor cores of those machines to render portions of a single frame (called buckets). You have already seen buckets in MR: they are the little white corner squares that appear on an image when you render it. Each bucket represents a processor core. So if you have a dual quad-core, you will see 8 buckets. It’s pretty straightforward. With DBR, a single frame renders faster on one machine because it borrows processors from other machines in addition to its own. Those additional machines are called “satellites”.
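If it helps to picture the bucket idea, here is a purely conceptual sketch: a frame is cut into tiles and the tiles are dealt out to a pool of cores, some local and some on satellites. The bucket size, machine names and round-robin dealing are all my own assumptions for illustration. This is not how mental ray schedules buckets internally.

```python
# Conceptual sketch only: split a frame into square buckets and deal them out
# to a pool of cores (local cores plus satellite cores). Bucket size, machine
# names and round-robin assignment are illustrative assumptions, not the
# renderer's actual scheduling.
from collections import defaultdict


def make_buckets(width, height, bucket_size):
    """Return a list of (x, y, w, h) tiles covering the whole frame."""
    buckets = []
    for y in range(0, height, bucket_size):
        for x in range(0, width, bucket_size):
            w = min(bucket_size, width - x)   # edge tiles may be narrower
            h = min(bucket_size, height - y)  # edge tiles may be shorter
            buckets.append((x, y, w, h))
    return buckets


def assign_round_robin(buckets, cores):
    """Deal buckets out to cores in turn, like dealing cards."""
    work = defaultdict(list)
    for i, bucket in enumerate(buckets):
        work[cores[i % len(cores)]].append(bucket)
    return dict(work)


# Hypothetical setup: one 8-core workstation plus two 8-core satellites.
cores = [f"{machine}:core{n}"
         for machine in ("workstation", "satellite1", "satellite2")
         for n in range(8)]
buckets = make_buckets(1920, 1080, 64)
work = assign_round_robin(buckets, cores)
print(len(buckets), "buckets shared by", len(cores), "cores")
```

The point is simply that more cores means each one has fewer buckets to chew through, so the single frame finishes sooner.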
Render farm throughput: All frames are not equal
So how will we apply that here? When considering this method, you have to understand the math associated with the performance of your specific render farm. Every farm will be different. Everyone’s math calculations will be different. It is entirely possible that your math may lead you to conclude that DBR may not be beneficial. In other words, you need to know the capabilities of your farm per job. The only way to attain this is to experiment with these two methods.
In the case of my friend’s ride film, DBR made sense in some cases. What follows is specific to the render farm in question. These calculations will not apply to your farm. But the method of calculation will.
I use frames/hour as my final metric of rendering power. How many frames per hour can I write to disk? You can get caught up in swapping, memory latency, and all this other stuff. Those all mean something for sure, but ultimately, for me, it’s about fr/hr. And for this discussion see assumption #2 (again).
Many people, rightly so, are strict measurers of minutes or hours per frame. In the context of a single machine this is important. It’s also important to know this as part of our calculations. But it isn’t the best or final measure of the efficiency of the entire farm.
Understanding farm throughput
So for this rendering example we did some simple tests. We found “worst case frames”: lots of lights, geometry filling the screen, etc. These are the frames we thought would take the longest to render. Here's what we did.
· A series of at least 20 frames was rendered using the straight net render method. This should give you a good estimate of the number of frames per hour you can push through the farm.
· That same series of frames was rendered “locally” using DBR, leveraging the processors of the other machines.
· Calculate the best option: net render vs. DBR. The higher frames/hour count wins. Simple!
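The comparison in the steps above can be sketched in a few lines. The timings below are hypothetical placeholders, not measurements from this project. Plug in the wall-clock time your own farm took to write the same test frames with each method.

```python
def frames_per_hour(frames_written, elapsed_minutes):
    """Farm throughput: frames written to disk per hour of wall-clock time."""
    return frames_written * 60 / elapsed_minutes


def pick_method(results):
    """results maps method name -> (frames_written, elapsed_minutes)."""
    rates = {name: frames_per_hour(*r) for name, r in results.items()}
    best = max(rates, key=rates.get)
    return best, rates


# Hypothetical timings for the same 20 test frames (not data from the post).
test_results = {
    "net render": (20, 240),  # 20 frames in 4 hours of wall-clock time
    "DBR": (20, 200),         # same 20 frames in 3 hours 20 minutes
}
best, rates = pick_method(test_results)
for name, rate in rates.items():
    print(f"{name}: {rate:.1f} frames/hour")
print("Use:", best)
```

Measuring wall-clock time for the whole batch, rather than minutes per frame on one machine, is what keeps the two methods comparable.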
It’s important to recognize that as render jobs progress, the render times will change as well. This will impact the efficiency of the two methods differently. In some cases (like early on in the shot where the frame was partially black) straight network rendering was the better option. But in the middle, with all the intricate detail, lights and shadows, DBR was by far the better option.
The ILLUSTRATION below is intended to demonstrate when we used old school net rendering and when we chose to use DBR. The RED LINE indicates scene complexity. The brackets across the top indicate when we used each method. The blue boxes across the bottom indicate the progression through the animation in frames. It is not EXACT, as no specific measurement is provided. It’s merely an aid to visualise the decision making process for each rendering method.
What about adding more resources?
RAM is a logical place to add resources to an existing farm. But in this situation there is an interesting thing about RAM usage. The scene file needed 16 GB of RAM just to load (not even mentioning rendering and swapping yet), but the DBR method only uses 3-4 GB to RENDER on the host machines. So while adding more RAM may improve single frame rendering per machine, the benefit of adding RAM for the DBR method is negligible because of its low resource overhead.
Adding machines will definitely add to your frame throughput in single frame situations. One thing about DBR is that it has a 4 satellite limit (a total of 5 machines: your workstation plus 4 others). But now you can begin to make some smart choices given what you now know about RAM usage and the DBR method. It may make sense to add a few bare-bones machines with high-end CPUs and minimal RAM to push through the DBR frames and still leave you room to upgrade them later. Remember, you can pick and choose which satellites to use. As you add faster cores, you can drop slower ones off the satellite list.
Lastly, the 4 satellite limit applies per licensed copy of 3dsMax. If you want to use more than 5 machines, you need additional licenses of 3dsMax.
Drive Space (for swapping)
There is another area where people tend to spend some dough: adding drive space for swapping. While this is a decent idea as drive space is SOOO cheap, I find it unlikely that you are lacking for drive space, unless your render farm also houses your ripped movie collection. I don’t think this is really a great investment unless it’s needed for other reasons.
Why not use DBR and Backburner all the time?
As I mentioned, there is a limitation to DBR. You can only use 4 “satellite” machines (regardless of the number of cores). So even though you may have 10 machines, you can’t leverage all ten from a single machine for DBR rendering. But there is a slightly more complicated way to do that, which is best visualized in the chart below.
Basically, you pick and submit only the licensed host machines to Backburner. NOTE: Each host machine MUST have a licensed copy of 3dsMax. Each host machine that you submit then refers to its own satellites (make sure they aren’t already a satellite of another machine!).
This solution is not for everyone. The bigger picture here is to help you understand where the bottlenecks may appear in your network rendering project and what you can do about them. Sometimes, merely adding more resources isn't the best solution, depending on your situation. Since this was a case study from a very specific project, I concede that there may be some other limitations or factors we didn't cover here. Again, this wasn't intended to be an all-encompassing exploration of the ins and outs of DBR. I'm sure you'll let me know about them in the comments section. Please do! I always have something to learn.
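Here is a small planning sketch of that layout: split the farm into licensed hosts, each with its own satellites, and make sure no satellite is claimed twice. It is only a helper for working out the grouping on paper. It does not talk to Backburner or 3dsMax, and the machine names are hypothetical.

```python
MAX_SATELLITES = 4  # per-license satellite limit described in this post


def plan_groups(hosts, others):
    """Assign each non-host machine to exactly one licensed host.

    hosts:  machines with a licensed copy of 3dsMax (submitted to Backburner)
    others: remaining machines, used only as DBR satellites
    Returns ({host: [satellites]}, [machines left unassigned]).
    """
    groups = {host: [] for host in hosts}
    pool = list(others)
    # Fill hosts in turn so the satellites are spread evenly between them.
    while pool and any(len(s) < MAX_SATELLITES for s in groups.values()):
        for host in hosts:
            if pool and len(groups[host]) < MAX_SATELLITES:
                groups[host].append(pool.pop(0))
    return groups, pool


# Hypothetical 10-machine farm with two licensed hosts.
machines = [f"render{n:02d}" for n in range(1, 11)]
hosts, others = machines[:2], machines[2:]
groups, leftover = plan_groups(hosts, others)
for host, satellites in groups.items():
    print(host, "->", ", ".join(satellites))
print("Unassigned:", leftover)
```

Because each satellite is popped from the pool exactly once, it can never end up on two hosts' lists, which is the mistake the note above warns about.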
If you're not doing so already, I invite you to follow me on twitter at @chrismmurray.