How to Determine the Number of Slots Required for MapReduce

The slot has two parts, the CPU component and the memory component. Linux load averages stay less than half the number of CPUs on the system, even when running jobs. I am on my way to becoming a Cloudera Hadoop administrator. To avoid breaking any copyright laws I won't use any actual machine as an example but will create my own.

Tuned optimally, each of the map tasks in this job runs in about 33 seconds, and the total job runtime is about 8m30s. The first step to optimizing your MapReduce performance is to make sure your cluster configuration has been tuned.

For starters, check out our earlier blog post on configuration parameters. In addition to those knobs in the Hadoop configuration, here are a few more checklist items you should go through before beginning to tune the performance of an individual job:

Unfortunately I was not able to perform benchmarks for this tip, as it would involve re-imaging the cluster. If you have had relevant experience, feel free to leave a note in the Comments section below.

Almost every Hadoop job that generates a non-negligible amount of map output will benefit from intermediate data compression with LZO.

Whenever a job needs to output a significant amount of data, LZO compression can also increase performance on the output side. Since writes are replicated 3x by default, each GB of output data you save will save 3GB of disk writes. In order to enable LZO compression, check out our recent guest blog from Twitter. Be sure to set mapred. Disabling LZO compression on the wordcount example increased the job runtime only slightly on our cluster.

Since this job was not sharing the cluster, and each node has a high ratio of number of disks to number of tasks, IO is not the bottleneck here, and thus the improvement was not substantial.

Tuning the number of map and reduce tasks for a job is important and easy to overlook. Here are some rules of thumb I use to set these parameters:
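As a rough illustration of the arithmetic behind task-count tuning, the sketch below estimates how many map tasks a job gets from its split size and how many rounds ("waves") are needed to run them on a fixed number of map slots. The 40 GB input, the split sizes, and the 24 map slots are hypothetical values chosen for the example, and the one-task-per-split model assumes splittable input; it is not a reproduction of the framework's exact split logic.

```python
GB = 1024 ** 3
MB = 1024 ** 2


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return (a + b - 1) // b


def map_task_count(input_bytes: int, split_bytes: int) -> int:
    """Approximate number of map tasks: one task per input split."""
    return ceil_div(input_bytes, split_bytes)


def waves(tasks: int, slots: int) -> int:
    """Rounds needed to run all tasks when only `slots` can run at once."""
    return ceil_div(tasks, slots)


# Hypothetical job: 40 GB of input on a cluster with 24 map slots.
input_bytes = 40 * GB
map_slots = 24

for split_mb in (16, 128):
    tasks = map_task_count(input_bytes, split_mb * MB)
    print(f"{split_mb} MB splits: {tasks} tasks, {waves(tasks, map_slots)} waves")
```

Smaller splits multiply the number of tasks and waves, so the fixed startup cost of each task is paid many more times for the same amount of data.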

To make the wordcount job run with too many tasks, I ran it with the argument -Dmapred. This yielded far more tasks than the framework chose by default. When running with this setting, each task took about 9 seconds, and watching the Cluster Summary view on the JobTracker showed the number of running maps fluctuating between 0 and 24 continuously throughout the job. The entire job finished in 17m52s, more than twice as slow as the original job.

If your algorithm involves computing aggregates of any sort, chances are you can use a Combiner in order to perform some kind of initial aggregation before the data hits the reducer.

The MapReduce framework runs combiners intelligently in order to reduce the amount of data that has to be written to disk and transferred over the network in between the Map and Reduce stages of computation. I modified the word count example to remove the call to setCombinerClass, and otherwise left it the same. This changed the average map task run time from 33s to 48s, and increased the amount of shuffled data from 1GB to 1.

The total job runtime increased from 8m30s to 15m42s, nearly a factor of two. Note that this benchmark was run with map output compression enabled — without map output compression, the effect of the combiner would have been even more important.
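To make the effect of a combiner concrete, here is a minimal in-memory sketch of word count, run once with and once without a combine step. It is a toy simulation, not Hadoop code: the function names (`map_words`, `combine`, `reduce_counts`, `run`) and the two input splits are made up for illustration, and "shuffled records" simply means the number of (word, count) pairs that would leave the map side.

```python
from collections import Counter, defaultdict


def map_words(line):
    """Map step: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combine step: pre-aggregate counts within a single map task."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


def reduce_counts(shuffled):
    """Reduce step: sum the counts for each word across all map tasks."""
    totals = defaultdict(int)
    for word, count in shuffled:
        totals[word] += count
    return dict(totals)


# Hypothetical input: each inner list is the data seen by one map task.
splits = [
    ["the cat sat on the mat", "the dog sat"],
    ["the cat ran", "the dog ran and the cat sat"],
]


def run(use_combiner):
    """Return (number of shuffled records, final word counts)."""
    shuffled = []
    for split in splits:
        pairs = [pair for line in split for pair in map_words(line)]
        if use_combiner:
            pairs = combine(pairs)
        shuffled.extend(pairs)
    return len(shuffled), reduce_counts(shuffled)


plain_records, plain_result = run(use_combiner=False)
combined_records, combined_result = run(use_combiner=True)
print("without combiner:", plain_records, "records shuffled")
print("with combiner:   ", combined_records, "records shuffled")
print("same result:", plain_result == combined_result)
```

The final counts are identical either way; only the volume of intermediate data changes, which is why a combiner is safe for aggregates like sums and counts.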

When users are new to programming in MapReduce, or are switching from Hadoop Streaming to Java MapReduce, they often use the Text writable type unnecessarily. Although Text can be convenient, converting numeric data to and from UTF8 strings is inefficient and can actually make up a significant portion of CPU time.

Whenever dealing with non-textual data, consider using the binary Writables like IntWritable, FloatWritable, etc. In addition to avoiding the text parsing overhead, the binary Writable types will take up less space as intermediate data.
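The size difference can be sketched outside Hadoop. The example below compares an integer encoded as a UTF-8 decimal string with the same integer packed as a fixed 4-byte big-endian value, which is the layout IntWritable uses. The helper names are hypothetical, and the text size here ignores the length prefix that the Text type also writes, so it slightly understates the cost of text.

```python
import struct


def encode_as_text(value: int) -> bytes:
    """Encode an integer as its UTF-8 decimal string."""
    return str(value).encode("utf-8")


def decode_text(data: bytes) -> int:
    """Parse a UTF-8 decimal string back into an integer."""
    return int(data.decode("utf-8"))


def encode_as_int32(value: int) -> bytes:
    """Pack an integer as a 4-byte big-endian signed value."""
    return struct.pack(">i", value)


def decode_int32(data: bytes) -> int:
    """Unpack a 4-byte big-endian signed value."""
    return struct.unpack(">i", data)[0]


for value in (7, 1234567, -2000000000):
    text = encode_as_text(value)
    binary = encode_as_int32(value)
    print(f"{value}: text={len(text)} bytes, int32={len(binary)} bytes")
```

Note that for very small numbers the text form can be shorter than a fixed 4-byte integer, so the binary type wins mainly on parsing cost in that case and on size for larger values.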

Since disk IO and network transfer will become a bottleneck in large jobs, reducing the sheer number of bytes taken up by the intermediate data can provide a substantial performance gain. When dealing with integers, it can also sometimes be faster to use VIntWritable or VLongWritable — these implement variable-length integer encoding which saves space when serializing small integers.

For example, the value 4 will be serialized in a single byte, whereas the value will be serialized in two. These variable length numbers can be very effective for data like counts, where you expect that the majority of records will have a small number that fits in one or two bytes.
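The idea of variable-length integer encoding can be sketched as follows. This is a generic base-128 (LEB128-style) encoding for non-negative integers, used here only to illustrate the principle; it is not the byte layout that Hadoop's VIntWritable and VLongWritable actually write, and the function names are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer using 7 data bits per byte.

    The high bit of each byte is set when more bytes follow.
    """
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # last byte, high bit clear
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a single varint produced by encode_varint."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")


for value in (4, 300, 70000, 2**31 - 1):
    encoded = encode_varint(value)
    print(f"{value}: {len(encoded)} byte(s)")
```

Small counts cost one byte each, while the largest 32-bit values cost five bytes in this scheme, one more than a fixed 4-byte integer. The encoding pays off only when most values are small.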

That would be a great consolidation ratio. What if you have a single large VM with a reservation, but the rest of the virtual machines are relatively small? Admission control is going to tell us that only 6 slots are available on host B, so it will only allow 6 VMs on host A to be powered on.

Note that if you do this, some of your VMs will require multiple slots to run. For instance the large VM we used in our example might take more than 1 slot depending on what size you make it. The button below the slot size configuration may help you determine how many VMs will be affected by this change. There will be an item listed for slot size.
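The slot arithmetic described above can be sketched roughly as follows. The model assumes the slot size is taken from the largest CPU and memory reservations among the VMs, and that a host provides as many slots as its scarcer resource allows; it ignores memory overhead and any product defaults. All VM and host sizes are hypothetical, chosen so that the large VM leaves 6 slots on the host, and the names (`VM`, `slot_size`, `slots_on_host`, `slots_needed`) are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    cpu_mhz: int  # CPU reservation
    mem_mb: int   # memory reservation


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return (a + b - 1) // b


def slot_size(vms):
    """Slot size (CPU, memory) driven by the largest reservations."""
    return max(vm.cpu_mhz for vm in vms), max(vm.mem_mb for vm in vms)


def slots_on_host(host_cpu_mhz, host_mem_mb, slot):
    """Slots a host can hold: limited by whichever component runs out first."""
    cpu_slot, mem_slot = slot
    return min(host_cpu_mhz // cpu_slot, host_mem_mb // mem_slot)


def slots_needed(vm, slot):
    """Slots one VM consumes when the slot is smaller than its reservation."""
    cpu_slot, mem_slot = slot
    return max(ceil_div(vm.cpu_mhz, cpu_slot), ceil_div(vm.mem_mb, mem_slot))


# Hypothetical cluster: one large VM with a reservation, nine small ones.
large = VM("large", cpu_mhz=2000, mem_mb=8192)
small = [VM(f"small-{i}", cpu_mhz=500, mem_mb=1024) for i in range(9)]
vms = [large] + small

host_cpu_mhz, host_mem_mb = 12000, 49152

auto_slot = slot_size(vms)
print("automatic slot size:", auto_slot)
print("slots per host:", slots_on_host(host_cpu_mhz, host_mem_mb, auto_slot))

manual_slot = (500, 1024)  # slot size set by hand to fit the small VMs
print("slots per host with manual size:",
      slots_on_host(host_cpu_mhz, host_mem_mb, manual_slot))
print("slots the large VM now needs:", slots_needed(large, manual_slot))
```

With the manual slot size the host offers many more slots, but the large VM occupies several of them, which is the trade-off the note above describes.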
