Wednesday, February 1, 2017

Antarris-1333MHz-DDR3-RAM-Memory-SODIMM


Doug: First I'm going to compare DDR4 RDIMM with DDR4 LRDIMM. On the RDIMM, all command, address, and control signals are buffered by the register. It is shown here as the device in the middle of the RDIMM with the acronym RCD, which stands for register clock driver; that's the acronym used by the industry. However, as you can see, the data outputs from the DRAMs are not buffered on the RDIMM. Therefore, there can be from one to four DRAM loads presented at the RDIMM connector. This picture shows the front side of the RDIMM, and there are just as many DRAMs on the back side. So you can imagine four DRAMs placed vertically where I'm showing one to four DRAM loads. These additional loads degrade signal integrity. Off to the right, on the LRDIMM, command, address, and control are buffered by the register, just as on the RDIMM.

However, the DRAM outputs are also buffered by the data buffers, shown here as the chips labeled DB. This means that there will only be one load presented at the LRDIMM connector instead of four, as there were on the RDIMM. This leads to better signal integrity on the data signals at the edge connector, also referred to as the DQ bits. Having fewer loads at the connector is also why LRDIMMs can be populated into your server with less degradation in performance. Imagine three RDIMMs populating a server. That would mean up to three times four loads, or twelve DRAM loads, connected onto the DQ path of the motherboard. With LRDIMMs, that would mean only three times one, or three loads, on the DQ path. Fewer loads: that's the point of the LRDIMM.
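The load arithmetic above is simple enough to write down directly. This is a minimal sketch. The function name is hypothetical, and it uses the talk's worst case of four DRAM loads per RDIMM on a given DQ line versus one per LRDIMM.

```python
def dq_loads(num_dimms: int, loads_per_dimm: int) -> int:
    """Total DRAM loads seen on one motherboard DQ line (hypothetical helper)."""
    if num_dimms < 0 or loads_per_dimm < 0:
        raise ValueError("counts must be non-negative")
    return num_dimms * loads_per_dimm

# Worst case quoted in the talk: up to four DRAM loads per RDIMM, one per LRDIMM.
RDIMM_MAX_LOADS = 4
LRDIMM_LOADS = 1

for n in (1, 2, 3):
    print(f"{n} DIMM(s): RDIMM up to {dq_loads(n, RDIMM_MAX_LOADS)} loads, "
          f"LRDIMM {dq_loads(n, LRDIMM_LOADS)} loads")
```

With three DIMMs this reproduces the twelve-versus-three comparison Doug describes.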

This concept of RDIMM versus LRDIMM is the same for DDR4 as it was in DDR3. However, we will see later that the DDR4 LRDIMM has a better architecture that improves signal integrity.

So now I'm going to start showing the advantages of DDR4 versus DDR3, and here I'm going to be talking about both RDIMMs and LRDIMMs. As far as scalability goes, DDR4 RDIMMs and DDR4 LRDIMMs use the same approach of having a central register device to buffer command and address for the memory module. In my opinion, that makes DDR4 LRDIMMs more scalable than DDR3 LRDIMMs, and I'm showing it here with the red checks on the top, okay?

As you can see in the pictures, the same central RCD is used in both DDR4 LRDIMMs and DDR4 RDIMMs. Therefore, all of the software used to control the DRAMs through the RCD on a DDR4 RDIMM can be reused for controlling the DRAMs on LRDIMMs; hence the scalability. In addition to the register, LRDIMMs also have nine data buffers, located between the lower DRAMs and the edge connector. The data buffers are controlled through the register and intercept the memory read and write data. In a DDR3 LRDIMM, a completely different buffering device, called the memory buffer, communicates with the host controller. Therefore, completely new software must be developed to address the DRAMs on the DDR3 LRDIMM, because the register software cannot be reused.

Item two: DDR4 modules will eventually reach speeds of 3200 megatransfers per second, whereas DDR3 modules have topped out at 2133. Currently DDR4 is defined for operating speeds between 1600 and 2400 megatransfers per second, but there are plans to increase the speed in future products. Additionally, an incredible amount of DRAM memory is possible on these DDR4 modules: DDR4 addressing schemes are prepared to handle one terabyte of memory. In DDR3, while 64-gigabyte modules might be realizable, I don't expect any higher densities from DDR3 module vendors.

I wanted to get back to comparing DDR3 and DDR4 again. In this case, I want to point out the DDR4 improvements in the signal flow for the DQ bits. These pictures show the DQ data paths. You can see that the DDR4 DQ data path, shown here as vertical path connections, is very consistent from column to column of DRAMs. In DDR3, the trace length varies from one DQ bit to the next by anywhere from two to six inches. This means that data arriving simultaneously at the edge connector for two different DRAMs will have two very different flight times to reach those DRAMs. The modules calibrate out this variability in trace lengths, but signal integrity and performance are compromised. Another benefit of DDR4 is that the training algorithms that remove trace-length variability in the command/address and DQ paths are completely controlled by the host controller. And finally, these training algorithms are done completely by the host controller and the backside bus is not isolated. Because of that, the host can be responsible for creating all of the training software, so the training software can be standardized even if different chipsets are used, from companies such as IDT.
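To put the two-to-six-inch DDR3 spread in perspective, here is a back-of-the-envelope sketch. It compares the resulting flight-time difference with one bit time (the unit interval) at a couple of DDR3 data rates. The 170 ps/inch propagation delay is an assumed ballpark for inner-layer FR-4 routing, not a number from the talk, and real boards vary.

```python
# Assumption (not from the talk): roughly 170 ps of propagation delay per inch,
# a common ballpark for inner-layer FR-4 routing. Real boards vary.
PS_PER_INCH = 170.0

def unit_interval_ps(mt_per_s: float) -> float:
    """Duration of one bit (unit interval) on a DQ line, in picoseconds."""
    return 1e6 / mt_per_s  # 1 / (MT/s * 1e6) seconds, expressed in ps

def flight_time_skew_ps(len_a_in: float, len_b_in: float,
                        ps_per_inch: float = PS_PER_INCH) -> float:
    """Flight-time difference between two traces of the given lengths (inches)."""
    return abs(len_a_in - len_b_in) * ps_per_inch

skew = flight_time_skew_ps(2.0, 6.0)  # the 2-to-6 inch DDR3 spread from the talk
for rate in (1600, 2133):
    ui = unit_interval_ps(rate)
    print(f"{rate} MT/s: UI = {ui:.1f} ps, skew = {skew:.0f} ps = {skew / ui:.2f} UI")
```

Under the assumed delay, the skew is more than one full unit interval at both rates. That illustrates why the DDR3 modules have to calibrate it out.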

I wanted to quickly show how the DDR4 commands flow from the host to the register, the data buffer, and the DRAM. For the RCD, the register control words are called RCWs. You can see here that RCW writes are done directly from the host to the RCD. For reads, an RCW command is sent by the host to the RCD to move the bits into a special multipurpose register in the DRAM. The data that has been written into these special registers, called MPR registers, is then read out of the DRAM onto the DQ bus. The data buffer works in a similar way; for the data buffer, the buffer control words are called BCWs. You can see here that writes go from the host via the RCD to the data buffer. For reads, a command is sent by the host to the data buffer to move the appropriate data buffer bits into a special multipurpose register located in the data buffer. Then that register's data is read out of the data buffer onto the DQ bus. And finally, the DRAM mode registers are written by MRS commands from the host via the RCD to the DRAM. Reading the mode registers is also done through the same multipurpose registers inside the DRAM that are used to read out the RCD register contents.

In addition, I wanted to point out that you can write individually to the DRAM registers and data buffer registers. That was not possible in DDR3, and it helps with training. And finally, there are even parity options available to make sure that the signals flowing into the RCD from the host, as well as into the DRAMs and the data buffers, are valid. The difference here is that in DDR3, only the signals flowing from the host into the RCD had parity checking. There was no parity checking for the DRAMs.

This slide is meant to intentionally show you how complicated it can become when writing to RCWs, BCWs, MRSs, and so on. I'm going to let Mike at LeCroy go into the specifics of some of these protocols in his presentation, but I wanted to show them to you here. You can see the table below for details. MR0 through MR6, seven registers, are used to write control information into the DRAMs. Writing to MR7, rank zero, side A, with address bit A12 equal to zero writes to the RCWs. Writing to MR7, rank zero, side A, with A12 equal to one writes to the data buffer control words. You can write to them via the host controller. You can also write to them through a serial bus called I2C. There are different ways of writing. For example, you may be booting the server, which would use MRS commands to communicate with all the chips on the modules. Simultaneously, you could communicate with some kind of debugging software and hardware through the I2C interface. So again, I'm going to let LeCroy go into the specifics of these protocols in their presentation, so I'll stop here and pass the baton over to Mike.

Mike: Okay, thanks for that, Doug. Well, I think it's pretty clear that these are not your garden-variety DIMMs. DDR4 LRDIMMs are going to be pretty sophisticated devices, definitely a quantum leap in terms of capacity and performance. There's definitely going to be more complexity and more testing when you are working with LRDIMMs. So for the second part of our webinar, I'm going to start by showing you how the Teledyne LeCroy bus analyzer is used to test DDR4. I'll be using some screen captures from the analyzer throughout the presentation to show you real DDR4 traffic, and this is one way to get visibility into the DDR4 bus. Then we'll focus the discussion on the configuration process. We'll look at some of the different register and buffer commands that get set at power-on. Along the way, we'll see how the bus analyzer can help you troubleshoot problems that might come up.
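Going back to Doug's table, the MR7 addressing rule can be sketched as a small decoder. This is a minimal illustration, and the names `Target` and `route_mrs` are hypothetical. It assumes the command has already been qualified as going to rank zero, side A, and it models only the choice between DRAM mode register, RCW and BCW that the talk describes.

```python
from enum import Enum

class Target(Enum):
    DRAM_MODE_REGISTER = "DRAM mode register"
    RCD_CONTROL_WORD = "RCD control word (RCW)"
    DB_CONTROL_WORD = "data buffer control word (BCW)"

def route_mrs(mr_index: int, a12: int) -> Target:
    """Classify an MRS command by mode-register number and address bit A12.

    Follows the rule described in the talk: MR0-MR6 program the DRAMs,
    while MR7 (rank zero, side A, assumed already checked) is redirected
    by A12: 0 selects an RCW, 1 selects a BCW.
    """
    if not 0 <= mr_index <= 7:
        raise ValueError("MRS selects MR0 through MR7")
    if a12 not in (0, 1):
        raise ValueError("A12 is a single bit")
    if mr_index < 7:
        return Target.DRAM_MODE_REGISTER
    return Target.RCD_CONTROL_WORD if a12 == 0 else Target.DB_CONTROL_WORD

for mr, a12 in [(0, 0), (6, 1), (7, 0), (7, 1)]:
    print(f"MR{mr}, A12={a12}: {route_mrs(mr, a12).value}")
```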

The analyzer uses a slot-interposer-style probe, which sits in line in the DIMM slot. You remove the memory, insert the probe, and then put the DIMMs on top. These are actually UDIMMs in this picture. But here's some good news: DDR4 RDIMMs and LRDIMMs are going to continue to use the same connector as UDIMMs. So we have a single interposer that can be used to test both unbuffered and buffered DIMMs. The blue cable is attached directly to the Kibra 480 analyzer. The interposer probe is self-powered, meaning it pulls power from the analyzer, so that it can capture the boot-up sequence. When your memory is getting configured, it's going to record the command, address, and control signals, allowing you to analyze the timing. You can see any state changes, and basically you see every command that gets sent to the LRDIMMs.

There's no calibration step, so this makes it quick and easy to get started. The only thing you need to do is go into the software and tell the analyzer what speed and latency you're running. You specify these parameters on the left, and then the software loads the exact timing intervals for that DIMM on the right. It can track up to 65 different protocol and timing violations, and it will actually trigger in real time if it detects any errors.

Okay, there's a full-blown timing waveform viewer that allows you to see what violations occurred. This is an example of a timing violation: here, the controller is issuing two write commands that are too close together. You can see all the control signals, and the rank, the bank, and the full command are decoded. The software uses timestamps to show you which commands caused the violations, and you can, of course, do your own measurements. But verifying compliance at the JEDEC timing-table level is really just one part of the feature set here. We have users right now doing DDR4 bring-up testing, functional testing, and debugging of real problems at power-on. You can do all of this at much lower cost by using the LeCroy DDR bus analyzer than by using a logic analyzer.

Okay, let's start by walking through the configuration sequence for an LRDIMM, which starts at power-on: you apply voltage, then stay in the reset state for 200 microseconds. Just like DDR3, the host will read the SPD to get the device's operating parameters. The command and address lines between the host and the register are trained first. This is usually initiated with a register command, and it helps make sure that the input side of the register is clocking in a clean signal. This is how all your commands are sent to the DIMM, so it's pretty critical. The next step is to set up the register itself; this is where you see more register commands, or RCWs. We'll look at these in detail in a second. The first MRS commands sent to the DRAM are the typical MRS commands needed to set up your CAS latency and your burst length. You still need to do this at the DRAM level, right? You need to send the MRSs, and this is done per rank. Then your normal ZQ calibration commands are sent. These two stages look a lot like what occurs today with UDIMMs.

But then you need to train the DQ lines on the data buffers. This is, again, initiated by the host with a register command. Really, it's an MPR override read that allows the host to pick one of several training patterns. That pattern then gets pipelined from the buffer back to the host over the data lines. So this is really for the host to optimize its receivers, and buffer chips like the IDT 0124 can be tuned independently from the DRAM. Doug mentioned the large set of special buffer control words, which the host uses to customize how the data buffer operates, mostly to get better signal integrity. Then it moves on to the full DQ/DQS training on the DRAM side of the buffer: this is your write leveling and your read training. Then it exits to normal operation. So the initialization sequence is definitely more elaborate than what we do with UDIMMs. It's all done serially for each DIMM, so this could take several minutes with a fully populated system. And again, it's all directed by the memory controller, so this needs a lot of testing.

Okay, we're looking here at the different commands that the host controller is going to use to set up and operate an LRDIMM. It needs a large command set, primarily to configure the register and the buffer. The first two are standard DDR commands, right? They're common to UDIMMs and RDIMMs. For basic memory operation, you've got your normal reads and writes; these are, again, targeted at the DRAM. You still need to write the mode registers with the MRS commands, MR0 through MR6. Now we also have these MPR, or multipurpose register, reads and writes that can be sent to the DRAM or to the buffer. Memory folks kind of refer to this as a scratch pad inside the DRAM where you can read and write custom patterns for things like retraining and error recovery. The last two are special commands, specific to RDIMMs and LRDIMMs. You've got your register control words that configure the register, and your buffer control words that configure the data buffers. We'll look at these in a second.

Okay, if you're going to be involved with testing and qualifying LRDIMMs, it's really helpful to understand the structure and flow of these commands. This is some of what Doug talked about, but I think it bears repeating. Again, standard commands (your writes, your refreshes, your MRSs) flow through the register to the DRAM. But during initialization, when the host is programming the LRDIMMs, it will send mode register seven. This MR7 is not considered a normal mode register; with an LRDIMM it becomes a register or data buffer command. It's always sent to chip select zero. If address bit twelve is zero, it is routed to the RCD. If address bit twelve is one, it's routed to the data buffer over the BCOM bus. I'll talk about that in a second. The key point here is that the analyzer monitors the command and control lines. So with the analyzer we can see your standard commands, and we can see all your buffer commands.

Let's stay with the register commands for a second. Again, they flow from the controller to the RCD. These are usually the first commands on the bus at power-on, and they're used to configure the register. We can set things like the input bus termination and parity checking. They also flow through to the rank zero DRAMs behind the register, but because the command looks like an MR7, the DRAM ignores it, okay? MR7 is supposed to be ignored, but only by the DRAM. The remaining address bits, zero through twelve, identify the control word and carry the payload of the RCW.
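Mike's walk-through of the LRDIMM power-on sequence amounts to an ordering rule, which a test script could check against a decoded capture. This is a simplified sketch, not analyzer functionality. The phase names are hypothetical coarse labels for the steps listed above, not individual commands. Real firmware may interleave some steps, for example MRS and ZQ per rank, or MRS writes during training, and this strict check does not model that.

```python
from typing import Optional, Sequence

# Hypothetical phase labels, in the order the talk walks through them.
LRDIMM_INIT_ORDER = (
    "reset_200us",                 # power on, hold reset for 200 microseconds
    "read_spd",                    # host reads SPD operating parameters
    "ca_training",                 # host-to-register command/address training
    "rcw_setup",                   # register control words
    "dram_mrs",                    # per-rank MRS (CAS latency, burst length, ...)
    "zq_calibration",
    "db_host_side_training",       # MPR override read patterns from the buffer
    "bcw_setup",                   # buffer control words
    "dram_side_dq_dqs_training",   # write leveling, read training
    "normal_operation",
)

def first_out_of_order(observed: Sequence[str],
                       expected: Sequence[str] = LRDIMM_INIT_ORDER) -> Optional[str]:
    """Return the first observed phase that goes backwards in the expected
    order, or None if the order is respected. Repeats (e.g. per-rank MRS)
    are allowed."""
    position = {name: i for i, name in enumerate(expected)}
    highest = -1
    for phase in observed:
        if phase not in position:
            raise ValueError(f"unknown phase: {phase}")
        if position[phase] < highest:
            return phase
        highest = position[phase]
    return None

capture = ["reset_200us", "read_spd", "rcw_setup", "ca_training"]
print(first_out_of_order(capture))
```

In this usage example, register setup shows up before command/address training, so the check flags `ca_training`.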

What does it look like on the bus? This is an RCW; it requires three clock cycles to transmit a typical RCW command. Your chip select zero is low, okay? Your bank group one is low. So this is going to look just like an MRS. Your command pins look like an MRS, with RAS, CAS, and write enable all low and activate high. Address pins zero through twelve contain the control word address and the payload of the register command. So, just like a normal MRS, there's payload traveling on the address bus, okay?

Alright, so there are 27 register control words, really more than I can list. This table just gives you a few examples, and I'm only showing the register address bits. These bits here are actually the part that contains the payload, right? The rest of the address just identifies which command. Of the 27 commands, about half use a four-bit payload and the other half use an eight-bit word size. These settings are what gets written into the register. One of the main roles of register control words is to tune the output side of the register chip, since those outputs route your command and address signals directly to the DRAM. There are hooks to set the speed, to enable parity checking, and to turn off chip selects that aren't being used. There are lots of controls, okay? For example, RC05 gives us a four-bit configuration for setting the clock driver strength. This allows the host to boost the clock signal to a higher level, maybe to compensate for a taller stack, a different design, or a different raw card. So register commands will play an important part in getting the system configured and really running at the higher DDR4 speeds.

Alright, another quick example is RC09. There are several power-saving controls within this register control word. It's a four-bit RCW, these are the settings, and they're configured by the host. For example, the input bus termination can be enabled or disabled. This is similar to ODT, only for the register: the IBT resistors are actually integrated on the RCD, and they help reduce stub reflections. There's also something called CKE power-down mode, which lets the register put the DRAM channel in power-down when all the chip selects are idle.

Okay, so there are a lot of controls for the RCW, and similar controls for the buffer chips. Getting visibility into these register commands is one of the first steps in bringing up an RDIMM system. Say you're trying to debug a problem getting a DIMM into a lower power state. Using the bus analyzer, it's possible to trigger on any standard command, okay? In this window, we're actually seeing RC09 as it's defined in the spec. To verify that you are enabling the right bits, there is a built-in option to trigger on the RC, including the entire 12-bit address. For RC09, all we need to do is set address bits four and seven to one. Those bits go high; it's a simple bit mask, and now you're set to trigger on the RC09 command.
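The RC09 trigger can be sketched as a mask-and-compare on the MR7 address value. The layout here is an assumption that matches the bit positions Mike quotes: A12 = 0 to select the RCD, A11..A8 = 0 for a four-bit word, the control word number in A7..A4, and the four-bit setting in A3..A0. The helper names are hypothetical; check the JEDEC DDR4 RCD specification before relying on exact field positions.

```python
def encode_rcw4(word: int, setting: int) -> int:
    """Build the address value for a four-bit RCW (hypothetical helper).

    Assumed layout: A12 = 0 (route to RCD), A11..A8 = 0,
    A7..A4 = control word number, A3..A0 = four-bit setting.
    """
    if not 0 <= word <= 0xF or not 0 <= setting <= 0xF:
        raise ValueError("word and setting must each fit in four bits")
    return (word << 4) | setting

# Compare A12..A4 exactly and ignore the payload bits A3..A0.
RCW4_MASK = 0x1FF0

def matches_rcw(address: int, word: int) -> bool:
    """True if an MR7 address value selects the given four-bit control word."""
    return (address & RCW4_MASK) == (word << 4)

rc09_pattern = encode_rcw4(0x9, 0) & RCW4_MASK
bits_set = [b for b in range(13) if (rc09_pattern >> b) & 1]
print(f"RC09 pattern {rc09_pattern:#06x}, address bits set: {bits_set}")
```

The pattern sets exactly address bits 4 and 7. The mask also requires the other control-word bits to be zero. Checking only bits 4 and 7 would also match RC0B, RC0D and RC0F, whose control word numbers have those bits set too.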

Okay, so the analyzer triggers on the first occurrence. The trigger marker is right here; it's this red line. Again, it's usually a clock or two past the actual event. Hovering the mouse over the command itself gives you the decoded bits. This is exactly what's getting set: power-down with IBT on. That's getting enabled. The idea behind this specific setting is that this type of termination consumes a little more power but allows faster power-down exits. So it's something that a lot of vendors are going to use. But basically, there's any number of RC commands that you might be interested in capturing, and the analyzer gives you an easy way to do that.

Alright, so the data buffers have their own set of commands, just like the register commands, only they're used to customize the operation of the data buffer. They flow from the memory controller, through the register, to the data buffer itself. Like the register commands, these look like MR7s, where the address lines are really the payload. And as Doug mentioned, the physical link between the RCD and the data buffer is called the BCOM bus. It's a nine-pin control bus that connects the register to each data buffer, and this is how the buffer commands get to the data buffer. It's only nine pins, though, so the register can't really pass it through as an MR7. The register actually has to convert, or mux, the MR7 into this BCOM format.

So for a buffer command, the RCD maps the 12 address bits into five separate data transfers that are basically four bits wide. They're sent back to back over the BCOM bus, and each buffer chip gets a BCOM command like the one shown here. It takes five clock cycles plus a parity cycle to complete this command. Once it gets there, the data buffer decodes the command and writes it to its function space to complete the command. So it's a little unusual. Like most things from JEDEC, it's a bit convoluted, but it works. The main function of this BCOM interface is to transmit your buffer control words. But the spec also requires that the register send all read/write commands and MRS information over the BCOM bus. That's sort of to allow the data buffers to keep tabs on what the DRAM is doing.
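The BCOM idea (a wide command squeezed onto a narrow bus as five four-bit beats plus a parity beat) can be illustrated with a generic serializer. This is only a sketch of the serialization technique. The talk does not say how the real frame packs the command code and the 12 address bits into those beats, so the input here is an opaque 20-bit value. The per-lane XOR parity beat is a hypothetical scheme, not the JEDEC BCOM parity definition.

```python
from functools import reduce
from operator import xor
from typing import List, Tuple

BEAT_BITS = 4    # BCOM transfers are roughly four bits wide (per the talk)
DATA_BEATS = 5   # five data transfers, followed by one parity transfer

def serialize(frame: int) -> List[int]:
    """Split an opaque 20-bit frame into five 4-bit beats (MSB first) plus a parity beat."""
    total = BEAT_BITS * DATA_BEATS
    if not 0 <= frame < (1 << total):
        raise ValueError(f"frame must fit in {total} bits")
    mask = (1 << BEAT_BITS) - 1
    beats = [(frame >> (BEAT_BITS * (DATA_BEATS - 1 - i))) & mask
             for i in range(DATA_BEATS)]
    # Hypothetical parity: one even-parity bit per lane, i.e. XOR of all data beats.
    return beats + [reduce(xor, beats, 0)]

def deserialize(beats: List[int]) -> Tuple[int, bool]:
    """Reassemble the frame on the receiving side and check the parity beat."""
    if len(beats) != DATA_BEATS + 1:
        raise ValueError("expected five data beats plus one parity beat")
    *data, parity = beats
    frame = 0
    for beat in data:
        frame = (frame << BEAT_BITS) | beat
    return frame, reduce(xor, data, 0) == parity

beats = serialize(0x7A3C5)
print([f"{b:04b}" for b in beats])
print(deserialize(beats))
```

A receiver that sees a corrupted beat recomputes the XOR, gets a mismatch, and can reject the command. That is the role the talk attributes to the parity cycle.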

So again, just to review that one more time, because it is confusing, here's the buffer command data flow. The host wants to change a parameter, typically at boot-up, so it sends the MR7 to the register. The register sees that it is an MR7 with address bit twelve set to one. That makes it a buffer command that isn't meant for the register itself. So the register converts it into a BCW and sends it out over the BCOM bus to the data buffer, and the DB writes it to its function space to complete the command. Okay, so normal commands (your reads, your writes, your MRSs) are going to flow directly between the host and the DRAM. That's one point that shouldn't be lost on anyone. This is, again, how these LRDIMMs are going to decrease your loading and give you the higher capacity that they're promising. So again, they're pretty sophisticated, no doubt about it; these are highly configurable. They allow you to customize every aspect of the termination and the signal strength on the DQ lines in both directions. There are commands to optimize power consumption on the data buffer, and there are read/write delays that can be added per data line. So, no doubt, a lot of controls. The RCD carries some of this load, but it's mostly on the host controller to get this working. So having an analyzer to get visibility into the host commands is pretty key. You need it both to get the system configured and to reach the higher speeds that DDR4 is promising.

Okay, so that is the end of the material that I formally wanted to present. By all means, if there are any questions, please type them into the chat box. We'll certainly do our best to answer any questions; even if they come in later, we'll respond to you by email. Okay, well, here's one question. Someone is asking if it's possible to configure these data buffer chips in parallel, or do they have to be configured individually? That is a good question. I believe the answer is that, because of the sensitivity to each system, you're going to have to configure each buffer chip individually, serially, every time you boot up. I don't think there is any way that you're going to be able to come up with a series of settings that will work in all environments. That includes even a given environment from one boot to the next. So this configuration process is one you won't be able to get around, as far as the current spec goes. Okay, well, it looks like we haven't got any other questions at this instant, so I will wrap up the webinar now. I certainly appreciate you joining us today, and apologies for the audio glitch at the beginning. We look forward to you joining us next time for our next topic, which will be coming out next quarter. We'll certainly be notifying you about that. Thanks again.
