NERC Lessons Learned - Episode 9

April 19, 2021, midnight by Chris Sakr
Last modified April 23, 2021, 1:57 p.m.


Below is the full transcript for this episode. If you'd like to review or follow along with the original .pdf version of this NERC Lesson Learned, visit:

Chris Sakr: One of my all-time favorite comedic moments happens in Indiana Jones and the Last Crusade. I'll set the scene. Indy's been captured by the Nazis, who are trying to find the Holy Grail and wreak worldwide havoc, as they do. They've taken Indy's dad's diary, which should contain a map to the Grail.

Nazi: "Where are these missing pages? This map, we must have these pages back." 

Elsa: "You're wasting your breath. He won't tell us. And he doesn't have to."

Chris Sakr: Indy's given the pages to his old friend, Marcus Brody, who has one job: get the map to Turkey for a head start on the Grail. And when the Nazis say Marcus will stick out like a sore thumb, Indy puts them straight with the quickness.

Indiana Jones: "Brody's got friends in every town and village from here to the Sudan. He speaks a dozen languages, knows every local custom. He'll blend in, disappear. You'll never see him again. With any luck, he's got the Grail already."

Chris Sakr: Indy looks like a guy with a trick up his sleeve. After all, Marcus is educated, smart and driven. He's an expert. Then suddenly we're on the streets of Iskenderun with Marcus Brody.

Marcus Brody: "Does anybody here speak English? Or even Ancient Greek? I want to know, thank you sir… …Does anyone understand a word I'm saying?"

Chris Sakr: So Indy did have a trick up his sleeve, just not the one we thought. Expert bookworm though he may be, Marcus is out of his depth. Indy later tells his dad Marcus got lost in his own museum. Indy, on the other hand, is a bookworm and experienced in the field. He's gotten down and dirty, faced off against Nazis, and can spot that kind of trouble around the corner. He sees the big picture: they're on either side of a bad situation. So thinking fast, adding up his available information and expertise, Indy makes a calculated split-second decision to protect the Grail and his friend.

Protection is one of the most important aspects of system reliability. This doesn't mean shielding equipment from every fallen tree or lightning strike. That would be ideal. It's also not possible. And instead of shutting down the entire system because of a single fault, we mitigate with smart, fast-thinking relays.

Jessica Zamonis: You need that relay to know, is this my problem that I need to tell my breaker to open for? Or is this maybe another relay's problem that it should tell its breaker to open for? And we take it even one step further when we say, well, if that's somebody else's problem, we may want to hang out a little bit and make sure that they take care of their problem before we kind of ignore the issue, because if they can't take care of their problem, then the first relay may be required to tell its breaker to open in case that other breaker doesn't open for whatever reason. It happens. So you end up kind of with these layers of protection, almost concentric circles, if you will, going out, saying we want to trip just what we need to, to protect the system. But that may be more than we were expecting, if another problem happens that prevents it from being isolated in the first place.

Chris Sakr: On this episode, Power Pool's own Jessica Zamonis and I are looking at Mixing Relay Technologies in Directional Comparison Blocking Schemes, published July 10th, 2020. Relays have a lot to do in very little time. Some are older, smart, but can't see the big picture. Others are newer, smarter, and can see the big picture at the cost of a little time, in scenarios where 30 milliseconds is the difference between catastrophe and stability. But success is only as good as how quickly they can work together. So get ready for either side of an adventure.

This Lesson Learned doesn't focus on a single event, but on general findings associated with the specific issue. The problem statement is atypically long, and diving into the details will bring it all together, so here's a shortened version. Multiple composite protection system misoperations have occurred on the bulk electric system as a result of mixing protective relay technologies at the remote terminals of directional comparison blocking schemes. One of the most challenging mixes of technologies is utilizing a relay system based on newer microprocessors at one terminal and an older electromechanical relay system at the opposite terminal.

So relays detect faults, identify their locations and, if need be, notify circuit breakers to trip. And somewhere along the line, somebody realized that if these relays could communicate with each other, they could trade accurate information and address problems faster. This is referred to as a comm-aided protection scheme, and a fairly common one is a directional comparison blocking (DCB) scheme.

Jessica Zamonis: With electromechanical relays, they say, I see a fault. They tell the relay on the other side, don't do anything until I figure out if it's in front of me, meaning between us, or if it's behind me, meaning it's somebody else's problem. In theory, both ends say that, and they send a block signal to each other to say, just wait. Now if that relay says, okay, now I know where this fault is and it's in front of me, it will release that blocking signal. And as long as the relay on the other side also sees it in the forward direction, it will release its blocking signal, and both relays will trip instantaneously once that happens, so that both of them know it's in front, both of them kind of were told to quit waiting to trip by the other one. And so we can trip really quickly knowing that's where the fault is, and it is the requirement for those two relays to protect that part of the system.
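
The handshake described above can be sketched in a few lines of Python. This is only an illustration of the logic as told in this episode, not relay firmware or a vendor setting. The `Terminal` class and the function names `sends_block` and `may_trip` are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Terminal:
    """One end of the line (hypothetical model for illustration)."""
    sees_fault: bool        # relay has picked up that something is wrong
    direction_known: bool   # relay has worked out where the fault is
    fault_is_forward: bool  # True means "in front of me", toward the other end


def sends_block(t: Terminal) -> bool:
    """A relay that sees a fault keeps blocking until it knows the fault is forward."""
    if not t.sees_fault:
        return False
    return not (t.direction_known and t.fault_is_forward)


def may_trip(local: Terminal, remote: Terminal) -> bool:
    """Local end trips only if it sees the fault forward and the remote end is not blocking."""
    local_forward = (
        local.sees_fault and local.direction_known and local.fault_is_forward
    )
    return local_forward and not sends_block(remote)


if __name__ == "__main__":
    # Fault between the two relays: both see it forward, both release, both trip.
    a = Terminal(sees_fault=True, direction_known=True, fault_is_forward=True)
    b = Terminal(sees_fault=True, direction_known=True, fault_is_forward=True)
    print(may_trip(a, b), may_trip(b, a))

    # Fault behind relay b: b keeps blocking, so neither end trips this line.
    b_behind = Terminal(sees_fault=True, direction_known=True, fault_is_forward=False)
    print(may_trip(a, b_behind), may_trip(b_behind, a))
```

In this sketch, a relay that has not yet worked out the direction counts as blocking, which matches the "just wait" behavior described for the electromechanical relays.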

Chris Sakr: Picture a transmission line with a relay on either side. When a fault occurs, both sides tell each other, hang on, while they figure out where it's at. If they determine it's between them, they release a blocking signal, allowing the breaker to trip. If it's not, they don't. But problems come into play when the older electromechanical relays are trying to collaborate with newer microprocessor-based relays.

Jessica Zamonis: Electromechanical relays actually start detecting that there is a fault faster. The microprocessor-based relay, being a computer, is more complex. There are more things it has to do, and when I say it takes longer, I mean, we're looking at an electromechanical relay that can start detecting a fault within four milliseconds of the fault occurring. Now it doesn't know where it is, it doesn't have a lot of information. It just sees that there's a problem. The microprocessor-based relay is going to detect that fault in about 17 milliseconds, four times as long, still really, really fast. With the electromechanical relay, it sits there and it says, okay, I see that there's a fault, and it tells the other end to wait. About 17 milliseconds goes by, conveniently about the same amount of time, and it then can figure out, okay, I know the direction. I can either keep telling you to block because the direction is wrong, or I can lift that block and allow you to trip. Now with the microprocessor schemes, somebody, probably an engineer, thought, hey, I've got this really cool idea. When we do these microprocessor-based relays, not only can we detect the fault, but we can probably make it figure out the direction all at the same time. It's a little more complex, that's going to take us a little bit of time, but we can do all this other stuff at the time. As soon as the microprocessor-based relay detects the fault, it knows, is it in front of me or behind me.

Chris Sakr: In Last Crusade terms, Marcus is an electromechanical relay and Indy is microprocessor-based. Marcus is intellectually capable, but lacks the physicality, big-picture experience, and ability to process all the variables that Indy has. And even with all Indy's expertise, this was still the 1930s. He couldn't, like, punch out a Nazi soldier, steal a smartphone, and send a WhatsApp message to Marcus thousands of miles away. Any advantageous info couldn't get where it needed to in time, and Marcus still got captured. So we've got a timing issue: the right data, just too late. Let's drive the timing problem between relays home a little more. The Lesson Learned document includes some figures, and Jess is going to walk us through figure three. You can either check it out later or follow along as you listen. Figure three illustrates the worst case scenario.

Jessica Zamonis: The electromechanical relay is actually going to detect the fault first and send that information over to the other relay, the microprocessor-based relay in this case, to say, don't trip yet, I need to figure out, is the fault between us or is it behind me? In the meantime, the microprocessor-based relay hasn't even figured out that there's a fault. 17 milliseconds goes by and now the microprocessor relay has figured out, oh, I know what that guy is talking about now, there's a fault. The problem really comes in if the fault is behind the microprocessor-based relay, but still in front of that electromechanical relay. So it's maybe just behind it. You picture kind of two people standing there looking at each other. Do I see something between us or is it behind the guy I'm looking at? If it's behind the guy you're looking at, well, then I don't want to tell my breaker to trip because that's going to be somebody else's problem. The electromechanical relay sees the fault and starts sending the blocking signal to the microprocessor-based relay. It's still spinning, it's still trying to figure out what's going on, but it gets to the 17 millisecond time. It hasn't gotten a blocking signal yet from the microprocessor-based relay. So it says, well, I guess if I'm not being told to block, I'm going to trip. And that's when the microprocessor-based relay detects the fault and determines the location of the fault. So it now knows, okay, I see a fault, but it's behind me. I should tell the other side of the line to block. Oops, too late, the electromechanical relay's already tripped because it didn't wait long enough to get that blocking signal from the other side.
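
The race in that worst case can be put into rough numbers. The sketch below is a simplified timeline, not a model of any real relay. The 17 millisecond figures come from the discussion above. The channel delay is an assumed value added for illustration, since a block signal needs some time to reach the far end, and the function names are hypothetical.

```python
def block_arrival_ms(micro_decision_ms: float, channel_delay_ms: float) -> float:
    """Time at which the microprocessor relay's block reaches the far end.

    The microprocessor relay only sends its block once it has both detected
    the fault and determined the direction.
    """
    return micro_decision_ms + channel_delay_ms


def em_relay_overtrips(
    em_decision_ms: float,
    micro_decision_ms: float,
    channel_delay_ms: float,
) -> bool:
    """Worst case: fault is behind the microprocessor relay but in front of
    the electromechanical (EM) relay.

    The EM relay sees the fault as forward. If no block has arrived by the
    time it makes its decision, it trips a line that should have stayed in.
    """
    return em_decision_ms < block_arrival_ms(micro_decision_ms, channel_delay_ms)


if __name__ == "__main__":
    # EM relay decides at about 17 ms; microprocessor relay decides at about 17 ms.
    # channel_delay_ms=2.0 is an assumption for illustration only.
    print(em_relay_overtrips(em_decision_ms=17.0,
                             micro_decision_ms=17.0,
                             channel_delay_ms=2.0))
```

With these illustrative values the block would arrive after the electromechanical relay has already made up its mind, which is the misoperation being described.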

Chris Sakr: Because of the timing error, wires get crossed, and there's a misoperation that unnecessarily drops the line. Not what we want, and seemingly not within an operator's control.

Jessica Zamonis: First thing to kind of help mitigate this problem is you have to be aware of it. Maybe you can find a way to slow down your electromechanical relay, but then you're running the risk of slower action for faults on that line that it should be protecting. What I would say is there are other types of comm-aided protection schemes that you probably could investigate using, based on kind of what your company does, what your standards are. Or, worst case scenario, you look at, is this over-tripping negatively impacting the system? And if it is, well, maybe that's enough of a reason that you can go back to your accountants and say, hey look, it's broke. We need to fix it. Because really, trying to get both sides on the same technology is going to be better and provide more information for the operators, more information for system protection, and just better reliability overall.

Chris Sakr: If your protection people can make a good reliability case to accounting, they may be able to head misoperations like this off. But as we all know, accountants aren't always eager to shell out extra dough. Another mitigation technique: get the relays talking at the same speed, which requires some compromises.

Jessica Zamonis: You've got to either add a time delay on the electromechanical relay for it to figure out what's going on, or find a way for your microprocessor-based relay, which some of them can do, to actually start sending that blocking signal before it has everything figured out. You've got to get one of the relays to think like the other, almost. And it's a challenge one way or the other. And you know, you don't necessarily want to slow down a microprocessor-based relay just to accommodate this, because you would really need to work with your transmission planners to figure out if slowing that tripping down becomes a bigger issue on the system. Because it might be that some places in certain systems need high-speed tripping. And if that's one of those, a couple of milliseconds may make the difference between a fault clearing just fine and a fault propagating through the system and maybe causing some nearby generators to trip. And that's going to just be a worse day for everybody involved.
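
The two options just described can be compared with the same kind of rough timeline arithmetic. This is a back-of-the-envelope sketch only. Actual coordination delays come from protection engineers and transmission planners, and every number below other than the 17 milliseconds mentioned in this episode is an assumption chosen for illustration. The function name is hypothetical.

```python
def extra_em_delay_ms(
    em_decision_ms: float,
    micro_block_sent_ms: float,
    channel_delay_ms: float,
    margin_ms: float,
) -> float:
    """Extra wait the electromechanical (EM) relay would need so that a block
    from the microprocessor relay can arrive before the EM relay decides.

    Returns 0.0 when the block already arrives in time.
    """
    block_needed_by_ms = micro_block_sent_ms + channel_delay_ms + margin_ms
    return max(0.0, block_needed_by_ms - em_decision_ms)


if __name__ == "__main__":
    # Option 1: leave the microprocessor relay alone and delay the EM relay.
    # Channel delay (2 ms) and margin (1 ms) are assumed values.
    print(extra_em_delay_ms(em_decision_ms=17.0, micro_block_sent_ms=17.0,
                            channel_delay_ms=2.0, margin_ms=1.0))

    # Option 2: the microprocessor relay starts blocking before it has
    # everything figured out (6 ms here is an assumed value).
    print(extra_em_delay_ms(em_decision_ms=17.0, micro_block_sent_ms=6.0,
                            channel_delay_ms=2.0, margin_ms=1.0))
```

The trade-off is the one raised above: any extra delay in the first option also slows tripping for faults that really are on the line, which is why it has to be checked against where high-speed tripping is needed.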

Chris Sakr: It's not the best solution, but in some cases it's a perfectly workable one until updated microprocessor relays are in the budget. From an operator's perspective though, the most important takeaway is to know which relays you have and where. Not necessarily serial numbers, but definitely types.

Jessica Zamonis: And a lot of times, the way you know that is based on the other information that you have for that circuit breaker. If you have fault distance and location on a particular breaker, you probably have a microprocessor-based relay. Just the more information you have, maybe you have three-phase amps, you're probably not going to have all that information where you have an electromechanical relay. General rule of thumb, not saying it's guaranteed to be correct a hundred percent of the time, but if you have one end of a line that has a whole bunch of information and one that doesn't have a whole bunch of information, you might have mixed relay types on that line. And then just being aware of where the faults are on your system. If you see that line and a line next to it trip, and they go out and they patrol and they find that that other line had the issue, the one that would be sharing that same terminal with the microprocessor-based relay, you can probably help not only interpret what's going on, by saying, okay, well, that might be what they were talking about in this Lesson Learned. This probably was a misoperation. Do whatever paperwork you need to. There are NERC standards out there that require system operators to identify and report potential misoperations so that they can be investigated by the protection engineers. You would see this, and honestly you'd probably think, well, that's weird that they both tripped anyway. Hopefully you would report it as a potential misoperation. But maybe through that process, you can also say, is this an issue with the fact that maybe we have electromechanical relays on one side and microprocessor-based relays on the other, especially if you know that you're using a DCB scheme just in your system in general, or more specifically on that line.
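
That rule of thumb could be written down as a simple screening check. The sketch below is only an illustration of the heuristic as stated, and as noted it will not be right a hundred percent of the time. The telemetry point names and function names are hypothetical and would need to be mapped to whatever a given control system actually calls them.

```python
# Telemetry that, per the rule of thumb above, usually suggests a
# microprocessor-based relay. These point names are made up for the example.
RICH_POINTS = {"fault_distance", "fault_location", "three_phase_amps"}


def probably_microprocessor(points) -> bool:
    """Guess the relay type at one line terminal from its available telemetry."""
    return bool(RICH_POINTS & set(points))


def possibly_mixed(end_a_points, end_b_points) -> bool:
    """Flag a line where one end looks data-rich and the other does not."""
    return probably_microprocessor(end_a_points) != probably_microprocessor(end_b_points)


if __name__ == "__main__":
    end_a = {"breaker_status", "fault_distance", "three_phase_amps"}
    end_b = {"breaker_status"}
    if possibly_mixed(end_a, end_b):
        print("Possible mixed relay types on this line; worth asking protection.")
```

A flag from a check like this is a prompt to ask the protection engineers, not a conclusion about what is installed.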

Chris Sakr: By the end of Last Crusade, spoiler alert, the team pulls through. Everybody who needs to comes out alive. The Grail's in the right hands. And Marcus feels like a newly minted, rip-roaring adventurer.

Marcus Brody: "Henry, follow me! I know the way! Yah!"

Henry Jones: "Got lost in his own museum, huh?"

Indiana Jones: "Uh-huh."

Chris Sakr: It's safe to assume Indy never took him on another high-stakes quest. But on this one, everyone got to a point where they were able to work around one another's blind spots and leverage each other's expertise. Remembering the possibility of these misoperations within DCB schemes, knowing your system's equipment, and pointing out the correct problem when it occurs could help push your system in a better functioning direction and prevent a cup that provides eternal life from making its way to Hitler, or at least prevent some unnecessary faults. Look, your job has plenty more adventures to tackle, leveraging both new cutting-edge technology and very smart, very old, very well-intended equipment. One thing that's great about Indiana Jones as a hero is that he's mostly just a very motivated nerd with a high pain tolerance. He reminds us that being a hero is never about having everything you need when you need it. It's about understanding what you're up against, knowing what you're working with, and having a few tricks up your sleeve.