Node Operator On-boarding and Node Operator Health in General

This may come off as a gripe post. That is not its intention, but we (VeriHash) want to call out a few things for discussion that we have been seeing as a genesis operator for StakeWise.

  1. Per this document (Onboarding Process - StakeWise), a node operator is expected to join testnet, then be voted an allocation for Gnosis, and then be voted an allocation for mainnet.

    Neither Finoa nor T-Systems was allocated Gnosis before mainnet, and yet the DAO approved their inclusion. Finoa only recently activated and began staking its 5K GNO allocation, and even that happened after mainnet.

    If the DAO wishes to change these expectations, then let's change the documentation and the expectations. But what incentive do those staking GNO now have to keep doing so when new operators can skirt the SIGNIFICANT expense of staking a loss leader (GNO)?

  2. When VeriHash was brought on as a node operator, we were asked if we wished to participate in the node operator selection committee. This committee would review node operator applications and interviews to ensure we are meeting and raising the bar, and would bring the highest grade of operators to the DAO to vote on.

    Both Finoa and T-Systems are of the highest grade, and we are super happy to see them included. But again, there is no committee or extra review of node operators that VeriHash is aware of. We have never been invited to a node operator selection committee, nor have we been included in interviews for new operators. These have seemingly been done unilaterally so far, and we believe this is unhealthy.

  3. At present, when a new operator comes online it is allocated ALL GNO and ETH stakes until it "catches up" to the others.

    For example: Finoa at present has 69 ETH stakes. T-Systems will start with 0 and get ALL stakes coming into the pool until it reaches 69. At that point, Finoa and T-Systems will split ALL stakes coming into the pool until they reach VeriHash and CryptoManufaktur at 191. Given current deposit rates, VeriHash and CryptoManufaktur won't see new ETH for a VERY LONG TIME.

    Now add a new operator coming online after T-Systems, maybe two, and it doesn't take much for the genesis operators to assume they will never see another staked ETH from StakeWise. This seems damaging, and it differs from, say, Lido, which simply round-robins through all the operators with each stake, showing no favoritism. The trick is that Lido has the deposits to bring on new operators…

  4. I know this game is tough and I know it is hard to grow TVL, but in our view StakeWise has been super bad at this. Thankfully SWAT is JUST now actually a thing, but it has been promised for quarters and is only now starting to maybe form.

    Much more focus needs to be put on growth and increasing stakes. It was a tragic error to see StakeWise effectively doing nothing pre-merge, and it seems we are still waiting for news from the team as to what the plan is.

    Given how execution has gone on some initiatives here, I don't hold high hopes of seeing a focus on growth become a thing. But I am happy to be wrong. Regardless, StakeWise doesn't have the growth to support more operators when over half the fleet is likely taking significant losses.

    The biggest threat to distributed operation of the pool is StakeWise operating validators itself. If distribution is important, then StakeWise itself should stop taking new allocations completely, as it has done up to this point.

    Right now StakeWise operates 1,268 validators vs. 451 across ALL other node operators. In round numbers, StakeWise still has well over 2/3 of the pool (1,268 of 1,719, about 74%). In our opinion, it should cease all future allocations.

    Until deposits pick up enough to support a healthy node operator fleet, we should STOP taking on new operators, as we don't have the deposits to support them. The alternative is that the DAO is OK with using only node operators capable of sustaining losses for extended periods just to stake for it, which means larger outfits that may increase centralization pressure.

  5. I am here to officially call out the fact that GNO was a mistake. It didn't meet its target to merge first (still no news so far on when), it eats money on the infra needed to support it, and once it does merge it won't matter, because the returns vs. costs don't add up.

    Right now a node operator makes $0.63 (USD) per key per YEAR at $127 per GNO. At 10K keys that is ~$6,280 per year. Referencing the team's own cost calculations for GCP and AWS, that is less than HALF of what it costs to run the cluster, excluding all other expenses and even paying yourself.

    If you take the hybrid approach (validators in the cloud, EL/CL on Hetzner), you might break even per year with GCP validators and TWO EL/CL stacks on Hetzner, but you will still be $4K a year in the hole if you have AWS validators.

    These numbers are based on StakeWise's own cost calculations (Onboarding Process - StakeWise). And let's keep in mind that NO operator should now be using Hetzner, since Hetzner has made it clear it doesn't want crypto on its infra. Running there is a risk vs. reward trade-off that an operator working for the DAO is making on the DAO's behalf, and operators should be clear about that.

    There is OVH as well, which is slightly more expensive than Hetzner. But keep in mind that we (operators) can't all run out of OVH if we, the DAO, care about infra/provider diversity. That diversity comes at a cost. We at VeriHash continue to bring costs down as tightly as we can, but we refuse to operate in the same providers and regions as other StakeWise operators because we care about infra diversity.

    If that kills us, I am fine with that, because I am not interested in cutting corners. But GNO likely won't EVER be worth operating due to how it scales (in our opinion it doesn't, at the infra level), and my only choice is to hope ETH deposits/price pick up to cover this goodwill donation. Otherwise it kills me, even if I run it in the dirtiest/most dangerous way I can to cut costs (which doesn't benefit the DAO; it hurts it).

    Doubling the staking fee won't help that much, really. First, because I think it is unfair to ask stakers to pay more for what is measurably crappy service right now; in the normal world that is a sure-fire sign of a failed business. Second, because while it does reduce the loss, it doesn't reach break-even, so the threat to my solvency as a node operator doesn't go away or get markedly better.

    If the DAO wants to make GNO viable with enough margin to support diversity at the operator/infra level, stakers are going to have to pay A LOT more, in my estimation. And I am not saying that because I am greedy; I am saying it because I care about meeting the diversity goals for GNO and ETH, and in doing so I will exit this year with a 5-digit LOSS on my business balance sheet, some of it due to the price crash but MOST of it due to GNO staking costs.

    Or simply put an end to the experiment known as GNO.

  6. In addition to everything above, the cost of moving rGNO and rETH to GNO/ETH is high and results in even deeper losses. With rGNO especially, it is so bad at the moment that we consider rGNO worthless and not movable, due to the losses we would take converting it to sGNO and finally into GNO.

    The losses on the rETH side aren't as bad, but they are still losses in a tough environment. It may be wise for the DAO to consider some sort of pool that lets its operators get out of rETH more easily than standard stakers, so we can convert to fiat and pay our bills without taking unneeded losses on double conversions.

  7. Given the above issues with GNO, I can understand how any new operator coming into the fold would want to avoid an immediately loss-leading venture just to get to staking ETH. But we need GNO stakers: the pool still isn't 100% staked and has been missing nearly 4% of APR for some time. We either all must shoulder the burden and figure out how to make the GNO raft float, or we need to let it go and figure out how to exit.

    As things stand today, it causes way more damage to everyone.

    The addition of a new node operator is always awesome and we welcome it. However, it is how the last two have been added that we are calling out here. We either need to stick to our rules or change them to suit us, and we need to give very, very serious consideration to the stability of the fleet.

    Node operators are not intended to get rich, but at present many of them are likely operating at a loss. I know we are, and while, yes, crypto winter is taking its toll, the lack of deposit growth and the millstone that is GNO are huge contributors.

    So much so that VeriHash will likely cease to exist, and thus be forced to exit validators, by the end of March 2023 if we don't find a way to offset GNO costs in the face of stalled ETH growth. It is just a fact of life, but VeriHash is not even a year old. We can't simply loss-lead our way to victory.

    Our pockets aren't deep enough to donate to GNO at the level it requires today. We aren't viewing this specifically as a DAO problem either. No one forced us to take on as much GNO as we did; we likely should have stopped at 5K, and even then this conversation would more than likely still be happening. We just wanted to see the GNO staked so we could all earn the 14% we all deserve, and that isn't happening and is now killing us, slowly and painfully.
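A minimal toy model of the allocation behaviour described in point 3 may help quantify the wait. It assumes each new 32 ETH deposit becomes exactly one validator, that every new validator goes to the operator with the fewest validators (ties going to the operator listed first), and it uses the counts quoted in point 3 (VeriHash and CryptoManufaktur at 191, Finoa at 69, T-Systems at 0). This is an illustrative sketch, not StakeWise's actual allocation code; the function names are hypothetical. A plain round-robin allocator is included for comparison.

```python
"""Toy model of two validator-allocation policies (hypothetical, not StakeWise code)."""

ETH_PER_VALIDATOR = 32


def allocate_catch_up(counts: dict[str, int], new_validators: int) -> dict[str, int]:
    """Give each new validator to the operator that currently has the fewest."""
    counts = dict(counts)  # copy so the caller's dict is not mutated
    for _ in range(new_validators):
        # min() returns the first operator (in insertion order) with the lowest count
        target = min(counts, key=counts.get)
        counts[target] += 1
    return counts


def allocate_round_robin(counts: dict[str, int], new_validators: int) -> dict[str, int]:
    """Cycle through operators in a fixed order, one validator each, regardless of size."""
    counts = dict(counts)
    names = list(counts)
    for i in range(new_validators):
        counts[names[i % len(names)]] += 1
    return counts


def wait_under_catch_up(counts: dict[str, int], watched: set[str]) -> int:
    """Number of new validators allocated before any operator in `watched` receives one."""
    if not watched.intersection(counts):
        raise ValueError("watched operators must be part of the fleet")
    counts = dict(counts)
    allocated = 0
    while True:
        target = min(counts, key=counts.get)
        if target in watched:
            return allocated
        counts[target] += 1
        allocated += 1


if __name__ == "__main__":
    fleet = {"VeriHash": 191, "CryptoManufaktur": 191, "Finoa": 69, "T-Systems": 0}
    wait = wait_under_catch_up(fleet, {"VeriHash", "CryptoManufaktur"})
    print(wait, "validators,", wait * ETH_PER_VALIDATOR, "ETH")  # 313 validators, 10016 ETH
    print(allocate_round_robin(fleet, wait))
```

Under these assumptions, the two genesis operators would wait for 313 new validators (10,016 ETH) before receiving another one, while round-robin would spread those same 313 validators across all four operators. Each additional operator that joins at 0 lengthens the wait further.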

For the DAO, we ask that you consider the points above and give some very, very focused thought to how you want to manage your node operators and your relationships with them, and to how important certain aspects are, like infra diversity, operator size vs. centralization pressure, etc.

At the very least, stick to a set of rules and play by them transparently and openly.
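The per-key arithmetic in point 5 can be reproduced in a few lines. This sketch takes the post's own figures ($0.63 per key per year at $127/GNO, 10K keys) and assumes the per-key GNO income stays constant, so USD revenue scales linearly with the GNO price. The only cost input is the post's statement that revenue is less than half of the cluster cost; treating twice the revenue as a lower bound on cost is an assumption, and the function names are illustrative.

```python
USD_PER_KEY_PER_YEAR = 0.63   # figure quoted in point 5
GNO_PRICE_USD = 127.0         # GNO price at which that figure was quoted
KEYS = 10_000

# Per-key operator income expressed in GNO, assumed not to change with price.
GNO_PER_KEY_PER_YEAR = USD_PER_KEY_PER_YEAR / GNO_PRICE_USD


def annual_revenue_usd(keys: int, gno_price: float) -> float:
    """Operator revenue per year for `keys` validators at a given GNO price."""
    return keys * GNO_PER_KEY_PER_YEAR * gno_price


def break_even_price(keys: int, annual_cost_usd: float) -> float:
    """GNO price at which annual revenue equals annual cost."""
    return annual_cost_usd / (keys * GNO_PER_KEY_PER_YEAR)


revenue = annual_revenue_usd(KEYS, GNO_PRICE_USD)
# Point 5 says revenue is "less than HALF" of the cost, so cost > 2 * revenue.
cost_lower_bound = 2 * revenue
print(f"revenue ~ ${revenue:,.0f}/year")                                      # revenue ~ $6,300/year
print(f"break-even GNO price > ${break_even_price(KEYS, cost_lower_bound):,.0f}")  # break-even GNO price > $254
```

The ~$6,280 in the post differs from $6,300 only through rounding of the $0.63 figure. On these numbers, covering even the lower-bound infra cost would need a GNO price of roughly double the $127 quoted, before paying anyone to run the infra.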


Wow. There is a lot to digest in this post. I wasn’t aware of any of these frictions, and I am concerned about seeing them.

Two things jump to mind immediately:

1.) The distribution of keys and deposits across node operators appears to be unfair. If we keep onboarding more and more operators, the current mechanism apparently fails to provide sufficient inflows to older operators. By the time the newest operator is “filled up” and on par with the rest, we’ll have the next new operator, and so on. If this is how it works, we need to change it. I would support a round-robin approach.

2.) I may be outing myself as an ignorant know-nothing, but I never really understood GNO and why we need to offer GNO staking. If it turns out that operating GNO infrastructure can’t be achieved with a profit, or at least break even, that would be economic insanity and should be stopped.

As a third point I think we need to make sure that our standards and procedures for onboarding new operators are followed.

I hope more DAO members will weigh in on this discussion, because at least one of our node operators is struggling - hard - with dire consequences on the horizon (having to shut down in March 2023). The well-being of our node operators is in all our best interest, and we can't allow them to operate at a loss.

I would also love to hear the team's input on this and how they see things, which would help us understand the situation better and, more importantly, from different angles.

Let’s fix this.


I am going to link this comment here as well.

I disagree with the categorization, at least in terms of our experience/expectations coming in as a genesis operator. We were told to take Gnosis first before ETH, and were judged on that performance to be granted ETH (along with our testnet activity, which was extended as well).

If, as stated above, GNO is a true bonus/not required, then given its economic realities I find myself wondering, as an operator, whether I want this "bonus".

Secondly, I am going to call out a few other numbers to consider here.

Right now StakeWise has 41,741 GNO staked in the pool. Not all of that GNO is actively staked on validators; the actively staked amount is 34,950 (from the app webpage).

There are 106,503 active validators on Gnosis, meaning that right now StakeWise represents:

Active validators:
34,950 / 106,503 ≈ 33% of the staked GNO

Total validators (if all the GNO were actively working):
41,741 / 106,503 ≈ 39% of the staked GNO

Each of the node operators with 9K+ validators (StakeWise, CryptoManufaktur, and VeriHash) represents 8 to 9% of the total staked GNO. Finoa is at present 4 to 5%. If T-Systems eventually gets added, we will likely hit 39% of the total staked GNO.
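The shares above can be checked directly. This sketch uses the thread's figures (34,950 active and 41,741 total GNO in the pool, 106,503 active validators on Gnosis, 1 GNO per validator) and adds one assumption of its own: if the idle GNO were activated, the network total would grow by the same amount, which gives a somewhat lower share than dividing by today's total. It also compares each share with 1/3, since beacon-chain finality (which Gnosis Beacon Chain inherits) needs a 2/3 supermajority of stake, so more than 1/3 going offline is enough to stall finality.

```python
NETWORK_ACTIVE = 106_503   # active validators on Gnosis (1 GNO each)
POOL_ACTIVE = 34_950       # pool GNO already on validators
POOL_TOTAL = 41_741        # all GNO deposited in the pool
FINALITY_BLOCKING = 1 / 3  # more than this fraction offline stalls finality


def share(part: int, whole: int) -> float:
    return part / whole


active_share = share(POOL_ACTIVE, NETWORK_ACTIVE)
naive_full_share = share(POOL_TOTAL, NETWORK_ACTIVE)
# Assumption: activating the idle GNO also adds it to the network-wide total.
full_share = share(POOL_TOTAL, NETWORK_ACTIVE + (POOL_TOTAL - POOL_ACTIVE))

for label, s in [("active", active_share),
                 ("all GNO, fixed total", naive_full_share),
                 ("all GNO, total grows", full_share)]:
    print(f"{label:22s} {s:.1%}  above 1/3: {s > FINALITY_BLOCKING}")

print(f"one 9,000-validator operator: {share(9_000, NETWORK_ACTIVE):.1%}")
```

On these figures the pool's active share (about 32.8%) sits just under 1/3, while activating all of its GNO would push it above 1/3 under either denominator (about 36.8% or 39.2%).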

My question is: does the pledge we made about not growing beyond a certain percentage of staked ETH apply to chains like Gnosis? If so, we need to figure that out, as we are over that now.

Also, we should re-engage the GNO team and see if there is some support we can get from them. I know the StakeWise team reached out to them about grants for supporting commercial operators. With over 30% of the GNO network capacity in total and 33% of the active set, even just one operator ceasing operations will have an impact, especially if it's one of the 8 to 9% ones.

Regardless, if GNO was truly a bonus at the start, I am not sure we can treat it as a bonus now, when over 30% of the network capacity sits in our pool, some of it still inactive.

And looking at these charts (Charts - Open Source Gnosis (GNO) Mainnet Explorer - beaconcha.in - 2024), one can see a clear trend.

As the network and its validator count have grown, network performance has slipped significantly, with effectiveness becoming really bad. This likely also explains why, on average, the GNO network itself loses about 1 GNO per hour in penalties across all validators.

This also explains why getting closer to that 14% APR, vs. the 10.45% we are at now, is going to be borderline impossible if network metrics continue in this direction.

Missed blocks are now over 20%, which in turn impacts other validators on the network. Granted, some of this is likely due to the fact that it is not profitable to run GNO at scale, so I am sure some have just turned their validators off; given the price of GNO, this is a low-risk/low-loss move. But there is another story here as well.

There are real software limitations that GNO is starting to run into now. With less than half the slot time of ETH (5 seconds vs. 12), it is going to hit a wall when the validator population gets high enough.

This is already the case with sync committees, and it is the reason why it is so hard to perform consistently well on them. 5 seconds isn't enough time for most validators to get the work done and the message to the BN (beacon node) before a slot is missed, because most clients were not designed for a 5-second slot time to begin with, and we are seeing the effects of that on the network now.

Block production is also impacted, with 20% of the blocks on GNO arriving late and thus being missed. Again, you can't just "turn up the boost" and expect your stock Honda Civic engine to hold together; this isn't Fast and Furious. But that is sort of what is happening here: we are taking a client, halving the slot time, and hoping the designs and architectures of those clients and the network (along with its assumptions around propagation time and other things) hold true.

At the start, GNO looked like it was holding, but as the validator population increased, that extra "boost" started to show its impact.

All this does not bode well for the GNO network IMHO. It is a sign of a network buckling under pressure and getting worse.

As much as I like Gnosis chain, I’m somewhat shocked at how expensive GNO staking seems to be for operators and how unsustainable it is in general.

The design choices around the GBC and GC in general seem problematic and unsustainable. Has Gnosis been told about these issues before?

It would be awesome if GNO backed off to 7 to 8 second slot times. Those 2 to 3 extra seconds would make a world of difference and are actually more likely in line with the future of ETH, if one follows the research.

There is a very high chance that ETH slot times at some point in the future move from 12 seconds to 8 seconds. So GNO moving from 5 seconds to 8 seconds would potentially bring it in line with the future of ETH, if ETH takes that direction.

Plus, an 8-second slot is only a 33% reduction from ETH's 12 seconds, vs. roughly 58% for today's 5 seconds, which puts much less pressure on the client and network architectures.
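The slot-time comparison above, and how little time each duty gets, can be tabulated. The reduction is measured against Ethereum's 12-second slot. The per-duty budget assumes the consensus-spec convention of three intervals per slot (block proposal at the start, attestations due one third of the way in, aggregates two thirds in), which Gnosis Beacon Chain inherits from Ethereum; real client timing has more detail than this simple split.

```python
ETH_SLOT_SECONDS = 12
INTERVALS_PER_SLOT = 3  # consensus-spec convention: attest at 1/3, aggregate at 2/3 of the slot


def reduction_vs_eth(slot_seconds: float) -> float:
    """Fractional reduction in slot time relative to Ethereum's 12 s slot."""
    return (ETH_SLOT_SECONDS - slot_seconds) / ETH_SLOT_SECONDS


def interval_seconds(slot_seconds: float) -> float:
    """Time budget per interval, e.g. from slot start to the attestation deadline."""
    return slot_seconds / INTERVALS_PER_SLOT


for slot in (12, 8, 5):
    print(f"{slot:>2} s slot: {reduction_vs_eth(slot):5.1%} shorter than ETH, "
          f"{interval_seconds(slot):.2f} s per interval")
```

Under this model, moving Gnosis from 5-second to 8-second slots would raise each duty's window from about 1.67 s to about 2.67 s, compared with 4 s on Ethereum today.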

I have asked this same question on the GBC Telegram. I know of no other way to interact with the GBC community, since I don't use any of the chain's services and only stake GNO because of StakeWise.

Have you tried the Gnosis Forum at Gnosis? This is where the Gnosis Improvement Proposals (GIPs) start.

I am happy to have discussions there as well about my concerns with its over-stretching of the beacon architecture, when I can spare cycles.

To be fair, though, any sort of change to Gnosis at the slot level is going to take a long time and is somewhat tangential to this discussion.

GNO is not break-even on infra costs right now and won't be until it reaches a much higher price point at the current staking fee/fee split. The DAO could decide to, say, double the staking fee to 20%, with 15% going back to the operators and 5% going to the DAO (these are just knee-jerk numbers, but it's not hard to calculate given the numbers I provided previously regarding earnings per key per year).
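To make the fee-split idea above concrete, here is a rough calculator. Several assumptions are not stated in the thread: each Gnosis validator holds 1 GNO, the 10.45% APR quoted earlier is treated as the gross reward rate before fees, and the operator's income is a fixed share of those gross rewards. The 15% operator share is the post's own knee-jerk example; the function names are hypothetical.

```python
def operator_income_usd(keys: int, gross_apr: float, operator_share: float,
                        gno_price: float, gno_per_key: float = 1.0) -> float:
    """Yearly operator income: gross rewards on the staked GNO times the operator's cut, in USD."""
    gross_rewards_gno = keys * gno_per_key * gross_apr
    return gross_rewards_gno * operator_share * gno_price


def implied_operator_share(usd_per_key: float, gross_apr: float, gno_price: float) -> float:
    """Back-solve the operator's share from an observed per-key income (assumes 1 GNO per key)."""
    return usd_per_key / (gross_apr * gno_price)


today = implied_operator_share(0.63, 0.1045, 127.0)
proposal = operator_income_usd(10_000, 0.1045, 0.15, 127.0)
print(f"implied current operator share ~ {today:.1%}")          # ~ 4.7%
print(f"10K keys at a 15% operator share ~ ${proposal:,.0f}/year")  # ~ $19,907/year
```

Under these assumptions, the current $0.63 per key corresponds to an operator cut of roughly 4.7% of gross rewards, and a 15% cut would roughly triple operator income at the same GNO price. Whether that clears infra plus staff costs depends on cost figures not reproduced in this thread.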

I also want to call out that there are OTHER costs beyond infra. Staking for the DAO must be able to pay for the infra and for the people who run it. If it doesn't, the incentive to do a good job isn't there, and those of us who are altruistic now won't stay that way over time; instead, we will find a way to exit to save ourselves when our pockets run dry and we can't continue to donate to the cause.

"But there is upside on price recovery; crypto is just down right now." This is true, but PoS makes it impossible for operators to just "turn off their mining rig" and wait for the price to improve.

I speak from a node operator's perspective here, obviously, but in my mind node operators likely need to be among the most de-risked participants in a staking pool like StakeWise. Finding the cheapest operators with the highest performance is the goal, but they must also be assured of a fairly stable income stream to sustain their performance for stakers.

You likely don't want your active validators, participation rate, and on-chain performance to whipsaw with the price of the staked asset. "Pay as little as possible for security" is true, but that doesn't mean focusing purely on infra, hardware, or electricity costs. This is still very much a human endeavor, and that needs to be accounted for along with everything else; otherwise the economic incentives built into any chain don't matter at the end of the day.

Alright, there’s certainly a lot to unpack here. I’ll be brief to make sure readers don’t lose the plot, and that my response advances the discussion.

I will focus on the 3 issues at hand: i) viability of running nodes for Gnosis Chain (GC), ii) Verihash’s role as a node operator on GC, and iii) the topic of StakeWise’s growth / distribution of validators on Ethereum.

Viability of Gnosis Chain / running nodes there

Let’s start with the discussion of Gnosis Chain and running nodes for it. It is true that GC does not offer the same profitability as Ethereum for commercial node operators. This comes down to requiring just 1 GNO to launch a validator. With the requirement set this low, the incremental revenue from running a GNO validator is below the incremental cost for a range of GNO prices, which naturally results in losses in a bear market. This is the likely reason why participation in the network has been low recently, as nodes are turned off due to being uneconomical.

Why, then, did the StakeWise team decide to launch a liquid staking service on Gnosis Chain? For us, the decision to launch on GC was relatively simple: we were (and still are) bullish on the GNO token because of the products developed under the Gnosis DAO leadership, including Gnosis Chain, which still has the opportunity to be the canary network for Ethereum. There are challenges to this thesis, including the delay of the Merge, for example. However, this is not supposed to be simple and, in our opinion, should not be construed as a signal of failure for the network. I have not seen Ethereum's faithful write off Ethereum when the Merge deadline kept being pushed out due to the challenge it represents. Why should we suddenly give up on GC, then?

So in my opinion, GC failing to Merge so far shouldn’t lead to us dropping the network - instead, we could probably do more to help the Gnosis team complete the process. The same applies to pushing the GC ecosystem into the DeFi landscape as a testing ground, i.e. making sure its adoption rises based on its purpose. It has barely been 6 months since we launched on GC - let’s have a little faith.

That being said, the burden on the node operators is going to stay regardless of the chain's adoption, for as long as the GNO price remains below the break-even point. While anybody running nodes for the network should have been prepared for operating in the current market environment before deciding to run nodes, if it becomes a matter of life or death for the business, someone should step in and help if they can. The StakeWise team has discussed the various alternatives available in this situation with the team at Gnosis, and will present a few alternatives for public discussion shortly.

Finally, on the share of the GC network currently controlled by StakeWise and plans to self-limit: the reason the team has stopped short of adding more nodes to allocate the idle GNO is precisely that StakeWise DAO already controls 33% of the staked GNO, and growing above that percentage risks threatening the network's liveness. Sadly, this does result in a subdued APR for end customers. While some seem to consider this "crappy" service, we believe it's the best of the available options.

Verihash on GC & involvement with StakeWise

Verihash has been a genesis operator in StakeWise Metro and, together with CryptoManufaktur, was the first to start receiving ETH & GNO allocations. Perhaps contrary to the statement made in the post, Verihash was not required to join GC to start receiving ETH allocations: both the documentation (GC being treated as a bonus) and the internal communication records show that the StakeWise team offered (not demanded) that Verihash run nodes on GC, and the Verihash team promptly proceeded to prepare for accepting delegations. In that communication, the Verihash team also cited internal calculations that demonstrate some amount of due diligence went into the decision. Hence, I personally struggle to see the issue with other node operators perceiving GC as an option, not a requirement (and still deciding to join, e.g. Finoa).

When it comes to Verihash experiencing financial difficulties due to supporting GC and not receiving sufficient ETH delegations due to additional node operators joining StakeWise DAO on Ethereum, the StakeWise team will communicate directly with Verihash about potential support for resolving the matter. We're not a central authority or a regulator in a position to offer a rescue package, but we want to help where we can.

Finally, on the topic of participating in the Validator Committee: there was plenty of discussion before the rollout of StakeWise Metro about the various options for achieving our goals, but the design decisions that were ultimately made may differ from what was discussed with various parties. The StakeWise team discussed the possibility of Verihash being represented in a Committee that assesses the applications made by node operators to join the DAO, along with other potential members of the genesis operator set. However, this idea was later dropped due to the possibility that genesis node operators would act against the inclusion of others in order to receive more delegation themselves. The StakeWise team takes responsibility for not clearly communicating this to the Verihash team; however, we also note that no opposition was voiced to the final Validator Committee arrangement when it was introduced, as evidenced by the comments in this thread: Introduce Validator Committee to the StakeWise DAO - #3 by bgraham-vh.

Growth of StakeWise & Ether allocations

A lot in this thread has been dedicated to criticizing the StakeWise team for the lack of growth initiatives & failure to attract more deposits. While the SWAT criticism is appropriately placed (and the blame is solely with me), the rest of the rhetoric ignores the important facts: the StakeWise team is utilizing every possible measure at its disposal to return to consistent growth. Some examples:

Unfortunately, some of these initiatives will never materialize (e.g. Tribe DAO shutting doors), some take time (e.g. getting Nexus over the line), and some are in development (e.g. V3 and dsETH). These facts cannot be ignored if our goal is to find the root of the problem ie the challenge with growing TVL. Just like Verihash, StakeWise DAO is facing a bear market in which DAOs collapse, LSDs lose peg (which leads to a drop in new Beacon Chain deposits), and activity on the chain idles out. This is a risk like any other, and dealing with such risks successfully requires strong partners, which StakeWise aims to be for the organizations and individuals we work with. Verihash is no exception.

To address other points:

  • In response to the criticism of the allocation of new Ether to the node operators, the StakeWise team will review the alternatives to the existing system with the goal of helping the node operators improve their financial position. Stay tuned for the updates on this topic.
  • In response to the absence of rGNO liquidity, the team will discuss possible options for improving liquidity with Gnosis. The team believes that rETH2 liquidity is pretty deep and doesn't see merit in the argument that rETH2 conversions incur substantial losses. Still, we'd be happy to listen to the Verihash team's ideas for how their situation can be improved.
  • StakeWise Labs is happy to commit to not accepting any delegations (just as it has before) until the amounts of ETH staked are spread evenly across the node operator set.

I hope I have not missed anything but if I have, I hope someone points me at it so I can address it.

Personal note: it saddens me to see @bgraham-vh 's frustration expressed in such crude form as was seen here, and I’ll avoid dramatizing things further. My only comment is that while we are indeed a DAO where public discourse should touch both the positives and the negatives of the service, I believe the nature of async public discussion requires us to minimize fluff and get to the point. I also believe the nature of the relationship between node operators and the DAO is first and foremost a business relationship, and as such communication must be conducted in a professional manner. We can discuss the issues at hand without emotion here, and I firmly believe it will help us get to the root of the problem & solve it faster this way. I hope this approach resonates with others.


I am glad to see your response, and I respect the team's and your take on this matter. I disagree with most of it, and I disagree that any of it was crude, but I don't think that is really the point; I am simply glad to see the engagement, and we will take stock of this response and formulate our next course of action.

EDIT:
I will mention one thing here.

We started this topic and made the majority of these posts, especially the first post, pre-V3 announcement.

V3 changes almost everything. In fact I almost came to this thread immediately after the announcement call and was going to post an update saying something along the lines of

“Taking a step back here to measure up the V3 announcement and our concerns here”.

That didn’t happen before further responses to the thread unfortunately.

I think what Graham is referring to here is that the rGNO/sGNO Curve pool is unbalanced such that there is 17% slippage to convert even tiny amounts of rGNO to sGNO. Are there other pools that I am not aware of?


That's right, and I accepted that. My comment was with regard to rETH2.

Why do we do this?

It seems pretty normal that an older operator would be running more nodes, and it is also fair to those who joined earlier. Secondly, not all operators are alike. I am not sure the division should be that each gets 1/n of the nodes (n = number of operators); other factors should play a role, like processes, locations (we also need geographic spread), and MEV returns (if we allow operators to compete).
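One way factors like these could influence allocation, instead of a strict 1/n split, is weighted round-robin. The sketch below uses smooth weighted round-robin (the interleaving scheme nginx uses for weighted upstreams); the operator names and weights are hypothetical, and how the DAO would score processes, location, or MEV returns is left open.

```python
from collections import Counter


def smooth_weighted_round_robin(weights: dict[str, int], n: int) -> list[str]:
    """Return the operator chosen for each of the next n validators.

    Each operator receives a share proportional to its weight, and picks are
    interleaved rather than handed out in one batch per operator.
    """
    total = sum(weights.values())
    current = {name: 0 for name in weights}
    picks = []
    for _ in range(n):
        for name, weight in weights.items():
            current[name] += weight
        chosen = max(current, key=current.get)  # ties go to the name listed first
        current[chosen] -= total
        picks.append(chosen)
    return picks


# Hypothetical weights, e.g. a higher score for an operator in an under-represented region.
weights = {"operator_a": 3, "operator_b": 2, "operator_c": 1}
sequence = smooth_weighted_round_robin(weights, 12)
print(sequence)
print(Counter(sequence))
```

With equal weights this reduces to the plain round-robin mentioned earlier in the thread; unequal weights give each operator a proportional share while still interleaving new validators.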