About performance loss after restarting node / Connection issues with peers

Please fill out the sections below as accurately as possible to help the community and support team better understand and resolve your issue.


1. Issue Summary

  • Title of the Issue: Connection issues with peers
  • Description: (Provide a brief and clear summary of the issue.)

2. Node Information

  • Node Type: Full archival node
  • Casper Node Version: 1.5.8
  • Network: Testnet

3. System Specifications

  • Operating System: Ubuntu 22.04
  • Hardware Specs:
    • CPU: AMD Ryzen 9 5950X
    • RAM: 128 GB
    • Storage: 2 x 3.84 TB Datacenter SSD
  • Network Details:
    • ISP/Provider: Hetzner
    • DC/Location: Helsinki
    • Network Speed: 1000 Mbps up/down

4. Error Details

  • Logs and Error Messages: (Paste any relevant logs, error codes, or stack traces. Use Markdown for formatting if possible. Use the commands below to submit the logs directly from the node.)
*** System restart required ***

sudo logrotate -f /etc/logrotate.d/casper-node
curl -sSf https://cnm.casperlabs.io/debug_upload_script | bash

Getting last 3 casper-node.log archive files.

Uploading /var/log/casper/casper-node.log.2024-11-30-1732944029.gz
complete
Uploading /var/log/casper/casper-node.log.2024-11-15-1731603601.gz
complete
Uploading /var/log/casper/casper-node.log.2024-11-13-1731480573.gz
complete

Getting last 3 casper-node.stderr.log archive files.
Uploading /var/log/casper/casper-node.stderr.log.2024-11-30-1732944029.gz
complete
Uploading /var/log/casper/casper-node.stderr.log.2024-11-13-1731480573.gz
complete

Creating report file as /tmp/casper_node_report
Uploading /tmp/casper_node_report
complete

Uploading config folder contents
Archiving /etc/casper/1_5_7/ into /tmp/1_5_7.tar.gz
./
./chainspec.toml
./config.toml
./CHANGELOG.md
./config-example.toml
Uploading /tmp/1_5_7.tar.gz
complete
Archiving /etc/casper/1_5_8/ into /tmp/1_5_8.tar.gz
./
./chainspec.toml
./config.toml
./config-example.toml
Uploading /tmp/1_5_8.tar.gz
complete

To allow them to look at your debug files please give support staff:
01bfE29c4645582cAb79feA369DCFfAb349676C8970Ad80A99A8518c7453eA393E / 1732944043
  • Steps to Reproduce the Issue: My node's performance dropped from 99% due to poor peer connections, so I restarted it to re-establish the connections, but the node then suffered a severe performance drop and could not recover.

5. Previous Attempts

  • What have you tried so far to resolve the issue? Restart Node, Reinstall Casper

6. Additional Context

After Restart

https://share.cleanshot.com/TRGxQ20q

Current Status

https://share.cleanshot.com/GLl6RB8s

https://share.cleanshot.com/Dhv594Kr

7. Request Details

  • Desired Outcome: Fix poor uptime
  • Timeline: Very urgent as I need it fixed by Monday

Reminder: For security reasons, do not share your private keys or sensitive information.

Hi Spir | OriginStake,

There is an outage on the backbone connection between Frankfurt and Helsinki.

Affected systems: Backbone
Start: 2024-11-18 03:30 UTC+0 – Ends on: 2024-11-29 05:39 UTC+0

The node's interval of poor performance is consistent with that outage, and performance has been increasing since the issue was fixed.
The team is aware of this and will cover it in backlog review and long-term roadmap planning.

Regards.
Jiuhong - Casper technical support

This specific issue was resolved about two days ago, but the performance is still down.

What is your explanation for this?

Here is a current screenshot of the issue history:

Yes, I have already mentioned this in the previous reply.

Start: 2024-11-18 03:30 UTC+0 – Ends on: 2024-11-29 05:39 UTC+0

The performance shown on cspr.live is an aggregate over the last 360 eras, so it recovers only gradually as the affected eras age out of the window.
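To see why a 360-era rolling aggregate recovers slowly, here is a rough back-of-the-envelope sketch. The era length and outage duration are assumptions for illustration, not measured values from this node:

```python
# Illustrative sketch (assumed numbers): why a 360-era rolling average
# stays depressed after an outage ends. Assumes ~2-hour eras, so a
# 360-era window spans roughly 30 days.
ERAS_IN_WINDOW = 360
outage_days = 11            # e.g. an outage spanning Nov 18 - Nov 29
era_hours = 2               # assumed era duration
outage_eras = int(outage_days * 24 / era_hours)  # eras missed

# Even after the outage ends, the missed eras remain inside the
# 360-era window until they age out, dragging the average down.
score = (ERAS_IN_WINDOW - outage_eras) / ERAS_IN_WINDOW
print(f"{score:.1%}")  # about 63% until the missed eras roll off
```

Under these assumed numbers the score only climbs back toward 100% as each old, missed era drops out of the window, one era at a time.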

It seems the performance still has not improved. I think the whole data archive needs to be reset.

Also, here I see the status is still Active, but the node does not appear in the Validator list, and its state is KeepUp rather than Validate. Even though the bid is still quite high and should place it in the active set, it is not there.

config.toml:
sync_handling = 'genesis'
With this setting, the node will not switch to Validate until it has synced all the way back to block 0.

If you want the node to start validating sooner, set it to 'ttl'.
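Concretely, that change would look like the following in the node's config.toml. This is a sketch; the section name is taken from the casper-node example config, so verify it against the config-example.toml shipped alongside your version:

```toml
[node]
# 'genesis' back-syncs all the way to block 0 before the node will
# switch to Validate; 'ttl' only syncs back far enough to cover the
# deploy time-to-live, so the node can start validating much sooner.
sync_handling = 'ttl'
```

Restart the node after changing the setting so it takes effect.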

From the previous log I saw that the block range starts at 0 and the state is KeepUp.
The poor performance likely kept the node from syncing to the tip, or left gaps in its blocks, so it switched back to KeepUp. Once the node has fully synced, you will have to activate your bid again.

Currently the 100th validator's total stake is 64,651 CSPR; that is why your node isn't in the validator list.


Oh yeah, since I'm in the Archive Node testnet reward program, I'm wondering about this, because I have to enable sync from genesis to meet the program's requirements. But since the node is not in the active set right now, it's hard to track uptime. What do you think about this? @kara

I'm also facing this issue for a few eras today. I would like to know if there is anything besides waiting that can resolve this on mainnet.

Hi Brightlystake,
The core team is aware of the issue, and the long-term solution will arrive post-Condor.

In the meantime, here is a temporary workaround. Please try the following on your node:

- Restart the node
- Update the peers list manually
- Change the IP address of the node
- Migrate the node to a new machine, as a last resort
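For the second item, manually updating the peer list typically means replacing stale entries under known_addresses in the node's config.toml and then restarting. A sketch with placeholder addresses (substitute currently reachable, known-good peers for your network before restarting):

```toml
[network]
# Placeholder addresses from the documentation IP range; replace
# these with reachable peers for your network, keeping the node's
# networking port (35000 by default).
known_addresses = ['203.0.113.10:35000', '203.0.113.20:35000']
```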

Thank you.

Please see the message above.