Node desynchronizing sometimes

Hello,

Since I started my node 5 days ago with a full, successful synchronization, I have already had 2 desynchronizations (forks), each time forcing me to resynchronize it completely from scratch (docker-compose down -v; docker-compose up -d).

The second desynchronization happened today, about 2 hours ago, and this time I pulled the latest sandbox image to see whether it behaves better (docker pull universalnamesystem/core:sandbox, run before the down / up).
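
For reference, the full resync sequence I run is roughly the following (just a sketch of my own routine, executed from the directory that holds my compose file):

    # stop the stack and delete its volumes (this wipes the local chain state)
    docker-compose down -v
    # optionally refresh the image before restarting
    docker pull universalnamesystem/core:sandbox
    # start again and let the node resynchronize from the network
    docker-compose up -d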

Sadly, I didn’t look at the log lines from the moment it happened. If it happens again, I’ll try to look at them next time.

We are investigating. If you have some logs, don’t hesitate to give them to us!
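
For example, something along these lines should capture a useful window the next time it happens (just a sketch: we assume the compose service is named uns-sandbox, as the log prefix suggests, and the --tail value is arbitrary):

    # dump the recent output of the core service, with timestamps, to a file you can share
    docker-compose logs --timestamps --tail=5000 uns-sandbox > uns-desync.log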

Thank you for your feedback. :slight_smile:

Can you give more information about your configuration?
Hosting, VPS resources, Docker and Docker Compose versions, logs from when the desynchronization happens…
Thank you

Hosting: Self hosting
Internet connection: FTTH, 1 Gbit/s down / 400 Mbit/s up
Hardware: Intel NUC Core i5, 4GB of RAM
OS: Ubuntu Linux 19.10
Docker: 19.03.3
Docker-Compose: 1.24.1

I just had another desynchronization of my node this morning, about 45 minutes ago.

The full log is available at this address: https://www.cloud-libre.eu/index.php/s/7J7qeHMmFGwDH7H.

Some interesting error traces:

uns-sandbox | [2020-02-13 09:39:45.486] ERROR: Could not download blocks [121404, ..] from any of 9 peers. Last attempt returned 0 blocks from peer 35.206.105.234:4002.
uns-sandbox | [2020-02-13 09:39:45.487] INFO : Could not download any blocks from any peer from height 121404
uns-sandbox | [2020-02-13 09:58:56.214] DEBUG: event 'NOTSYNCED': {"syncWithNetwork":"syncing"} -> {"syncWithNetwork":"downloadBlocks"} -> actions: [downloadBlocks]
uns-sandbox | [2020-02-13 09:58:56.215] DEBUG: Downloading blocks in chunks: [121511, ..]
uns-sandbox | [2020-02-13 09:58:57.575] ERROR: Could not download blocks [121511, ..] from any of 9 peers. Last attempt returned 0 blocks from peer 51.158.99.6:4002.
uns-sandbox | [2020-02-13 09:58:57.583] INFO : Could not download any blocks from any peer from height 121511
uns-sandbox | [2020-02-13 10:02:01.491] INFO : Could not download any blocks from any peer from height 121528
uns-sandbox | [2020-02-13 10:13:17.070] DEBUG: event 'NOTSYNCED': {"syncWithNetwork":"syncing"} -> {"syncWithNetwork":"downloadBlocks"} -> actions: [downloadBlocks]
uns-sandbox | [2020-02-13 10:13:17.070] ERROR: Could not download blocks: We have 10 peers but all of them are on a different chain than us
uns-sandbox | [2020-02-13 10:13:17.070] INFO : Could not download any blocks from any peer from height 121569

It’s hard to understand what happened on your node.
Nodes get “desynchronized” several times a day (for a short time) and resynchronize automatically a few minutes later.

Around 2020-02-12 14:07:10.541 UTC, your node forged block 114721, but it broadcast it to the “chain” (all the other nodes) too slowly.
At the same time, we can indeed see on another node (UTC time):

[2020-02-12 14:07:21.325] DEBUG: Delegate 38025c9fce2ad334d0aa39e7ee6ca4dc166143910533b6dabf187b6a056e90c9 (020f472925e7b07d7dc67ae146cc5342272313918cbd2da92cbf57c95f4a66a85c) just missed a block.

Delegate 38025c9fce2ad334d0aa39e7ee6ca4dc166143910533b6dabf187b6a056e90c9 is the one running on your node.

From then on, it was living on its own chain.

At 2020-02-12 16:02:41.442 UTC, it finally discovered it was alone on its own chain and decided to start the fork recovery process ([2020-02-12 16:02:41.442] INFO : Starting fork recovery). It succeeded.

Then the node synchronized and was SYNC at 2020-02-12 16:02:46 UTC.
It was back to the chain with the others.
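
If you want to spot these transitions yourself, grepping the exported log for the messages quoted above is usually enough (a sketch: uns.log stands for whatever file you saved the container output to):

    # moments where the node missed its slot, lost its peers, or started recovering
    grep -E "missed a block|Starting fork recovery|Could not download" uns.log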

Questions:

  • do you have other processes running on this machine?
  • can you monitor the CPU/memory load on it? (sysstat? see the sketch just below)
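
A minimal way to do that with sysstat and Docker’s own tooling could look like this (just a sketch: the package name is the Ubuntu one, and the 5-second interval / 12 samples are arbitrary):

    # install sysstat, then sample CPU and memory usage
    sudo apt install sysstat
    sar -u 5 12   # CPU utilisation, 12 samples of 5 seconds
    sar -r 5 12   # memory usage, same sampling
    # live per-container resource usage
    docker stats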

Yes I have other processes running on this machine:

  • NGINX server with PHP-FPM running about 5 websites (including a Nextcloud)
  • MySQL server
  • Duniter blockchain forging node (based on PdPoW, Personalized difficulty Proof-of-Work)
  • Cesium+ pod (ElasticSearch)
  • ĞChange pod (ElasticSearch)

Those are the main ones. Note that the Duniter node, while using a derivative of PoW (which requires more resources), never desynchronizes, even with all the other processes running (including the UNS node) :wink:

I note that the log extracts you mentioned are a lot earlier (the afternoon of February 12th) than the fork I observed, which occurred on the morning of February 13th, and that the case you mentioned resolved itself after about 2 to 3 hours.

So maybe I’m just not patient enough, and should wait longer for the automatic fork resolution, even though 2 to 3 hours is quite a lot, considering you said that normally a fork should resolve itself in a matter of minutes.

I also observed on the delegates monitoring page that other nodes desynchronize for multiple hours, so it doesn’t seem to be an issue isolated to my node.

  1. The explorer shows delegates that have missed their forging round. That can happen for any reason: a shut-down node, a desynchronization…

  2. Some nodes never go through a “long” desynchronization, while others are more “brittle”, despite the efforts of the human operators behind the delegate :wink:

The network is still growing as we add more relay nodes, and improvements will be delivered with the next UNS Core version, which embeds the latest Ark 2.6 release.

Thank you for your patience :innocent:
