How to troubleshoot an ignored YAML file?

Description

I think that the yaml file is being ignored, and I’d like to have some guidance on how to find issues with yaml files in codecov.

Currently what I do is:

  1. Use cat codecov.yml | curl --data-binary @- https://codecov.io/validate to validate my yaml file. It says it is valid.

  2. Check the Settings section of the repository to see whether the parsed version matches the YAML file. They are completely different.

I suspect that there is some issue with parsing that is not caught by the validation check. However, I have not been able to access any traceback from parsing.

My question is twofold. First, it would be great if someone could point out what I am missing in order to fix this. Second, it would be even better to get some guidance (or be pointed to any resource) on how to troubleshoot these kinds of issues by myself. I am open to helping with this once I know how to do it.

Repository

Here is a PR where CI has run ONLY ONCE, and even though codecov.yml says to wait until 5 builds have finished, the bot commented and edited as new builds were finishing.

Additional Information

Extra checks I have done:

  1. make sure codecov.yml is in repo main directory
  2. make sure yaml file is named codecov.yml and not .codecov.yml

Hi @OriolAbril

The validate endpoint is the best bet, but it is currently out of date (an update should land very soon) and as such cannot be fully trusted.

In the case of your b1316f41e3cfd34fe66196a98c1ca11d4a2745e6 commit, the YAML has the following error: after_n_builds: 5 is not a valid key under comment; it only works under notify (and affects all types of notification, so there is no need to be more specific).

If things are still wrong after this fix, please share a commit SHA and I’ll check the logs again.

This seems to have fixed the issue with yaml file parsing, thanks!

I am not sure what “validate endpoint” means, though. Could you explain it or provide a link?

We now have another, more general issue. I think it is related to Codecov status stuck at "waiting for status to be reported" on GitHub (not completely sure, though). I am still at a loss as to how to troubleshoot it.

If you take the PR “extend make_ufunc and improve wrap_xarray_ufunc defaults” (Pull Request #1107 in arviz-devs/arviz, by OriolAbril), Azure has uploaded coverage info from 5 builds (the 3 base tests and the 2 external tests) but only 3 seem to have been received. Is this also a configuration issue?

Codecov page for latest commit in PR: https://codecov.io/gh/arviz-devs/arviz/commit/4c8e16e91c405091efe0b306f8c752387175657d/build

I’m referring to the “About the Codecov YAML” docs page.

Let me check that commit and see what’s up.

Commit 4c8e16e91c405091efe0b306f8c752387175657d still has the after_n_builds under comment, did you provide the correct SHA?

I may have forgotten to rebase, but it is still puzzling, every time I understand less of what is happening.

  • I did the validation, and got Valid! with the after_n_builds in comment section. I have also seen it should be supported (not sure if already or in the near future: Release Notes for Codecov v4.4.9). Do you know if comment will default to value in notify or if both must be set for comment and check to wait?
  • commit 4c8e16e91c405091efe0b306f8c752387175657d may still have after_n_builds in codecov (due to forgotten rebase) but both comment and check are waiting! 3 builds have finished (according to codecov, 5 have actually finished), but no message whatsoever
  • How can builds that finished hours ago still not appear in Codecov? The Codecov page for commit 4c8e16e... still says “Notifications are pending CI completion. Waiting for GitHub’s status webhook to queue notifications.”

Thanks for your patience.

Thank you for yours!

First, the validator is not 100% correct in its “Valid!” message, due to a tightening of the schema. It is a priority to correct and should be fixed soon.

The link you referenced is for our Enterprise solution which, while mostly the same, does have some differences due to the different needs our Enterprise customers face. For codecov.io, https://docs.codecov.io/docs/codecovyml-reference is the best reference.

Let’s look at 4c8e16e91c405091efe0b306f8c752387175657d

There are a couple of issues here. First, you appear to still have after_n_builds in the behavior section, which is making the YAML fail the parser. Second, the fallback is, as you say, 3 out of 5 builds.

When I check the database for that commit I only see 3 uploads, which is why. Can you link me where you see that Codecov is saying we processed 5 so I can see what happened?

EDIT: I retract what I said about the after_n_builds. It’s not valid, but looks like it should be. Discussing this with engineering.

Ok, thanks for the clarification!

Sorry about the confusion, I try not to but still mix the docs for the two of them from time to time. Thanks for commenting on the docs suggestion too.

I’ll try to explain myself better. codecov website does not always receive the 5 builds, and I have no idea why. In the same PR, with exactly the same tests run on both commits:

Any idea as to why it works only sometimes?

Note: base tests and external tests upload to Codecov, benchmarks do not, and the other two are not run for PRs.

Can you try the -n flag to give the uploads names and see if that helps us determine which ones are not completing the upload step? See the codecov script in the codecov/codecov-bash repository on GitHub (master branch).

I have opened a PR only for troubleshooting this so it won’t get merged because of the other changes.

I found an error message in one of the builds where not all reports were uploaded! It actually is in the base commit of the PR linked above, which is why we get a nearly 2% coverage increase by adding a name to the uploaded Codecov builds. Would using the --required flag solve the issue by failing the CI build when the Codecov upload fails? Or is there a better way to tackle this problem?

I am copying the error message, for the full traceback and context see link:

Error: HTTPSConnectionPool(host='codecov.io', port=443): Max retries exceeded with url: /codecov/v4/raw/2020-03-06/2C016F20EC330FC563151DA316E11499/00c6d5c057944966e765399d1508181b89c1ce3d/8abc87f8-533f-42ca-b73d-fa182d6ef3f0.txt?AWSAccessKeyId=AKIAIHLZSCQCS4WIHD4A&Expires=1583500724&Signature=%2FGe0uHJMOuRL06J0nbZCI9uPdLs%3D (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fea47cab6d0>: Failed to establish a new connection: [Errno 110] Connection timed out'))

That looks like the python uploader, which I was not aware you were using. I’m not sure offhand what the --required flag does, I’m more versed with the bash uploader.

But yes, if you need the job to fail if the Codecov upload fails you will need some sort of flag as it normally does not stop the build on failure.

Is there any way to lower the number of times this error happens? I am not sure if this is an issue with Azure sending the reports or with Codecov receiving them. And if possible we’d like to avoid rerunning the whole job to reupload coverage to Codecov.

We are using the Python uploader. Would using the bash uploader make any difference?

Note: Just to be extra clear about what I said above, we don’t know why the uploads are failing, nor do we have any idea how to investigate the reason, so any help in this direction is greatly appreciated too.

If you are using the bash uploader, the -v flag will output a lot of information, including the message of any network upload errors. That would probably help.

We set the -v flag, but we can’t see anything that helps us tackle the upload error :confused:. We tried 5 uploads, waiting 90 seconds between each, and all 5 failed:

  1. https://dev.azure.com/ArviZ/ArviZ/_build/results?buildId=2109&view=logs&j=47ad93d4-36d1-5c3a-cec5-e10661d968e4&t=40196f2b-fcc3-5917-95bb-a63fc32c7019&l=13589
  2. https://dev.azure.com/ArviZ/ArviZ/_build/results?buildId=2109&view=logs&j=47ad93d4-36d1-5c3a-cec5-e10661d968e4&t=40196f2b-fcc3-5917-95bb-a63fc32c7019&l=27176
  3. https://dev.azure.com/ArviZ/ArviZ/_build/results?buildId=2109&view=logs&j=47ad93d4-36d1-5c3a-cec5-e10661d968e4&t=40196f2b-fcc3-5917-95bb-a63fc32c7019&l=40763
  4. https://dev.azure.com/ArviZ/ArviZ/_build/results?buildId=2109&view=logs&j=47ad93d4-36d1-5c3a-cec5-e10661d968e4&t=40196f2b-fcc3-5917-95bb-a63fc32c7019&l=54350
  5. https://dev.azure.com/ArviZ/ArviZ/_build/results?buildId=2109&view=logs&j=47ad93d4-36d1-5c3a-cec5-e10661d968e4&t=40196f2b-fcc3-5917-95bb-a63fc32c7019&l=67937

Commit SHA: 89d56251eb3cf360433ca4e19405c469b88bf365. You can check that 4 out of 5 builds uploaded correctly.
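
The manual approach described above (several attempts with a fixed wait between them) can be sketched as a small shell helper. The function name retry is hypothetical and not part of any Codecov uploader; the sketch only illustrates the retry pattern and says nothing about why the uploads time out.

```shell
# retry ATTEMPTS WAIT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or ATTEMPTS is exhausted,
# sleeping WAIT_SECONDS between failed attempts.
retry() {
  attempts=$1
  wait_seconds=$2
  shift 2
  n=1
  while [ "$n" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt $n of $attempts failed" >&2
    # Only wait if another attempt is coming.
    if [ "$n" -lt "$attempts" ]; then
      sleep "$wait_seconds"
    fi
    n=$((n + 1))
  done
  return 1
}
```

For example, retry 5 90 followed by the job's upload command would mirror the five attempts with 90-second waits tried above.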

It looks like you are still using the Python uploader. Can you try the bash uploader with -v, please?

Changed to using the bash uploader, and there is no comparison between them! The bash uploader automatically retries the upload when there is a connection error, and the verbose flag does print useful info. Thanks for the pointer!

Is there somewhere in the docs discouraging the use of the Python uploader? Maybe there should be?

Here is the commit using the bash uploader: 419e9242fecbd94b8a8d644238f6eda5f18523cf, and here is one build that could not upload on the first try but eventually succeeded: https://dev.azure.com/ArviZ/ArviZ/_build/results?buildId=2135&view=logs&j=e6a7683b-6131-58a8-ef68-5f3a9120796c&t=0a472ee5-4a3b-5581-ec9e-6294371ddc1c&l=14

So it looks like using the bash uploader should solve the upload failure issue in most cases :tada:
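
For reference, here is a minimal sketch of a bash uploader call using the flags discussed in this thread. The wrapper name upload_coverage and the upload names in the usage note are hypothetical. The -n flag names the upload and -v enables verbose output; -Z is not discussed above and is included on the assumption that the uploader version in use supports it, so check the uploader's help output first.

```shell
# Hypothetical wrapper around the Codecov bash uploader.
# $1: a name for this upload, so it can be told apart on the Codecov build page.
upload_coverage() {
  upload_name=$1
  # -n NAME  custom name for the upload
  # -v       verbose output, including network errors
  # -Z       exit with status 1 if the upload does not succeed (assumed available)
  curl -s https://codecov.io/bash | bash -s -- -n "$upload_name" -v -Z
}
```

Calling upload_coverage "base-tests" in one job and upload_coverage "external-tests" in another (names made up here) would make it easier to see which uploads are missing.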



I have also tried to upload the coverage from a fork PR (it has no access to secrets and therefore no token) because it looks like public Azure Pipelines projects should not need a token:

If you have a public project on TravisCI, CircleCI, AppVeyor, Azure Pipelines, or GitHub Actions an upload token is not required.

Our project is https://dev.azure.com/ArviZ/ArviZ which is public (I double checked in project settings that visibility is public), however, we got the following message:

Commit sha does not match Azure build. Please upload with the Codecov repository upload token to resolve issue.

Here is the commit 5ec1c2c3802382d4be2847053ebb1e57ef0c76cb (note that nothing was uploaded to codecov for this commit, I am not sure it will be of any help) and the link to the build (where the output of codecov bash uploader can be read): https://dev.azure.com/ArviZ/ArviZ/_build/results?buildId=2137&view=logs&j=e4994452-efbb-51bd-c9fc-f1d030f5bbfb&t=b874d77c-8497-5843-d5d5-e253609482a4&l=14

I’m fairly sure you are hitting the issue described in the thread “CodeCov Uploads from Azure Pipelines are failing with 'Build numbers do not match'” (post #10 by X-Guardian), where the Azure API returns a different SHA on merge commits.

I would see if you can confirm, then watch that thread.

Not yet. We are trying to consolidate all the uploaders into a single one, but it’s a slow process and we are trying to make it as minimally disruptive as possible.

It looks like it is the same issue, thanks, I tried searching for this error message but somehow missed the thread, thanks again.

I’ll update the steps to troubleshoot Codecov issues to take this into account:

  1. Use cat codecov.yml | curl --data-binary @- https://codecov.io/validate to validate my yaml file. It says it is valid.
  2. Make sure to use the bash uploader, and set the -v flag to debug. If using the -v flag, check the parsed YAML in its output.
  3. Check the Settings section of the repository to see whether the parsed version matches the YAML file.
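
Step 1 above can be wrapped in a small function that also catches a missing file before contacting the validator. The function name validate_codecov_yml is hypothetical; the endpoint is the one quoted in this thread, and, as noted earlier, a “Valid!” answer from it was not fully reliable at the time.

```shell
# Hypothetical helper: send a Codecov YAML file to the validate endpoint.
# $1: path to the YAML file (defaults to codecov.yml in the current directory).
validate_codecov_yml() {
  yaml_file=${1:-codecov.yml}
  if [ ! -f "$yaml_file" ]; then
    echo "no such file: $yaml_file" >&2
    return 1
  fi
  # --data-binary @FILE posts the file contents unchanged (newlines preserved).
  curl --silent --show-error --data-binary @"$yaml_file" https://codecov.io/validate
}
```

Run from the repository root, validate_codecov_yml with no argument checks codecov.yml; the result is still worth comparing against the parsed version shown in the repository Settings.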

Thank you so much for your help!

Hey @OriolAbril, following up on this. Both exist now, and here is how they work:

notify.after_n_builds stops the whole notification flow, which includes comments. 
comment.after_n_builds stops only comments.
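
As a sketch of how the two settings could sit in one file, the snippet below writes an example configuration to codecov.example.yml (a made-up file name, so a real codecov.yml is not overwritten). It assumes notify is nested under the top-level codecov key, as in the codecov.yml reference; the value 5 is the one used earlier in this thread.

```shell
# Write an example configuration; the quoted 'EOF' keeps the text literal.
cat > codecov.example.yml <<'EOF'
codecov:
  notify:
    # Hold back the whole notification flow (comments included) until 5 builds are in.
    after_n_builds: 5
comment:
  # Hold back only the pull request comment until 5 builds are in.
  after_n_builds: 5
EOF
```

Setting only the first key should be enough when comments and statuses are meant to wait for the same number of builds.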

Thank you for your patience and persistence in pushing us to locate and resolve this bug. I’ll close out your docs thread as well, but if you want to make an edit to make it clearer, please do so.

Great! Thanks for all the help!
