Details

    • Sprint:
      Ops Pilot CC 1 - Sprint 20
    • Story Points:
      3

      Description

      On Saturday 2019-02-09, we started receiving alerts about memory consumption (first incident).

      After some investigation, Cindy Qi Li and I found:

      Cindy proposed a couple of experiments, which I will run since they require some tweaks to the gpii-infra code:

      • Spin up a dev cluster without Locust tests, uptime checks, or internal health checks, so that the Pods receive no meaningful traffic. Observe memory usage for a day or two.
      • Slowly add back sources of traffic. Observe memory usage.
      • If that doesn't provide enough data, compare the behavior of versions of universal from before and after the critical Jan 29 date.
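      Judging the first experiment comes down to whether memory keeps climbing while the Pods receive no meaningful traffic. Below is a minimal sketch of one way to quantify that. It assumes memory samples for a single Pod have already been exported from monitoring as (elapsed hours, MiB) pairs, fits a least-squares slope, and compares it to a threshold. The input format, the function names, and the 0.5 MiB/hour threshold are hypothetical and are not part of gpii-infra.

```python
from typing import Sequence, Tuple

# Hypothetical input: (elapsed_hours, memory_mib) samples for one Pod.
Sample = Tuple[float, float]


def memory_growth_rate(samples: Sequence[Sample]) -> float:
    """Return the least-squares slope of memory usage, in MiB per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    # Slope = covariance(time, memory) / variance(time).
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples must span more than one timestamp")
    return cov / var


def looks_like_leak(samples: Sequence[Sample],
                    threshold_mib_per_hour: float = 0.5) -> bool:
    """Flag steady growth above a hypothetical threshold.

    With no traffic reaching the Pods, a clearly positive slope points
    at a leak rather than load-driven memory use.
    """
    return memory_growth_rate(samples) > threshold_mib_per_hour


if __name__ == "__main__":
    idle_run = [(0, 200), (12, 212), (24, 224), (36, 236), (48, 248)]
    print(memory_growth_rate(idle_run))  # 1.0 MiB per hour
    print(looks_like_leak(idle_run))     # True
```

      The same check can be repeated as each source of traffic is added back, so the slopes from successive runs are directly comparable.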

      (This situation also revealed that our resource utilization alert needs some rethinking and/or tuning: https://issues.gpii.net/browse/GPII-3718.)

              People

              • Assignee:
                Tyler Roscoe
                Reporter:
                Tyler Roscoe
              • Votes:
                0
                Watchers:
                4
