  1. GPII - Global Public Inclusive Infrastructure
  2. GPII-4014

The deployment of the GPII-3717 work to enhance logon procedure from NOVA

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Before the deployment

      Javi: send this information to Ops:

      1. the client credential IDs (_id values for these CouchDB documents) created for NOVA and Chickasaw;
      2. the JSON file that contains the new NOVA privileged client credentials created for testers.

      Joseph:

      1. update gpii-infra pull request to point to the new universal image: https://github.com/gpii-ops/gpii-infra/pull/383

      Ops: keep track of the following backup information, which will be used for the restore in the rollback plan

      1. the current version of the Docker image running in production: 20190522142238-4a52f56
      2. the last database backup before running the deployment.
      3. Stop k8s-snapshots just before the persistence is upgraded in each environment. Start it again right after the automatic migration is successful.
      4. Note that since the database is backed up every 5 minutes, no special database backup is required
        1. Latest staging DB snapshots:
          pv-database-storage-couchdb-couchdb-0-060819-090056 	us 	2.2 MB 	Aug 6, 2019, 11:00:57 AM 	Manual 	gke-k8s-cluster-3ca769-pvc-9dcefce8-3f70-11e9-86f3-42010a8001e3 	10 GB
          pv-database-storage-couchdb-couchdb-2-060819-090029 	us 	4.89 MB 	Aug 6, 2019, 11:00:30 AM 	Manual 	gke-k8s-cluster-3ca769-pvc-6998790a-3f71-11e9-86f3-42010a8001e3 	10 GB
          pv-database-storage-couchdb-couchdb-1-060819-085949 	us 	5.79 MB 	Aug 6, 2019, 10:59:50 AM 	Manual 	gke-k8s-cluster-3ca769-pvc-0141f4a8-3f71-11e9-86f3-42010a8001e3 	10 GB 
          

      Deploy plan

      Step 1: Ops:
      1. Merge and start deploying Joseph's pull requests

      2. Make sure version-updater syncs the new universal to GCR, and that dev-doe succeeds

      3. Start a Docker container of the new universal image. In the gcp/live/dev directory, run the commands that follow these notes.
      Notes for Ops:
      i.) "rake grant_project_admin" must be run before these commands can be executed in stg/prd.
      ii.) Run `rake couchdb_ui` in a separate window to obtain a set of temporary CouchDB credentials that can be used in steps 10 & 12. (The credentials remain valid as long as that terminal is kept open; the forwarded port 35984 will also be used in step 7.)
      iii.) The image used should match the version deployed by the PR. The command below represents the latest change I could find: https://github.com/gpii-ops/gpii-infra/pull/383/files#diff-f9c7d077087bf22b3cf15f2115e30de5. Note that version-updater will need to sync this image to GCR before you can use it in the command below.

      rake sh
      kubectl run -n gpii --rm -it --image gcr.io/gpii-common-prd/gpii__universal:20190809170605-f8485e9 universaltmp sh
      

      4. Open Fauxton UI at http://localhost:35984/_utils/#/database/gpii/_all_docs, click "design documents" on the menu at the left -> select the only document "_design/views" -> add the view function below to the "views" object:

          "findAccessTokenByExpires": {
            "map": "function(doc) {if (doc.type === 'gpiiAppInstallationAuthorization') emit(Date.parse(doc.timestampExpires), doc); }"
          },
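
      To sanity-check the view function before pasting it into Fauxton, its map logic can be exercised locally. The sketch below is only a stand-in for CouchDB's view engine: `runMap`, the sample documents, and the cutoff date are hypothetical, and only the body of the map function is taken from the step above. It assumes the cleanup script selects tokens whose emitted key is earlier than the current time.

```javascript
// Map function body copied from the "findAccessTokenByExpires" view above.
// CouchDB provides emit() as a global; here it is passed in explicitly.
function mapFn(doc, emit) {
    if (doc.type === 'gpiiAppInstallationAuthorization') {
        emit(Date.parse(doc.timestampExpires), doc);
    }
}

// Hypothetical helper: run the map over an array of documents and return
// the emitted rows sorted by key, the order in which a view returns them.
function runMap(docs) {
    const rows = [];
    docs.forEach(function (doc) {
        mapFn(doc, function (key, value) {
            rows.push({ key: key, id: value._id });
        });
    });
    return rows.sort(function (a, b) { return a.key - b.key; });
}

// Sample documents, made up for illustration.
const sampleDocs = [
    { _id: "token-b", type: "gpiiAppInstallationAuthorization", timestampExpires: "2019-08-10T00:00:00.000Z" },
    { _id: "key-1", type: "gpiiKey" },
    { _id: "token-a", type: "gpiiAppInstallationAuthorization", timestampExpires: "2019-08-01T00:00:00.000Z" }
];

// Assumed selection rule: a token is expired when its key is before the cutoff.
const cutoff = Date.parse("2019-08-06T00:00:00.000Z");
const expired = runMap(sampleDocs).filter(function (row) { return row.key < cutoff; });
console.log(expired.map(function (row) { return row.id; }));
```

      Only documents of type "gpiiAppInstallationAuthorization" produce rows, so other document types never appear in the view.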
      

      5. Run the one-time access token cleanup script to remove expired access tokens from the database:

      node scripts/deleteExpiredAccessTokens.js http://[USERNAME]:[PWD]@couchdb-svc-couchdb.gpii.svc.cluster.local:5984/gpii 5000
      

      With the staging size of 320K documents, the first run will hit this error:

      Error retrieving documents from database: 500 - Internal Server Error
      

      This happens because the first run triggers the building of the new view index for "findAccessTokenByExpires". In the Google Cloud console, watch the CouchDB CPU and memory usage climb for around 25 minutes. During these 25 minutes, flow manager requests will receive a "server-error" response. When usage starts to go down, re-issue the same command, which should then start to clean up the database.

      With the production size of 60K documents, this error did not show up and the first run cleaned up the database.
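
      The "re-issue the command until it succeeds" procedure can also be expressed as a retry loop. This is a minimal sketch, not part of the cleanup script: `retryWhileIndexing` is a hypothetical helper, and it assumes the failing operation rejects with an error carrying a `statusCode` of 500 while the view index is building.

```javascript
// Hypothetical helper: retry an async operation while it fails with HTTP 500,
// which is what CouchDB returned while the new view index was being built.
async function retryWhileIndexing(operation, options) {
    const maxAttempts = options.maxAttempts;
    const delayMs = options.delayMs;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            return await operation();
        } catch (err) {
            // Assumption: the error exposes the HTTP status as err.statusCode.
            const retriable = err.statusCode === 500 && attempt < maxAttempts;
            if (!retriable) {
                throw err;
            }
            console.log("Attempt " + attempt + " got a 500, waiting " + delayMs + " ms");
            await new Promise(function (resolve) { setTimeout(resolve, delayMs); });
        }
    }
}

// Usage with a fake operation that fails twice before succeeding.
let fakeCalls = 0;
function fakeCleanup() {
    fakeCalls += 1;
    if (fakeCalls < 3) {
        const err = new Error("Internal Server Error");
        err.statusCode = 500;
        return Promise.reject(err);
    }
    return Promise.resolve("cleanup started");
}

retryWhileIndexing(fakeCleanup, { maxAttempts: 5, delayMs: 10 }).then(console.log);
```

      For the staging case described above, a delay of several minutes between attempts would be more appropriate than the short delay used in this demonstration.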

      6. Run migration script 1 to update the NOVA client credential documents to the 0.2 data structure. The command below assumes that the client credential IDs for NOVA and Chickasaw are "clientCredential-nova1" and "clientCredential-nova2". Replace [USERNAME] and [PWD] with real CouchDB credentials, then, in the shell opened by the command in step 3, run:

      node scripts/migration/schema-0.2-GPII-4014/migration-step1.js http://[USERNAME]:[PWD]@couchdb-svc-couchdb.gpii.svc.cluster.local:5984 "clientCredential-nova1" "clientCredential-nova2"
      

      Alfredo: The credentials provided by couchdb_ui didn't work. So I passed the admin credentials using env variables:

      kubectl run --env="u=${TF_VAR_secret_couchdb_admin_username}" --env="p=${TF_VAR_secret_couchdb_admin_password}" -n gpii --rm -it --image gcr.io/gpii-common-prd/gpii__universal:20190801163411-26be63f universaltmp sh
      node scripts/migration/schema-0.2-GPII-4014/migration-step1.js http://${u}:${p}@couchdb-svc-couchdb.gpii.svc.cluster.local:5984 "uuid1" "uuid2"
      
      Infusion at path /app/node_modules/infusion is at top level 
      11:01:14.528:  Registering module gpii-universal from path /app/
      11:01:14.529:  Registering module infusion from path /app/node_modules/infusion/
      11:01:14.529:  Registering module infusion from path /app/node_modules/infusion
      COUCHDB_URL: 'http://couchdb-svc-couchdb.gpii.svc.cluster.local:5984/gpii'
      Updating the client credential ID:  uuid1
      Updating the client credential ID:  clientCredential-1
      Updating the client credential ID:  uuid2
      Updated  3  of  103663  GPII documents.
      Done.
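
      When credentials are interpolated straight into the URL, as in the command above, characters such as "@" or "/" in the password would break it. The sketch below shows one way to build the URL with percent-encoding. `buildCouchUrl` and the fallback credentials are hypothetical, and whether the migration scripts accept percent-encoded credentials has not been verified here.

```javascript
// Hypothetical helper: attach credentials to a CouchDB base URL.
// The URL username/password setters percent-encode reserved characters.
function buildCouchUrl(baseUrl, username, password) {
    const url = new URL(baseUrl);
    url.username = username;
    url.password = password;
    // Drop the trailing slash that URL serialization adds to an empty path.
    return url.toString().replace(/\/$/, "");
}

// "u" and "p" are the environment variable names used in the kubectl command
// above; the fallback values are made up for illustration.
const couchUrl = buildCouchUrl(
    "http://couchdb-svc-couchdb.gpii.svc.cluster.local:5984",
    process.env.u || "admin",
    process.env.p || "p@ss/word"
);
console.log(couchUrl);
```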
      

      7. Promote common-prd. Monitor dev and stg deploy.

      8. Once stg deploy has completed, make sure gpii-dataloader has run
      Open Google Cloud Platform console, follow Kubernetes Engine => workloads => dataloader => In the "Managed pods" section, select "dataloader-..." pod => In "containers" section, select "view logs" of a container named "gpii-dataloader". In the log, there should be messages such as "Updated views: ..." and "Deleted 120 Prefs Safes and 120 associated GPII Keys" and "Bulk loading of build data from '/app/build/dbData/snapset'" and "Done."

      9. Make sure Joseph's "flushtokens" cronjob looks good and runs fine. The number of access tokens deleted here no longer matters; the one-time access token cleanup at the start should have taken care of the major cleanup.
      Follow the instructions in step 8, but view the log of the "gpii-flushtokens" container instead. There should be a message "Done: No more expired access tokens to delete. Deleted {numberX} expired access tokens in total."
      Here is a filter for the GCP logs that finds all and only "Deleted X expired access tokens in total". If you started the cronjob at, say, 3:37pm, set the filter start time to that time:

      "Error retrieving access tokens from the database: connect ECONNREFUSED" OR
      "Deleted " AND "expired access tokens in total."
      severity>=INFO
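
      Once the matching log lines have been exported, the totals can be tallied with a few lines of JavaScript. This is a sketch under assumptions: `summarizeFlushLogs` is a hypothetical helper, the sample lines are made up, and only the two message texts are taken from the filter and the log message quoted above.

```javascript
// Matches the message quoted above: "Deleted {numberX} expired access tokens in total."
const DELETED_PATTERN = /Deleted (\d+) expired access tokens in total\./;
const CONNECT_ERROR = "Error retrieving access tokens from the database: connect ECONNREFUSED";

// Hypothetical helper: count cronjob runs, deleted tokens and connection errors.
function summarizeFlushLogs(lines) {
    const summary = { runs: 0, deleted: 0, connectionErrors: 0 };
    lines.forEach(function (line) {
        const match = DELETED_PATTERN.exec(line);
        if (match) {
            summary.runs += 1;
            summary.deleted += parseInt(match[1], 10);
        } else if (line.includes(CONNECT_ERROR)) {
            summary.connectionErrors += 1;
        }
    });
    return summary;
}

// Sample log lines, made up for illustration.
const sampleLines = [
    "Done: No more expired access tokens to delete. Deleted 12 expired access tokens in total.",
    "Error retrieving access tokens from the database: connect ECONNREFUSED 10.0.0.1:5984",
    "Done: No more expired access tokens to delete. Deleted 0 expired access tokens in total."
];
console.log(summarizeFlushLogs(sampleLines));
```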
      

      10. Repeat step 3 to start the Docker container of the new universal image, then run migration script 2 to update "schemaVersion" and "timestampUpdated". Replace [USERNAME] and [PWD] in the command below with real CouchDB credentials.

      node scripts/migration/schema-0.2-GPII-4014/migration-step2.js http://[USERNAME]:[PWD]@couchdb-svc-couchdb.gpii.svc.cluster.local:5984 5000
      

      Depending on the data volume, this error might occur the first couple of times this command is issued:

      Infusion at path /app/node_modules/infusion is at top level 
      14:38:28.243:  Registering module gpii-universal from path /app/
      14:38:28.244:  Registering module infusion from path /app/node_modules/infusion/
      14:38:28.244:  Registering module infusion from path /app/node_modules/infusion
      COUCHDB_URL: 'http://localhost:35984/gpii'
      Error retrieving documents from database: 500 - Internal Server Error
      

      This is OK: the view index is still being built and the updates have not started at this point. Wait a few minutes and re-issue the command until you see:

      Updated  5000  of  418002  GPII documents.
      Updated  10000  of  418002  GPII documents.
      ...
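
      For a long migration it can help to turn these progress lines into a percentage. The sketch below parses the line format shown above; `parseProgress` is a hypothetical helper and is not part of the migration scripts.

```javascript
// Matches lines such as "Updated  5000  of  418002  GPII documents."
const PROGRESS_PATTERN = /Updated\s+(\d+)\s+of\s+(\d+)\s+GPII documents\./;

// Hypothetical helper: return { updated, total } for a progress line,
// or null when the line is not a progress line.
function parseProgress(line) {
    const match = PROGRESS_PATTERN.exec(line);
    if (!match) {
        return null;
    }
    return { updated: parseInt(match[1], 10), total: parseInt(match[2], 10) };
}

const progress = parseProgress("Updated  5000  of  418002  GPII documents.");
const percent = Math.round((progress.updated / progress.total) * 1000) / 10;
console.log(progress.updated + " of " + progress.total + " (" + percent + "%)");
```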
      

      11. Follow these instructions to load Javi's json file that has new NOVA privileged client credentials for testers

      12. Repeat step 3 to start the Docker container of the new universal image, then run the verification script and check the output. Replace [USERNAME] and [PWD] in the command below with real CouchDB credentials.

      node scripts/migration/schema-0.2-GPII-4014/verify.js http://[USERNAME]:[PWD]@couchdb-svc-couchdb.gpii.svc.cluster.local:5984 10000 "clientCredential-nova1" "clientCredential-nova2"
      

      This script will output a verification report.
      These reports are fine:

      • A report of "All passed." and "Done: ...";
      • A report that a number of documents have "the value of timestampUpdated is empty". These documents are either snapsets or new user data created after the new universal image was deployed but before the verification script was run.

      Any report about wrong client credentials or schema versions is not OK.

      Step 2. Kavya and Javi: Test Morphic 1.1 and 1.2 on these scenarios:

      1. Morphic installations that use the NOVA client credentials are able to:
        1. create new GPII keys and prefs safes;
        2. save preferences that contain one or more pref keys that are in this list - https://github.com/cindyli/universal/blob/GPII-4017/scripts/migration/migration-GPII-3711.js#L41-L48
        3. receive an "unauthorized" message when saving preferences that are not on the list in the item above;
        4. perform all operations with existing GPII keys, such as key in or update preferences.
      2. Morphic installations that use non-NOVA client credentials are:
        1. not able to create new GPII keys and prefs safes; the request will be rejected with an "unauthorized" message;
        2. able to perform all operations with existing GPII keys, such as key in or update preferences.

      Deployment completes successfully if all tests pass.

      Rollback plan

      Ops: If, after investigation, the errors reported during the migration or the failures from Kavya and Javi's tests are serious enough, restore the database and the universal Docker image from the backups tracked before the deployment. This rollback will result in about 1 hour of cloud downtime.
       
       
       
       

              People

              Assignee:
              cli@ocad.ca Cindy Qi Li
              Reporter:
              cli@ocad.ca Cindy Qi Li