Kaizen at Arts ISIT, UBC

I recently got an email from the university soliciting ideas for efficiency improvements, so it is a good time to document the “kaizen” (continuous improvement) culture in my unit. For context, Arts ISIT has a small software development team, which I belong to. Our mandate is to support the teaching & learning (T&L), research, and administration needs of our faculty by adapting or developing technologies. The requests are diverse, yet ISIT is just a unit within the Faculty of Arts, so our funding is quite limited. That means we need to be efficient. I have never used the term kaizen at work, as we are a long way from being a Toyota story, but I believe that if a small team continuously reflects on process efficiency and leverages the vast intellectual capital in a university, it can provide top-notch service and create world-class innovations.

A successful case study at Arts ISIT is the CLAS project. CLAS is a video management and interaction system that arose out of the need to apply video in education in a pedagogy-rooted manner, simplify the user experience so that using video in a course is not a burden, and give students deeper and more varied ways of interacting with videos. Spreading mainly by word of mouth, CLAS now serves 2,500 students each year across three faculties and satisfies many complex requirements that cannot be met by YouTube or off-the-shelf video systems. CLAS also supports research from Psychology, Education, and Applied Science, and incorporates some of those findings back into the product. The kaizen spirit that is now spreading in my unit can be said to have begun with this project.

The service model of CLAS represents a win-win-win collaboration between researchers, educators, and staff. Researchers see their latest work implemented professionally within a year or two of an initial idea, and limited-scope results from individual studies are aggregated into a complete product suite that may gain critical mass as a whole. Educators benefit from a video system informed by up-to-date research. As a staff member, I have the opportunity to apply my M.Sc. in HCI, earned at UBC, to the fullest. Our support staff also enjoy the boons of in-house product control. Because development happens very close to users, when an instructor has a novel use-case related to videos, our support team can confidently say “Yes, or if not yet, we can discuss how to make it happen.” Thus, the support work for CLAS becomes more intellectually rewarding, with activities like consulting educators on the technology and learning about their pedagogical needs, as opposed to firefighting problems that cannot be resolved internally. Finally, this project and others at ISIT instill in our staff and our users a sense of pride, as we are all contributing to UBC products that could be called “world-class.” After all, various external parties have asked us whether CLAS is commercially available yet (including universities in Canada, the US, Japan, the UK, Australia, and New Zealand, and most notably, the Ministry of Education of Singapore).

In other words, the dev team at Arts ISIT could be seen as the start of something akin to MIT Hyper Studio, the Media Lab, and comparable research & development groups at other top universities.

A good model still needs good execution, so what are the day-to-day secret ingredients of CLAS?

  1. Keep team structure flat and task division fluid: In the early stages of CLAS as a production service, I handled nearly all support tasks and communication, despite my official role of programmer analyst. This was not out of an obsession with a technology or a lack of organizational discipline, but entirely out of need. As in a startup, the person most knowledgeable about a topic at each stage of an endeavour handles whatever tasks are necessary to move the endeavour forward to the next stage, until others can take over. As the project matured, the division of labour and communication evolved fluidly. Our support team began to take over, and now completely owns, the support website and user documentation. My manager took over all project management the moment the abstract visions became clear enough that a scope could be defined and split into milestones and tasks. Our unit has 25 people in total, but most importantly, 19 of them are technical support, instructional support, and engineering staff. This makes for a rather “flat” management structure, with most resources spent on the productivity drivers. So how do we accomplish requirements analysis and the evaluation of alternative solutions? We leverage the intellectual capital of the university, which is our next best practice…
  2. Collaborate! Communicate! My academic chair, my manager, myself, and our learning support team routinely reach out to our faculty members, using whichever channel is most convenient for them to share their thoughts: walk-in consultations, focus groups, contact forms, and a support email address published in multiple locations. We are flexible and eager to listen, because providing service to faculty members and students is our first priority, not procuring technology for technology’s sake. Our analysis and evaluation function is thus distributed among my manager (finance and resource planning), myself (matching technical constraints and alternatives with functional requirements and usability), and the faculty members themselves (contributing their observations from the trenches, research results, and first-hand accounts of their needs). The direct communication and flat team structure make this process remarkably fast and effective compared to the big corporate cultures I used to work in. The startup attitude also inspires a sense of ownership in faculty members and partners, even those outside of Arts. Three other Faculties have independently pursued grants to add functionality to CLAS, and the Faculty of Education in particular even volunteered to create professional orientation videos for the system.
  3. Intellectuals lead and management manages: As the technical project lead, I have the clearest sense of what is possible now, what is not yet possible, and what can become possible given each architectural choice. I discuss these options openly with my manager. My manager creates the project management scaffolding around my work, keeping to the abstract requirements instead of dictating the “how”. My manager also lends a set of strategic eyes and ears for risks and funding opportunities, and discusses them openly with me. Management gathers the fuel to keep the ship going and looks out for rocks and shallows, while the intellectuals read the map and steer. We have developed trust and respect through this healthy partnership.
  4. A technology project needs a lead with deep working ability in both technology and user experience, since this role requires bridging the engineering gap between stakeholders’ needs, expressed in human language, and all the abstraction levels of the technology. More generally, a project in a domain X needs a lead who is an expert both in domain X and in user experience.
  5. Our support team is also our quality assurance (QA) team, allowing support staff to become very familiar with each new set of capabilities before those capabilities are released into the production service. In addition, front-line support staff have an intimate knowledge of how instructors and students think, which informs their testing. Finally, work-study students who support our technology learn a variety of useful skills: the QA testing process, drafting test cases, user consultation, and articulating technical concepts.
  6. Test! Verify! Assume nothing! I test my projects ceaselessly with each incremental change, so that by the time a release reaches QA, it is already virtually problem-free. When integrating an external technology, I verify all advertised features, staying in dialogue with our instructional design and support staff or directly with our users, to make sure that use-cases are actually being met *at an adequate quality of user experience*. I insist on a very generous allotment for testing in project time estimates, because an exhaustive verification regimen is crucial to high service quality and also results in a net cost saving. It is much easier to fix a bug in code you just wrote, or in an external product you just integrated, than to deal with a trouble ticket about a hasty decision made six months earlier. More importantly, a single IT problem or subpar user experience can waste hours of time for 20, 50, or 100 faculty members at once, negating any hypothetical benefits of short-term cost cutting.
  7. Give time for engineering: The CLAS project is characterized by a paradoxical combination of blistering productivity and high quality. A large number of enhancements are released every term (this is a typical 3-month update), while the yearly trouble ticket count stays below 5. This is due to my insistence, and my manager’s support, on spending time creating and improving our engineering framework. A few of our pearls:
    1. There should be a multi-tenancy architecture for every project, big or small: each tenant (virtual instance) should be able to stand on its own physical server or share one with others, yet remain completely separate from the other tenants in terms of database, code base, configuration files, database users and passwords, and encryption keys. The instances should be so isolated that you could compromise and destroy one without any harm to the others. (A minimal sketch of this kind of per-tenant isolation follows this list.)
    2. Every project should be version controlled: the repository for each project should contain separate areas for core deliverables (code, configuration files, database schema, etc.), implementer’s documentation (technical notes for me and others on the dev team), stakeholder reports (drafted by me after each milestone and improved upon by the support team), and user documentation (written by the support team).
    3. Every project should have an automatic, transparent upgrade system in the back end: this kind of system pays for itself many times over in increased productivity and reduced human error. So how transparent are we talking about here? Watch the demo!
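To make the first of those pearls concrete, here is a minimal sketch of per-tenant isolation, assuming a layout where each tenant directory carries its own configuration, database credentials, and encryption key. The paths, field names, and `load_tenant` helper are hypothetical illustrations, not the actual CLAS code.

```python
# Hypothetical sketch of per-tenant isolation: each tenant directory holds its own
# configuration, database credentials, code base, and encryption key. Nothing is shared.
import json
from dataclasses import dataclass
from pathlib import Path

TENANT_ROOT = Path("/srv/tenants")  # assumed layout: /srv/tenants/<tenant>/config.json

@dataclass
class TenantConfig:
    name: str
    db_host: str
    db_name: str
    db_user: str
    db_password: str
    encryption_key: str
    code_base: Path

def load_tenant(name: str) -> TenantConfig:
    """Load one tenant's private configuration; there are no cross-tenant defaults."""
    cfg_path = TENANT_ROOT / name / "config.json"
    with cfg_path.open() as f:
        raw = json.load(f)
    return TenantConfig(
        name=name,
        db_host=raw["db_host"],                # tenant-specific database server (own or shared host)
        db_name=raw["db_name"],                # separate database per tenant
        db_user=raw["db_user"],                # separate credentials per tenant
        db_password=raw["db_password"],
        encryption_key=raw["encryption_key"],  # separate encryption key per tenant
        code_base=TENANT_ROOT / name / "app",  # separate deployed code base per tenant
    )

# Usage: a request router or cron job resolves the tenant first, then works only within
# that tenant's configuration. A leaked or corrupted config affects one tenant only.
# cfg = load_tenant("arts-clas")
```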

“Politeness” and “kindness” – considering usability in an automated system

Last week I integrated the “Interactive Media for Education” (IME, aka CLAS) video app with our school’s student information system (SIS) so that the app can update viewer lists daily for media collections that are linked to courses. The API is a simple REST service, so at first glance I thought all I needed to do was send a request, get the data, follow the specs to parse it and update my database, and add a healthy dose of unit testing to smooth out the edge cases. Even so, I couldn’t start coding right away, even though this project is still a one-person show and I’m strapped for time. That habitual need to take long walks and imagine all the usage situations a new feature may go through just refused to let go of me, so I walked, and I prototyped. After half a week of tinkering and thinking, I realized that even automated systems have a usability angle that must be considered.
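For illustration, this is roughly what that “just call the REST service and update the database” plan looks like as code. The endpoint URL, JSON field names, and helper names below are hypothetical stand-ins, not the real SIS API.

```python
# A rough sketch of the naive nightly pull: ask the SIS for a course's enrollment,
# then compute which viewers to add to and drop from the matching media collection.
import requests

SIS_API = "https://sis.example.edu/api/v1/enrollment"  # placeholder base URL, not the real SIS

def fetch_enrollment(course_id: str) -> set[str]:
    """Return the set of student identifiers currently enrolled in a course."""
    resp = requests.get(f"{SIS_API}/{course_id}", timeout=30)
    resp.raise_for_status()
    return {entry["student_id"] for entry in resp.json()["students"]}

def compute_changes(enrolled: set[str], current_viewers: set[str]) -> tuple[set[str], set[str]]:
    """Diff the SIS list against the app's viewer list: (students to add, students to drop)."""
    return enrolled - current_viewers, current_viewers - enrolled
```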

Who are the “actors” in an automatic enrollment list update system? The data source I talk to is one, since it represents the designers and programmers who created that system, as well as the organizational culture and policy they work within. Understanding those involved with my data source allows me to consider edge cases unwritten in the documentation. For example, I remembered hearing in passing from support staff that “courses numbered with a letter at the end sometimes have strange enrollment lists”. Sure enough, some, though not all, of the test courses yielded zero students. I contacted the SIS team to hash this out, and realized that it was not a programming bug but an organizational issue: a naming rule inconsistency between the departments and the centralized body. Not a problem I can solve now, but I got the information needed to create a workaround before the service went live. Continuing this line of thought, I imagined how the data source would behave over time, and what it “needs”. During the course registration period before each term, the back-end database of my source gets hammered, so I should be polite and stop asking it for data nightly.
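Here is a sketch of what that “politeness” and the zero-student workaround might look like in the nightly job. The registration windows and the exact anomaly rule are illustrative assumptions, not the production logic.

```python
# Hypothetical sketch of "politeness": skip the nightly pull while the SIS back end is
# being hammered by course registration, and flag the known empty-list anomaly for
# course numbers ending in a letter rather than silently wiping the viewer list.
import datetime

# Assumed, illustrative registration windows; the real dates come from the academic calendar.
REGISTRATION_WINDOWS = [
    (datetime.date(2015, 6, 15), datetime.date(2015, 8, 31)),    # before Winter Term 1
    (datetime.date(2015, 11, 15), datetime.date(2015, 12, 31)),  # before Winter Term 2
]

def polling_allowed(today: datetime.date) -> bool:
    """Be polite: do not hit the SIS nightly while registration traffic is at its peak."""
    return not any(start <= today <= end for start, end in REGISTRATION_WINDOWS)

def looks_suspicious(course_number: str, enrolled_count: int) -> bool:
    """Known quirk: some courses numbered with a trailing letter come back empty.
    Treat an empty list for such a course as 'needs human review', not 'drop everyone'."""
    return enrolled_count == 0 and course_number[-1].isalpha()
```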

Another actor is the student using the IME app. Students never see the auto-provisioning service, but they are affected by it. In the first month of term, when the enrollment list is in flux, tension is high, and a student who has just registered late for a course from a waitlist will feel that a one-day wait for a video collection to open up is as long as two, so I force the student list to update twice a day, slowdown or not. I also realize that I cannot anticipate every bug, or every human error that I or the many other staff involved with enrollment might make, so during this period the service runs in a “kindness” mode: new students added to courses gain access immediately, but students dropped from courses are not removed immediately, in case a student actually lost access because of a bug or human error. Would this jeopardize academic integrity? To address this, I make sure that “kindness” mode applies only to courses where videos are shared with everyone in the course. A “one-to-many” content distribution relationship implies that this is just a lecture video, while “one-to-one”, “many-to-many”, or “many-to-one” would imply coaching, group work, or assignment submission, situations where integrity is required and “kindness” must not be misplaced.
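A minimal sketch of that “kindness” rule follows, assuming the collection model exposes grant, revoke, and review-queue methods; those method and attribute names are hypothetical.

```python
# Hypothetical sketch of "kindness" mode during the add/drop flux at the start of term:
# new students gain access right away, but drops are deferred unless the collection's
# sharing model demands strict integrity (coaching, group work, assignment submission).
from enum import Enum

class Sharing(Enum):
    ONE_TO_MANY = "one_to_many"    # lecture videos shared with the whole course
    ONE_TO_ONE = "one_to_one"      # coaching
    MANY_TO_MANY = "many_to_many"  # group work
    MANY_TO_ONE = "many_to_one"    # assignment submissions

def apply_update(collection, to_add: set[str], to_drop: set[str], in_flux_period: bool):
    """Apply additions immediately; be 'kind' about removals for lecture-style collections."""
    for student in to_add:
        collection.grant_access(student)            # assumed method on the collection model

    kindness = in_flux_period and collection.sharing is Sharing.ONE_TO_MANY
    if kindness:
        # Defer drops: a missing student may be a registration glitch or human error, and a
        # lecture video carries no academic-integrity risk from a short grace period.
        collection.queue_drops_for_review(to_drop)  # assumed method on the collection model
    else:
        for student in to_drop:
            collection.revoke_access(student)
```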

Naturally, at this point I wish for a service where updates are PUSHED to my video app instead of REQUESTED by it. My university actually does have such a service. However, integrating with this push service would take more time, because it was purpose-built for the centrally managed Learning Management System (LMS) and was not meant to be simply hooked into another app. Getting permission and arranging the details of the testing environment and the move-to-production process would also take more time. With this delay, the feature might not become useful to students for at least one term. More importantly, as the number of users increases and we develop a more enterprise-scale support model, scalable user provisioning is also a crucial feature for the service managers who are evaluating the app from an operational viewpoint right now. I decided to go with the request method to create an immediate impact for users and stakeholders, while keeping a conservative development schedule to ensure production quality. Meanwhile, I communicated with the LMS team to express my interest in the next iteration of their enrollment push system, and they have committed verbally to providing an external-facing API in the future.

Finally, the needs of the support team must also be considered. An automatic enrollment service needs a mechanism to register and deregister courses for automatic user provisioning, and the support team are the ones who will do this. The process should be seamless rather than adding to their already busy workflow of first setting up a course and then supporting it on request from instructors; otherwise it becomes another source of human error, forgetfulness under stress, and delay, especially during that hectic first month of term when students need robustness the most. Thus, I decided that there would be no extra UI for registering and deregistering a course for auto-provisioning. These actions are integrated into the normal support workflow the team has always used. When a new course is created by an instructor, if that instructor specifies that the course is linked to SIS, it is immediately registered for auto-provisioning. If the support team receives a request to bulk-import enrollment from SIS, that action registers the course for auto-provisioning. If the enrollment list is then changed manually for any reason, using any method in the admin interface such as the Add/Drop tool or the CSV import tool, the course is deregistered, so that the automatic source will not overwrite the manual change. Support staff also often need to use test accounts, so the design includes a reserved account list: these accounts are excluded from automatic updates, and adding or dropping them manually does not remove a course’s auto-provisioning status.
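Here is a sketch of how the auto-provisioning status can piggyback on those existing support actions; the event handler names, the `auto_provision` flag, and the reserved account IDs are all hypothetical.

```python
# Hypothetical sketch: auto-provisioning status rides on the existing support workflow,
# so nobody has to remember a separate registration step.
RESERVED_ACCOUNTS = {"support.test1", "support.test2"}  # illustrative test-account IDs

def on_course_created(course, linked_to_sis: bool):
    """An instructor creating an SIS-linked course opts it into auto-provisioning."""
    if linked_to_sis:
        course.auto_provision = True

def on_bulk_import_from_sis(course):
    """A support-driven bulk import from SIS also opts the course in."""
    course.auto_provision = True

def on_manual_enrollment_change(course, changed_accounts: set[str]):
    """Any manual edit (Add/Drop tool, CSV import) opts the course out, so the nightly sync
    will not overwrite the manual change -- unless only reserved test accounts were moved."""
    if changed_accounts - RESERVED_ACCOUNTS:
        course.auto_provision = False

def auto_provision_targets(enrolled: set[str]) -> set[str]:
    """Reserved test accounts are never touched by the automatic updates."""
    return enrolled - RESERVED_ACCOUNTS
```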

Last but not least, to minimize the risk caused by such a short development time frame, every automatic back-end feature of the IME, this auto user provisioning system included, has extensive logic to check for corrupted data and for data that is technically spec-valid but looks unusual (too many students being dropped, too many blank entries, etc.). The cumulative time to update all courses, the time for each course, and the timing and error-detection status for each step of the process for each course are collected, evaluated for unusual patterns, and compiled into an email report that the developer and support team can monitor for the first few months of the feature. All this monitoring and error detection may seem overbuilt for something so simple, but the extra development time buys a much bigger time saving. It allows the team to treat the first few months of live service as an extended QA phase: when something fails, we can see exactly which course and which step it failed on, and even which users’ enrollment status was being modified at the time, so the team can recover from failures quickly, at the precise point, before users feel any pain.
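To close, here is a sketch of that monitoring wrapper, assuming the `fetch_enrollment` helper from the earlier sketch; the plausibility threshold and the mailer call are assumptions made for illustration.

```python
# Hypothetical sketch of the monitoring wrapper: time each course's update, run sanity
# checks on the incoming data, and roll everything into one report the team can scan.
import time

def plausible(enrolled: set[str], current_viewers: set[str]) -> list[str]:
    """Flag data that is spec-valid but unusual, e.g. blank entries or a very large drop."""
    warnings = []
    blanks = sum(1 for s in enrolled if not s.strip())
    if blanks:
        warnings.append(f"{blanks} blank entries in the SIS response")
    dropped = len(current_viewers - enrolled)
    if current_viewers and dropped / len(current_viewers) > 0.5:  # assumed threshold
        warnings.append(f"{dropped} of {len(current_viewers)} viewers would be dropped")
    return warnings

def run_nightly_update(courses, report_lines: list[str]):
    """Update every registered course, recording per-course timing and status as we go."""
    total_start = time.monotonic()
    for course in courses:
        start = time.monotonic()
        try:
            enrolled = fetch_enrollment(course.course_id)  # helper from the earlier sketch
            warnings = plausible(enrolled, course.current_viewers)
            status = "OK" if not warnings else "; ".join(warnings)
        except Exception as exc:                           # record the failure, do not halt the run
            status = f"FAILED: {exc}"
        report_lines.append(f"{course.course_id}: {status} ({time.monotonic() - start:.1f}s)")
    report_lines.append(f"Total: {time.monotonic() - total_start:.1f}s")
    # send_report_email(report_lines)  # assumed mailer monitored by the dev and support team
```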