Migrating a Hibernate Bag to a Hibernate List

October 27, 2010

We ran into an issue recently where we needed to move from a Hibernate bag to an explicitly ordered list. Our root entity is a TaskSchedule, which has many Task(s). We needed to add a list_index column to the Task table, populate the new column and then update our code to use it. Because Hibernate didn’t know about the new column, our existing code could continue to function as if it didn’t exist and when we deployed the new service, it would begin using list_index. If we were adding new Tasks between the time we populated the column and the new code was deployed, we’d have to write some transitional code to update, but not use the new field. Luckily we were able to avoid that.

The xml configuration change to do this is relatively simple. We moved from:

<class name="TaskSchedule" table="task_schedules">
 <!-- ... -->
 <bag name="tasks" cascade="all-delete-orphan" lazy="true">
   <key column="task_schedule_id" />
   <one-to-many class="Task" />
 </bag>
</class>

to:

<class name="TaskSchedule" table="task_schedules">
 <!-- ... -->
 <list name="tasks" cascade="all-delete-orphan" lazy="true">
   <key column="task_schedule_id" />
   <list-index column="list_index" />
   <one-to-many class="Task" />
 </list>
</class>

However we had a few thousand rows that needed their list_index populated. Two of us went off with a backup copy of the database and gave it a shot. Not surprisingly, we came up with two different solutions. Speed was of the essence, so we weren’t focusing on producing beautiful code (as you will soon see).

The first solution uses a temporary column for side effects:

begin;

set @last_schedule_id := 0;

alter table tasks add column temporary_column bigint;

update tasks set 
  list_index = if(
      concat(task_schedule_id) != concat(@last_schedule_id),
      -- some weird typing required coercion to string for the comparison
      @i := 0, @i := @i + 1), 
  temporary_column = if(
      concat(task_schedule_id) != concat(@last_schedule_id), 
      @last_schedule_id := task_schedule_id, 0) 
  order by task_schedule_id, creation_time, id;
  
alter table tasks drop column temporary_column;

commit;

The second defines a procedure (make sure to run the mysql client with -A or it will try to do tab completion):

DELIMITER //
create procedure foo()
begin
 declare cnt bigint default 0;
 declare cur int default 0;
 declare t_id, ts_id bigint;
 declare done int default 0;
 declare s cursor for select id, task_schedule_id from tasks order by task_schedule_id, creation_time, id;
 declare continue handler for not found set done = 1;

 open s;

 read_loop: LOOP
  fetch s into t_id, ts_id;
  if done then
   leave read_loop;
  end if;
  if cur = ts_id then
   set cnt:=(cnt + 1);
  else
   set cur := ts_id;
   set cnt := 0;
  end if;
  update tasks set list_index = cnt where id = t_id;
 end loop;

 close s;
end;
//
DELIMITER ;
call foo;
drop procedure foo;

Both of these work and took about the same amount of time to write. As the author of the second bit of code, I’m partial to that method, but I don’t doubt that there are better ways to get this done. Anyone have a better solution?