Go forward in time to November 2004.
Notzed writes about GtkTreeView being slow when getting the selected rows. But what does this mean? First we need to know about an implementation detail, which is that internally GtkTreeView uses a red-black tree for the view's representation. The nodes in that tree keep some flags, including a flag for whether the node is selected. Now, what does it mean to get the selected rows?
gtk_tree_selection_get_selected() — This only works for non-multiple selection mode. GtkTreeView keeps a row reference for the anchor node; when this function gets called, it tries to find the anchor node in the RBTree, which is an O(log num_nodes) operation.
gtk_tree_selection_get_selected_rows() — This gives you a GList of GtkTreePath. For non-multiple selection, it short-circuits the operation by calling gtk_tree_selection_get_selected(). For multiple selection, it walks the tree, looking for nodes with the selected flag set. As it walks the tree, it modifies a running GtkTreePath, which is an array of indices:
path = gtk_tree_path_new (); /* Path is "0" */ list_of_selected_paths = list_new (); walk_tree (t) { if (t == null) return; if (t.children) { path.down (); /* adds a level to the path, e.g. from "0" to "0:0" */ foreach (c in t.children) walk_tree (c); path.up (); /* removes a level from the path, e.g. from "0:5" to "0" */ } if (t.selected) list_of_selected_paths.add (path.copy ()); path.next (); /* This increments the last index in the path, e.g. from "0:0" to "0:1" */ } walk_tree (treeview.root);
This is O(number_of_nodes) plus the act of maintaining the running path. Is the latter a source of slowness? Who knows; profiling is needed.
gtk_tree_selection_selected_foreach() — For non-multiple selection, it short-circuits the operation. For multiple selection it also walks the tree, calling your callback and also maintaining a running GtkTreePath.
So, there's no reason for this to be slower than ETable, which uses a bit array to represent the selection. If it's slower, it should be so by a constant factor. If that factor is too large, there's some profiling and optimization left to do.
In the afternoon:
The New York City subway turns 100 years old, and there's a beautiful interview about it on Newsweek.
During the Gnome Summit, Miguel suggested making the file chooser try to load an folder within a specified time, say, 500 milliseconds, and then display it, to avoid a lot of visible re-sorting in the tree view. I found some interesting things.
I tested a folder with 10,000 files. The files were consecutively numbered and created in reverse order (9999, 9998, ..., 0000) to try to make GtkTreeView work more when sorting. Stopwatch times for loading this folder with a warm cache were as follows:
ls | Under 0.3 sec.; hard to time. |
mc | Under 0.5 sec.; hard to time. |
GtkFileChooser with the Unix backend | 2.7 sec. |
GtkFileChooser with the gnome-vfs backend | 11.3 sec. |
Nautilus | 9.2 sec. |
Some things to note here. 'ls' and the Midnight Commander are blindingly fast. They do little more than opendir(), readdir(), a bunch of stat()s, and qsort(). GtkFileChooser with the Unix backend does essentially the same, minus the qsort(), and adds GtkTreeView's own sorting on top. GtkFileChooser with the gnome-vfs backend and Nautilus are close to each other, and much slower than the previous cases. Why is this?
When you change folders in the file chooser, this happens:
change_folder (file_chooser, pathname) { set_hourglass_cursor (file_chooser); model = gtk_file_system_model_new (pathname); g_signal_connect (model, "finished-loading", unset_hourglass_cursor_callback); gtk_tree_view_set_model (tree_view, model); }
GtkFileSystemModel is an object that makes a GtkFileFolder look like a GtkTreeModel. Creating a GtkFileSystemModel looks like this:
gtk_file_system_model_new (pathname) { model = new_model (); model->folder = gtk_file_system_get_folder (pathname); g_signal_connect (model->folder, "files-added", files_added_cb); g_signal_connect (model->folder, "files-removed", ...); ... connect to notification signals ... g_signal_connect (model->folder, "finished-loading", finished_loading_cb); } files_added_cb (folder, list_of_paths, model) { foreach (p in list_of_paths) { iter = add_to_model (p); gtk_tree_model_row_inserted (model, iter); } } finished_loading_cb (folder, model) { g_signal_emit_by_name (model, "finished-loading"); }
That is, GtkFileFolder feeds GtkFileSystemModel with chunks of the list of files through the files-added signal. GtkFileSystemModel stores this list of files, and calls gtk_tree_model_row_inserted() for each of them. GtkTreeView catches this (with a sort model in the middle, most likely), and displays the result. However, this works with incremental additions and re-sorting is very slow.
GtkTreeModel emits notifications when things change inside it. These notifications are on a per-row basis: if you add 500 rows to a model, you must emit the row-inserted signal 500 times. In contrast, ETableModel and Java's Table Model have notifications that specify a range of rows. If you insert 500 consecutive rows, you emit a single notification. The view part of the model/view scheme can then do smarter things than re-sorting for each row individually.
The Unix backend for GtkFileChooser is fast because it never does any re-sorting. When the GtkFileSystemModel creates its GtkFileFolderUnix and requests its list of files, the folder reads in the entire directory with opendir()/readdir(). The model translates that into its own list of nodes. Since it has not been put inside a GtkTreeView yet, there is no re-sorting done: the above happens all within gtk_file_system_model_new(). When that function exits, the new model gets set on the file chooser's a tree view. From the tree view's viewpoint, it's as if the model came preloaded with data.
However, the gnome-vfs backend loads folders asynchronously, because a) it can; b) it's needed for remote folders, anyway. When gtk_file_system_model_new() exits, it is empty; it will get populated gradually as gnome-vfs emits its asynchronous directory-load notifications. By that time, the model will already have been inserted in the tree view, and the latter will have to do a lot of re-sorting when the files come in. I put in some timing code to measure the time that it takes to merge in new files into the model and notify GtkTreeView about it. This is the output for that folder with 10000 files:
1098841356.104476 commit start of 1298 paths 1098841356.334885 commit end 1098841356.841396 commit start of 300 paths 1098841356.910978 commit end 1098841357.422785 commit start of 700 paths 1098841357.588969 commit end 1098841358.099863 commit start of 600 paths 1098841358.262178 commit end 1098841358.793729 commit start of 700 paths 1098841359.093013 commit end 1098841360.628302 commit start of 3200 paths 1098841362.237341 commit end 1098841364.327650 commit start of 3202 paths 1098841366.436415 commit end
Times are seconds.microseconds. Notice that it starts at about 177 usec per path, but by the time it loads the last files, the time per path is at about 658 usec per path.
This is essentially this code:
files_added_cb (folder, list_of_paths, model) { timestamp_start(); foreach (p in list_of_paths) { iter = add_to_model (p); gtk_tree_model_row_inserted (model, iter); } timestamp_end (); }
Now, there could be four sources of slowness: gnome-vfs's async directory loading code, the implementation of GtkFileSystemGnomeVFS, the implementation of GtkFileSystemModel, or GtkTreeView and its sort model. I don't think the first two are the problem; the former is threads on top of opendir()/readdir(), and the second one can add files in constant time. Part of the problem lies in GtkFileSystemModel, as it merges its current, sorted list of files with the list of unsorted incoming files by sorting the latter, and doing a list merge. That's O(n log n). However, on top of that there is all the re-sorting that GtkTreeModelSort has to do when rows get added one by one.
Tomorrow I want to try reading as much of the folder as possible into the file system model before putting the model in the tree view. That's a cheap hack, for sure, but right now I don't have time to do some Real Profiling(tm).
During the marketing BOF at the Summit, Tim Ney and myself discussed some ideas about making GNOME easier to brand so that small distributions can ship it easily, and with a nicely customized appearance. I've put up some of these ideas here. These would be especially useful for local, government-sponsored distributions who normally don't have all the resources of a big distro vendor — think of all the work the Linex guys have done to customize their appearance, for example; small distros have to duplicate the ad-hoc work of finding what can be customized in GNOME in terms of branding. It could be well-documented and we could have some tools to make the task easier.
I've been looking for volunteers in Xalapa to do the work of finding out what distributions generally customize in GNOME so that we can factor that out, document it, and possibly write tools to make branding easier. Rodrigo and Germán are looking, too.
Today we went to watch Ararat. What a mish-mash of a movie. It's one of those films that try to mix several plot lines in a supposedly interesting way, but it doesn't manage to do it successfully. The sub-story about the stepsister/girlfriend is totally superfluous, as is the one about the gay son of the customs inspector and his actor friend. You end up learning a tiny bit about the Armenian massacre, but not caring about it.
Jamie has put up a nice article about why it's hard to make XScreenSaver use a toolkit rather than Xlib for its password dialog.
Tomorrow is the EPlugin hackfest!
El Santos, the animated series!
Miguel kindly gave me a copy of the Peter Sestoft book he mentions, and it is indeed very good! It's a reference of the language without the bullshit; it won't teach you how to program, but it will give you nice tables and diagrams of implicit versus explicit type conversions, and terse code examples. There's a nifty table of the growth functions for the different operations you can do on colletions: O(1) amortized for appending to a list, O(n) for List.IndexOf(), etc.
Thanks, Miguel :)
Trow pours from the UberTower, which holds three liters of beer. Attendants to the Summit will be able to enjoy it as well.
James: If you use gtk_icon_theme_get_default(), it will refer to the theme for the default screen. With a multiscreen setup, what you want is gtk_icon_theme_get_for_screen().
Today I flew to Boston for the summit. Tip for people flying in: bring your business cards to prove that you don't work in the U.S.A.
Apart from that annoyance, the flight was pretty good. The plane was mostly empty, so I could grab a whole row for myself.
Yesterday we watched Almodóvar's La Mala Educación. I really enjoyed it, especially since the whole topic about catholic schools fits in well with my current reading, Joyce's A Portrait of the Artist as a Young Man.
Oh, and on Friday we watched Fahrenheit 9/11, which finally got to Xalapa. I had been itching to watch this.
Today I gave a talk at the Facultad de Estadística e Informática of the Universidad Veracruzana. The idea was to motivate the Informatics students and tell them, basically, to learn how to be self-sufficient when it comes to learning computing. The talk went pretty well; I commented on their school's curriculum and gave a few volunteers a few of the little "write a function" tests from Joel's guide to interviewing. Not surprisingly, the answers were all wrong. Still, the students had a good laugh, and they came to realise some of the things that they were supposed to have learned already, but really didn't.
I ordered a Thinkpad T41, with the 14" high-resolution screen, and IBM/Mexico said it should arrive in about 4 weeks. Four weeks. I'm getting impatient already.
Also, I reinstalled my old/current laptop today. While rsyncing my home directory back to it, my main machine's network card started being funny. James and Davyd advised reducing the MTU size. This had never happened in the past; I wonder if the network card is about to die...
Go backward in time to September 2004.
Federico Mena-Quintero <federico@ximian.com> Fri 2004/Oct/01 22:24:45 CDT