At Fri, 29 Nov 2019 22:52:58 +0100, Steffen Nurpmeso <steffen(a)sdaoden.eu> wrote:
Subject: Re: [TUHS] another conversion of the CSRG BSD SCCS archives to Git
Greg A. Woods wrote in <m1iVoBV-0036tPC(a)more.local>:
|I've been fixing and enhancing James Youngman's git-sccsimport to use
|with some of my SCCS archives, and I thought it might be the ultimate
|stress test of it to convert the CSRG BSD SCCS archives.
|
|The conversion takes about an hour to run on my old-ish Dell server.
|
|This conversion is unlike others -- there is some mechanical compression
|of related deltas into a single Git commit.
|
|https://github.com/robohack/ucb-csrg-bsd
Thanks for taking the time to produce a CSRG repo that seems to
mimic changesets as they really happened. As i never made it
there on my own, i have switched to yours some weeks ago. (Mind
you, after doing "gc --aggressive --prune=all" the repository size
has more than halved, it was the final reason to prepare new
repositories on a vhost with good internet connection before
getting this through my flaky wifi here. Storage and internet
bandwidth and their cost really do not seem to bother anyone
anymore. I have no offense in mind, i only recognized it (the
hard way).)
Ah! I did indeed forget the "git gc" step that many conversion guides
recommend. I might change the import script to do that automatically,
particularly if it has also initialised the repository in the same run.
Apparently GitHub themselves run it regularly:
https://stackoverflow.com/a/56020315/816536
Probably they do this by configuring "gc.auto" in each repository,
though I've not found any reference to what they might configure it to.
However it seems that without the "--aggressive" option, nothing will be
done in this repository. With it though I go from 316M down to just 71M.
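
For anyone wanting to reproduce that kind of before/after comparison,
here is a minimal sketch. It is not part of the import script. REPO is
a hypothetical variable naming a local clone; if it is unset, the
script builds a throwaway repository so it can still run on its own.
The sizes it prints come from "git count-objects -v" (loose objects
plus packfiles, in KiB).

```shell
#!/bin/sh
# Sketch: report object-store size before and after an aggressive repack.
# REPO is an assumed name for this example, not something from the
# import script.
set -eu

if [ -z "${REPO:-}" ]; then
    # No repository given: create a scratch one with a single empty commit.
    REPO=$(mktemp -d)
    git -C "$REPO" init --quiet
    git -C "$REPO" -c user.name=demo -c user.email=demo@example.invalid \
        commit --quiet --allow-empty -m "demo commit"
fi

size_kib() {
    # Sum the "size:" (loose objects) and "size-pack:" (packfiles) lines,
    # both of which git reports in KiB.
    git -C "$1" count-objects -v |
        awk '/^size(-pack)?:/ { total += $2 } END { print total + 0 }'
}

before=$(size_kib "$REPO")
git -C "$REPO" gc --aggressive --prune=now --quiet
after=$(size_kib "$REPO")

echo "object store: ${before} KiB before, ${after} KiB after"
```

Run against a real clone with e.g. "REPO=/path/to/clone sh script.sh".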
I don't see any way to force/tell/ask GitHub to run "git gc --aggressive".
Perhaps I can just delete it from GitHub and immediately re-create it
with the re-packed repository. In theory all the hashes should stay
the same and any existing clones should be unaffected. What do you think?
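
The theory is easy to check locally, since object names are hashes of
object content and repacking only changes how objects are stored. A
minimal sketch, again with REPO as a hypothetical variable (a scratch
repository is created if it is unset): record every ref and the object
it points at, repack, and compare.

```shell
#!/bin/sh
# Sketch: confirm that "git gc --aggressive" leaves all refs pointing at
# the same object names. REPO is an assumed name for this example.
set -eu

if [ -z "${REPO:-}" ]; then
    # No repository given: create a scratch one with a single empty commit.
    REPO=$(mktemp -d)
    git -C "$REPO" init --quiet
    git -C "$REPO" -c user.name=demo -c user.email=demo@example.invalid \
        commit --quiet --allow-empty -m "demo commit"
fi

list_refs() {
    # One line per ref: "<object name> <ref name>"
    git -C "$1" for-each-ref --format='%(objectname) %(refname)'
}

refs_before=$(list_refs "$REPO")
git -C "$REPO" gc --aggressive --prune=now --quiet
refs_after=$(list_refs "$REPO")

if [ "$refs_before" = "$refs_after" ]; then
    echo "all refs unchanged after repack"
else
    echo "refs differ after repack" >&2
    exit 1
fi

# Also check that the repacked object store is complete and consistent.
git -C "$REPO" fsck --full --no-dangling
```

If the refs match, a push of the re-packed repository to a freshly
re-created remote should publish the same commit hashes that existing
clones already have.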
Note I have some thoughts of re-doing the whole conversion anyway, with
more ideas on dealing with "removed" files (SCCS files renamed
to the likes of "S.foo") and also including the many files that were
never checked into SCCS, perhaps even on a per-release basis, thus being
able to create release tags that can be checked out to match the actual
releases on the CDs. But this will not happen quite so soon.
--
Greg A. Woods <gwoods(a)acm.org>
Kelowna, BC +1 250 762-7675 RoboHack <woods(a)robohack.ca>
Planix, Inc. <woods(a)planix.com> Avoncote Farms <woods(a)avoncote.ca>