On Fri, Feb 03, 2023 at 06:36:34AM +0000, Lars Brinkhoff wrote:
> Dan Cross wrote:
> > So, the question becomes: what _is_ that forum, if such a thing
> > exists at all?
>
> Some options:
>
> - Cctalk email list.
(cc-ed to coff, of coffse...)
I used to hang out on the IBM-MAIN mailing list, too. While they are,
mostly, dealing with modern mainframes and current problems, they also
occasionally mention an old story or two. Actually, since the mainframe
is such a living fossil, the whole conversation sometimes feels as if it
were about something upgraded continuously since the 1960s. Most of it
is incomprehensible to me (I never had proper mainframe training, or an
improper one either; they deal with things in their own unique way, they
have their own acronyms for everything, and while there are some intro
books, there is never enough time*energy), but it is also a bit
educational - a bit today, a bit next week, and so on.
> - ClassicCMP Discord.
> - Retrocomputingforum.com.
> - Various Facebook groups.
Web stuff, requiring JavaScript to work, ugh, ugh-oh. Mostly, it boils
down to the fact that one cannot easily curl the text from those other
places (AFAICT), so it is hard to awk that text into mbox format and
read it comfortably.
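For comparison, a plain mailman/pipermail-style archive can be pulled
down and read offline in a couple of commands; the list URL below is
just a placeholder, and this is only a rough sketch of the idea, not
something that helps with the JavaScript-heavy sites above:

    # Hypothetical example: monthly pipermail archives are gzipped,
    # roughly mbox-formatted text, so fetch, unpack, and concatenate.
    for m in 2023-January 2023-February; do
        curl -sO "https://lists.example.org/pipermail/somelist/${m}.txt.gz"
    done
    gunzip -c 2023-*.txt.gz > somelist.mbox
    mutt -f somelist.mbox    # or any mail reader that opens an mbox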
--
Regards,
Tomasz Rola
--
** A C programmer asked whether computer had Buddha's nature. **
** As the answer, master did "rm -rif" on the programmer's home **
** directory. And then the C programmer became enlightened... **
** **
** Tomasz Rola mailto:tomasz_rola@bigfoot.com **
All,
I thought I would post something here that wasn't DOA over on tuhs and
see if it would fly here instead. I have been treating coff as the place
where off-topic tuhs posts go to die, but after the latest thread
bemoaning the lack of a place to go for topics tangential to unix, I
thought I'd actually start a coff thread! Here goes...
I read a tremendous number of documents from the web, or at least read
parts of them - to the tune of maybe 50 or so a week. It is appalling to
me that, in this era, we can't get better at scanning. Be that as it
may, the needle doesn't seem to have moved appreciably in the last
decade or so, and it's a little sad. Sure, if folks print to pdf, the
result is great. But if they scan a doc, not so great, even today.
Rather than worry about the scanning aspects, I am more interested in
what to do with those scans. Can they be handled in such a way as to
give them new life? Unlike the scanning side of things, I have found
quite a bit of movement in the area of working with the pdfs themselves,
and I'd really like to get way better at it. If I get a badly scanned
pdf and I can make it legible on screen, legible in print, and
searchable, I'm golden. Sadly, that's way harder than it sounds, or, in
my opinion, than it should be.
I recently put together a workflow that is tenable, if time consuming.
If you're interested in the details, I've shared them:
https://decuser.github.io/pdfs/2023/02/01/pdf-cleanup-workflow.html
In the note, I leverage a lot of great tools that have significantly
improved over the years, to the point where they do a great job at what
they do. But there's still lots of room for improvement, particularly in
the area of image tweaking around color, highlights, and the like.
The note is mac-centric in that I use a mac; otherwise, all of the tools
work on modern *nix and, with a little abstract thought, windows too.
In my world, here's what happens:
* I find a really interesting topic and, along the way, collect pdfs to read
* I open a pdf and find it salient, but not very readable, with sad
printability and no (or broken) OCR
* I begin the process of making the pdf better, with the aforementioned
goals in mind
The process in a nutshell:
1. Extract the images to individual tiffs (so many tools can't work with
multi-image tiffs)
* pdfimages from poppler works great for this
2. Adjust the color (it seems impossible to do this without a
batch-capable gui app)
* I use Photoscape X for this - just click batch and make
adjustments to all of the images using the same settings
3. Resize the images - most pdfs have super wonky sizes
* I use convert from imagemagick for this and I compress the tiffs
while I'm converting them
4. Recombine the images into a multi-tiff image
* I use tiffcp from libtiff for this
5. OCR the reworked image set
* I use tesseract for this - It's gotten so much better it's ridiculous
This process results in a pdf that meets the objectives.
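For concreteness, here is a rough sketch of the command-line half of
that process (steps 1, 3, 4 and 5); the file names and the 300 dpi
letter-size target in the resize step are assumptions for illustration,
not settings from my note:

    # 1. extract each page image from the scan as an individual TIFF
    mkdir -p pages
    pdfimages -tiff scan.pdf pages/page

    # 2. (color/shadow/highlight adjustments still happen in a GUI)

    # 3. resize and compress each page; 2550x3300 is 8.5x11in at 300 dpi
    for f in pages/page-*.tif; do
        convert "$f" -resize 2550x3300 -compress lzw "$f"
    done

    # 4. recombine the pages into a single multi-page TIFF
    tiffcp pages/page-*.tif book.tif

    # 5. OCR the reworked images and emit book.pdf with a text layer
    tesseract book.tif book pdf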
It's not horribly difficult to do and it's not horribly time consuming.
It represents many, many attempts to figure out this thorny problem.
I'd really like to get away from needing Photoscape X, though. Then I
could entirely automate the workflow in bash...
The problem is that the image adjustments are the most critical step and
the only one still stuck in a GUI - image extraction, resizing,
compression, recombining images, OCR (I still can't believe it), and
outputting a pdf are now all taken care of by command line tools that
work well.
I wouldn't mind using a gui to figure out the color settings (Grayscale,
Black and White, or Color) and the increase/decrease values for shadows
and highlights, though, if those could then be mapped to command line
arguments of a tool that could apply them. Because then the workflow
could be: extract a good representative page as an image, open it,
figure out the color settings, and then use those settings with toolY as
part of the scripted workflow.
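As a thought experiment, ImageMagick might be able to stand in for that
toolY; the sketch below is hypothetical (the -level percentages are made
up and would need to be dialed in on the representative page first), and
I haven't validated it against what Photoscape X produces:

    # apply the settings chosen on the sample page to the whole batch:
    # grayscale conversion plus clipping the shadow/highlight ends
    shadow="5%"        # made-up starting point, tune on the sample page
    highlight="95%"    # ditto
    for f in pages/page-*.tif; do
        convert "$f" -colorspace Gray -level "${shadow},${highlight}" \
                -compress lzw "$f"
    done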
Here are the objectives for easy reference:
1. The PDF needs to be readable on a decent monitor (zooming in doesn't
distort the readability; pixelation that is systematic is ok, but not
preferred). Yes, I know there's a degree of subjectivity here, but
blobby, bleeding text is out of scope!
2. The PDF needs to print with a minimum of artifacts (weird shadows,
bleeding, and blobs are out). It needs to be easy to read.
3. The PDF needs to be searchable with good accuracy (generally, bad
scans have no OCR, or OCR that doesn't work).
Size is a consideration, but it depends greatly on the value of the
work. My own calculus goes like this: if it's a modern work, it should
be way under 30 MB. If it's print-to-pdf, it should be way under 10 MB
(remember when you thought you'd never use 10 MB of space... for all of
your files and the OS?). If it is significant and rare, less than 150 MB
can work. Obviously, this is totally subjective; your calculus is
probably quite different.
The reason this isn't posted over in pdf.scans.discussion is that even
if there were such a place, it'd be filled with super technical
gibberish about color depth and the perils of gamma radiation or
somesuch. We, as folks interested in preserving the past, have a more
pragmatic need for a workable solution that is attainable by mortals.
So, with that as a bit of background, let me ask what I asked previously
in a different way on tuhs, here in coff - what's your experience with
sad pdfs? Do you just live with them as they are, do you try to fix them
(and how), or do you use a workflow and get good results?
Later,
Will
Oh, and of course I would cc the old address!
Reply on the correct COFF address <coff(a)tuhs.org>
Sheesh.
On 2/3/23 11:26 AM, Will Senn wrote:
> We're in COFF territory again. I am enjoying the conversation, but
> let's self monitor. Perhaps, a workflow for this is that when we drift
> off into non-unix history discussion, we cc: COFF and tell folks to
> continue there? As a test I cced it on this email, don't reply all to
> this list. Just let's talk about it over in coff. If you aren't on
> coff join it.
>
> If you aren't sure, or you think most folks on the list want to discuss
> it, post it on COFF; if you don't get any traction, reference the COFF
> thread and tease it in TUHS.
>
> This isn't at all a gripe - I heart all of our discussions, but I
> agree that it's hard to keep it history related here with no outlet
> for tangential discussion - so, let's put coff to good use and try it
> for those related, but not quite discussions.
>
> Remember, don't reply to TUHS on this email :)!
>
> - will
>
> On 2/3/23 11:11 AM, Steve Nickolas wrote:
>> On Fri, 3 Feb 2023, Larry McVoy wrote:
>>
>>> Some things will never go away, like keep your fingers off of my L1
>>> cache lines. I think it's mostly lost because of huge memories, but
>>> one of the things I love about early Unix is how small everything was.
>>> Most people don't care, but if you want to go really fast, there is no
>>> replacement for small.
>>>
>>> Personally, I'm fine with some amount of "list about new systems where
>>> we can ask about history because that helps us build those new
>>> systems".
>>> Might be just me, I love systems discussions.
>>
>> I find a lot of my own stuff is like this - kindasorta fits and
>> kindasorta doesn't for similar reasons.
>>
>> (Since a lot of what I've been doing lately is creating a
>> SysV-flavored rewrite of Unix from my own perspective as a
>> 40-something who actually got most of my experience coding for
>> 16-bits and MS-DOS, and speaks fluent but non-native C. I'm sure it
>> comes out in my coding style.)
>>
>> -uso.
>
On Feb 3, 2023, at 8:26 AM, Will Senn <will.senn(a)gmail.com> wrote:
>
> I can't seem to get away from having to highlight and mark up the stuff I read. I love pdf's searchability of words, but not for quickly locating a section, or just browsing and studying them. I can flip pages much faster with paper than an ebook it seems :).
You can annotate, highlight, and mark up pdfs. There are apps for that,
though I'm not very familiar with them, as I don't mark up even paper
copies. On an iPad you can easily annotate pdfs with an Apple Pencil.
> From: Dennis Boone <drb(a)msu.edu>
>
> * Don't use JPEG 2000 and similar compression algorithms that try to
> re-use blocks of pixels from elsewhere in the document -- too many
> errors, and they're errors of the sort that can be critical. Even if
> the replacements use the correct code point, they're distracting as
> hell in a different font, size, etc.
I wondered why certain images were the way they were; this probably
explains a lot.
> * OCR-under is good. I use `ocrmypdf`, which uses the Tesseract engine.
Thanks for the tips.
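For anyone who hasn't tried it, the basic invocation is a one-liner; the
cleanup flags below are optional extras I'm guessing at, not necessarily
what Dennis uses:

    # add an OCR text layer under the scanned page images; --deskew and
    # --clean are optional preprocessing passes (the latter needs unpaper)
    ocrmypdf --deskew --clean input.pdf output.pdf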
> * Bookmarks for pages / table of contents entries / etc are mandatory.
> Very few things make a scanned-doc PDF less useful than not being able
> to skip directly to a document indicated page.
I wish. This is a tough one. I generally sacrifice the bookmarks to make
a better pdf. I need to look into extracting bookmarks and whether they
can be re-added without getting all wonky.
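One possibility I haven't tested yet: pdftk can dump the bookmark and
metadata block from the original and pour it back into the rebuilt pdf,
something like:

    # dump bookmarks/metadata from the original scan, then re-apply them
    # to the rebuilt pdf (untested sketch; assumes pdftk is installed)
    pdftk original.pdf dump_data output meta.txt
    pdftk rebuilt.pdf update_info meta.txt output rebuilt-with-bookmarks.pdf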
> * I like to see at least 300 dpi.
Yes, me too, but I've found that this often results in files that are
too big when I'm fixing existing scans; when I'm creating them, they're
fine.
> * Don't scan in color mode if the source material isn't color. Grey
> scale or even "line art" works fine in most cases. Using one bit per
> pixel means you can use G4 compression for colorless pages.
Amen :).
>
> * Do reduce the color depth of pages that do contain color if you can.
> The resulting PDF can contain a mix of image types. I've worked with
> documents that did use color where four or eight colors were enough,
> and the whole document could be mapped to them. With care, you _can_
> force the scans down to two or three bits per pixel.
> * Do insert sensible metadata.
>
> * Do try to square up the inevitably crooked scans, clean up major
> floobydust and whatever crud around the edges isn't part of the paper,
> etc. Besides making the result more readable, it'll help the OCR. I
> never have any luck with automated page orientation tooling for some
> reason, so end up just doing this with Gimp.
Great points. Thanks.
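For the squaring-up and color-reduction points, ImageMagick has knobs
that might cover the simple cases (though, as you say, automated deskew
doesn't always cooperate); a hedged sketch, with placeholder file names:

    # straighten a slightly crooked page and trim the scanner border
    convert page-042.tif -deskew 40% -fuzz 10% -trim +repage page-042-sq.tif

    # knock a color page down to a small palette (e.g. 8 colors)
    convert page-007.tif -colors 8 -depth 4 page-007-small.tif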
-will
That was the title of a sidebar in Australia's "Silicon Chip" electronics
magazine this month, and referred to the alleged practice of running
scientific and engineering programs many times to ensure consistent
output, as hardware error checks weren't the best in those days (bit-flips
due to electrical noise etc).
Anyway, the mag is seeking corroboration on this (credit given where due,
of course); I find it a bit hard to believe that machines capable of
running complex programs did not have e.g. parity checking...
Thanks.
-- Dave
Will Senn wrote in
<3808a35c-2ee0-2081-4128-c8196b4732c0(a)gmail.com>:
|Well, I just read this as Rust is dead... here's hoping, but seriously,
|if we're gonna go off and have a language vs language discussion, I
|personally don't think we've had any real language innovation since
|Algol 60, well maybe lisp... sheesh, surely we're in COFF territory.
It has evangelists on all fronts. ..Yes, it was only that, while
I was writing the message, I reread about Vala from the GNOME
project, which seems to be a modern language with many beneficial
properties still, growing out of a Swiss university (is it a bad
sign to come from Switzerland, and more out of research?), and it
had the support of Ubuntu and many other parts of the GNOME project.
Still, it is said to be dead. I scrubbed that part of my message,
but maybe a "dead" relation between the lines thus remained.
Smalltalk is also such a thing, though not from Switzerland.
An Ach! on the mystery of human behaviour. Or, like the wonderful
Marcel Reich-Ranicki said, "Wir sehen es betroffen, den Vorhang
zu, und alle Fragen offen" ("Concerned we see, the curtain closed,
and all the Questions open").
--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)
[TUHS to Bcc]
On Wed, Feb 1, 2023 at 2:11 PM Rich Salz <rich.salz(a)gmail.com> wrote:
> On Wed, Feb 1, 2023 at 1:33 PM segaloco via TUHS <tuhs(a)tuhs.org> wrote:
>> In the annals of UNIX gaming, have there ever been notable games that have operated as multiple processes, perhaps using formal IPC or even just pipes or shared files for communication between separate processes (games with networking notwithstanding)?
>
> https://www.unix.com/man-page/bsd/6/hunt/
> source at http://ftp.funet.fi/pub/unix/4.3bsd/reno/games/hunt/hunt/
Hunt was the one that I thought of immediately. We used to play that
on Suns and VAXen and it could be lively.
There were a number of such games, as Clem mentioned; others I
remember were xtrek, hearts, and various Chess and Go servers.
- Dan C.
Switching to COFF
On Wed, Feb 1, 2023 at 1:33 PM segaloco via TUHS <tuhs(a)tuhs.org> wrote:
> In the annals of UNIX gaming, have there ever been notable games that have
> operated as multiple processes, perhaps using formal IPC or even just pipes
> or shared files for communication between separate processes (games with
> networking notwithstanding)?
>
Yes - there were a number of them, both for UNIX and otherwise. Some
spanned the Arpanet back in the day on the PDP-10s. There was an early
first-person shooter game that I remember that ran on the PDP-10s, on
ADM3As and VT52s, and worked that way. You flew into space and fought
each other.
CMU's (Steve Rubin's) Trip was a stand-alone program - sort of the
grand-daddy of the Star Trek games. It ran on a GDP2 (Triple-Drip Graphics
Wonder) and had a dedicated 11/20. It used multiple processes to do
everything. You sat in the Captain's chair of the Enterprise looking out
into space. You had various missions and at some point would need to
reprovision - which meant you had to dock at the 2001 space station,
including timing your rotation to line up with the docking bay like in
the movie. When you beat an alien ship you got a bottle of Coke - all
of which collected in a row at the bottom of the screen.
I did manage to save the (BLISS-11) sources to it a few years ago. One
of my dreams is to write a GDP simulator for SIMH and see if we can
bring it back to life. A big issue, as Rob knows, is that the GDPs had
an amazing keyboard, so duplicating it will take some thinking with
modern HW; but HW has caught up such that I think it might be possible
to emulate it. SIMH works really well with a number of the other
graphics systems, and with a modern system like my current Mac and its
graphics HW, there might be a chance.
One of my other favorites was one that ran on the Xerox Altos, whose
name I don't remember, where you wandered around the Xerox 3M Ethernet.
People would enter your system and appear on your screen. IIRC, Byte
Magazine did an article that talked about it at one point -- this was
all pre-Apple Macs - but I remember they had pictures of people playing
it that I think they took at Stanford. IIRC, shortly after the X
terminals appeared somebody tried to duplicate it, or maybe that was
with the Blits, but it was not quite as good, at least to those of us
who had access to real Xerox Altos.
COFF'd
> I think general software engineering knowledge and experience cannot be
> 'obsoleted' or made less relevant by better languages. If they help,
> great, but you have to do the other part too. As languages advance and
> get better at catching (certain kinds of) mistakes, I worry that
> engineers are not putting enough time into observation and understanding
> of how their programs actually work (or do not).
I think you nailed it there mentioning engineers, in that one of the growing norms these days is making software development more accessible to people from a diverse set of backgrounds. No longer does a programming language have to just bridge the gap between, say, an expert mathematician and a computing device.
Now there are languages that allow UX designers to declaratively define interfaces, languages that let data scientists functionally define algorithms, and WYSIWYG editors for all sorts of things that were traditionally handled by hammering out code. The concern of describing a program in a standard language and the concern of that language then describing the operations of a specific device have been growing more and more decoupled as time goes on, and that puts a lot of the responsibility for "correctness" on those creating all these various languages.
Whatever concern an engineer originally had about some matter of memory safety, efficiency, concurrency, etc. is now being decided by some team working on the given language of the week, sometimes with great results, other times with disastrous ones. On the flip side, the person consuming the language or components then doesn't need to think about these things, which could go either way. If they're always going to work in a paradigm where they offload the concern of memory safety to their language architect of choice, then perhaps they're not shorting themselves any. However, they're then technically not seeing the big picture of what they're working on, which contributes to the uneven quality of software we have today.
Long story short, most people don't know how their programs work because they aren't really "their" programs so much as their assembly of a number of off-the-shelf or slightly tweaked components following the norms of whatever school of thought they may originate in (marketing, finance, graphic design, etc.). Sadly, this decoupling likely isn't going away, and we're only bound to see the percentage of "bad" software increase over time. That's the sort of change that over time leads to people then changing their opinions of what "bad software" is. Look at how many people gleefully accept the landscape of smart-device "apps"....
- Matt G.