Formatted text in IFF FTXT

trixie
Posts: 409
Joined: Thu Jun 30, 2011 2:54 pm
Location: Czech Republic

Formatted text in IFF FTXT

Post by trixie »

AmigaOS's clipboard uses the IFF FTXT format to exchange text between programs. This is simple and well-documented in the RKRM. However, in most (all?) applications, clipboard support is limited to the exchange of plain (unformatted) text. In the future, Amiga programs will want to do better.

IFF FTXT does support formatted text, in two ways:
  • through the font specifier chunk, FONS
    This allows for giving the font name and type (proportional/non-proportional, serif/sans-serif), and is quite clearly described in the IFF docs.
  • through control sequences
    These are stored as part of the CHRS chunk and should describe font selection, text size, and text style (bold/italic/underline), following a CSI (Control Sequence Introducer) character (hex 9B).
The use of control sequences is quite underdocumented. I haven't managed to find a code example of how to use them in IFF FTXT; neither have I found a list of control sequences that have been adopted for use in AmigaOS. The IFF FTXT doc refers to the ISO/ANSI standards but doesn't say more. Indeed, ANSI provides a list of CSI codes but it is absolutely unclear to me which can be used in FTXT, and how. I'm afraid it has never been subject to any kind of standardization in AmigaOS.

OS devs, any word on this? Considering that IFF FTXT is probably going to stay as the format for clipboard text, perhaps we could do something about it now?
The Rear Window blog

AmigaOne X5000 @ 2GHz / 4GB RAM / Radeon RX 560 / ESI Juli@ / AmigaOS 4.1 Final Edition
SAM440ep-flex @ 667MHz / 1GB RAM / Radeon 9250 / AmigaOS 4.1 Final Edition
tonyw
AmigaOS Core Developer
Posts: 1479
Joined: Wed Mar 09, 2011 1:36 pm
Location: Sydney, Australia

Re: Formatted text in IFF FTXT

Post by tonyw »

Hi Trixie,

The first question is: Are there any apps that can Cut or Copy text (or any other objects) to the clipboard and pack the data into the proper chunks? As you say, all our current apps use only raw ASCII data.

The second question is: Are there any apps that can use packed data? AFAIK the only application that can accept pasted data with control sequences is the con-handler/console combination. Even that can only use the data unpacked from a chunk such as you are describing.

How do we maintain compatibility if we start migrating to a chunk-based Clipboard? We already have provision for several Clipboard Units, Clip 0, Clip 1, etc. We could say that IFF clips would always be Clip 9, say, then leave Clip 0 (the default) in place for legacy apps. A new app reading the Clipboard would check Clip 9 first, and if there is nothing there, then fall back to Clip 0 for its input. When writing, it could write both a raw version to Clip 0 and a packed version to Clip 9.

Interesting design exercise, but a "back burner" job at the moment, I think.
cheers
tony
trixie
Posts: 409
Joined: Thu Jun 30, 2011 2:54 pm
Location: Czech Republic

Re: Formatted text in IFF FTXT

Post by trixie »

@tonyw

No, it's not a "design exercise" - the design has already been made: the Amiga clipboard uses the IFF format for its data. This is probably not going to change, and I'm not advocating any changes here.

We wouldn't need to "migrate to a chunk-based clipboard" - what do you mean by saying that? The Amiga clipboard already is chunk-based: whenever you read/write data from/to the clipboard, you work with chunks all along. Even the "raw ASCII data" (as you say) currently read/written by Amiga applications must be retrieved from/stored in a CHRS chunk, or a number of CHRS chunks.

An application with minimal clipboard support (= most current clipboard-aware apps) goes through the data looking for all CHRS chunks (and ignoring other chunks in the process). So if another application - a new word processor or a web browser - decides to store clipboard data including the font specification (the FONS chunk), there's absolutely no breach of compatibility because other apps will simply ignore FONS.
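
To make the "look for CHRS, ignore everything else" behaviour concrete, here is a minimal sketch in portable C that works on a memory buffer holding a FORM FTXT. It is an illustration only, not how any particular Amiga application does it: real code would normally read the clipboard through iffparse.library, and the function names `read_be32` and `collect_chrs` are made up for this example. Nested FORMs, LISTs and CATs are not handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit big-endian value (IFF stores chunk sizes big-endian). */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Copy the contents of every CHRS chunk of a FORM FTXT into out[],
 * skipping all other chunks (FONS, CSET, anything unknown).
 * Returns the number of bytes copied, or -1 if the buffer is not a
 * well-formed FORM FTXT or out[] is too small.
 */
long collect_chrs(const uint8_t *buf, size_t len, uint8_t *out, size_t out_cap)
{
    if (len < 12 || memcmp(buf, "FORM", 4) != 0 || memcmp(buf + 8, "FTXT", 4) != 0)
        return -1;

    size_t form_size = read_be32(buf + 4);
    if (form_size < 4 || form_size > len - 8)
        return -1;

    size_t end = 8 + form_size;   /* first byte past the FORM */
    size_t pos = 12;              /* first chunk follows the FTXT type ID */
    size_t n = 0;

    while (end - pos >= 8) {
        const uint8_t *ck = buf + pos;
        size_t ck_size = read_be32(ck + 4);
        if (ck_size > end - pos - 8)
            return -1;                    /* chunk overruns the FORM */
        if (memcmp(ck, "CHRS", 4) == 0) {
            if (ck_size > out_cap - n)
                return -1;                /* output buffer too small */
            memcpy(out + n, ck + 8, ck_size);
            n += ck_size;
        }
        pos += 8 + ck_size;
        if ((ck_size & 1) && pos < end)
            pos++;                        /* odd-sized chunks carry a pad byte */
    }
    return (long)n;
}
```

A reader written this way never looks inside FONS, so a writer that adds FONS chunks does not change what it pastes.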

Yes, I can see a problem with the control sequences stored as part of CHRS because many programs probably took the easy way and didn't implement any stripping as per the IFF specification. So if a new application stores formatting data via control sequences, an older app might paste the text including the "nonsense" control characters. But that doesn't mean that we are breaking compatibility - it's the offending apps that didn't follow the IFF specification and don't process the CHRS chunks as they should. That's how I see it, at least.
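
For reference, the stripping itself is small. The sketch below is one possible reading of it, under these assumptions: the text is ISO Latin 1 (so bytes 0x80-0x9F are C1 controls, not UTF-8 continuation bytes), a control sequence has the ISO/ANSI shape CSI + parameter bytes (0x30-0x3F) + intermediate bytes (0x20-0x2F) + one final byte (0x40-0x7E), and a plain-text reader keeps only LF among the control characters. The function name `strip_controls` is hypothetical; check the FTXT document before relying on the exact set of characters dropped.

```c
#include <stddef.h>
#include <stdint.h>

#define CSI 0x9B
#define ESC 0x1B

/*
 * Copy in[0..len) to out[], dropping control sequences and control
 * characters other than LF. out[] must have room for len bytes.
 * Returns the number of bytes written.
 */
size_t strip_controls(const uint8_t *in, size_t len, uint8_t *out)
{
    size_t i = 0, n = 0;

    while (i < len) {
        uint8_t c = in[i++];

        if (c == ESC && i < len && in[i] == '[') {   /* 7-bit form of CSI */
            c = CSI;
            i++;
        }
        if (c == CSI) {
            while (i < len && in[i] >= 0x30 && in[i] <= 0x3F) i++;  /* parameters */
            while (i < len && in[i] >= 0x20 && in[i] <= 0x2F) i++;  /* intermediates */
            if (i < len && in[i] >= 0x40 && in[i] <= 0x7E) i++;     /* final byte */
        } else if (c == ESC) {
            while (i < len && in[i] >= 0x20 && in[i] <= 0x2F) i++;  /* intermediates */
            if (i < len && in[i] >= 0x30 && in[i] <= 0x7E) i++;     /* final byte */
        } else if (c == '\n' ||
                   (c >= 0x20 && c != 0x7F && !(c >= 0x80 && c <= 0x9F))) {
            out[n++] = c;                 /* printable character or LF */
        }
        /* any other control character is silently dropped */
    }
    return n;
}
```

If TAB should survive a paste, it would need to be added to the characters that are kept.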
nbache
Beta Tester
Posts: 1714
Joined: Mon Dec 20, 2010 7:25 pm
Location: Copenhagen, Denmark

Re: Formatted text in IFF FTXT

Post by nbache »

trixie wrote: through control sequences
These are stored as part of the CHRS chunk and should describe font selection, text size, and text style (bold/italic/underline), following a CSI (Control Sequence Introducer) character (hex 9B).
The use of control sequences is quite underdocumented. I haven't managed to find a code example of how to use them in IFF FTXT; neither have I found a list of control sequences that have been adopted for use in AmigaOS.
I believe there is a comprehensive list of control codes, including CSI sequences, in one of the RKRMs; probably the Libraries and Devices one? (For some reason, the page number 750 sits in the back of my head when trying to recall this, which was something I used to refer to quite frequently years ago - but no guarantees for the validity of that particular piece of my memory ;-).)

Best regards,

Niels
tonyw
AmigaOS Core Developer
Posts: 1479
Joined: Wed Mar 09, 2011 1:36 pm
Location: Sydney, Australia

Re: Formatted text in IFF FTXT

Post by tonyw »

Trixie,

You are quite right. I've never looked in the CLIPS: drawer and didn't know what the format was. My experience with clips has been limited to the console and con-handler, which handle raw ASCII. I've never followed it through to see how it gets from one to the other. That was all done by others years ago and I've just used the existing code.

Ah well, new things to learn...
cheers
tony
trixie
Posts: 409
Joined: Thu Jun 30, 2011 2:54 pm
Location: Czech Republic

Re: Formatted text in IFF FTXT

Post by trixie »

@ Niels

I seem to have found it: it's in the RKRM Devices manual, Chapter 4 (Console device). It lists the ANSI control sequences for writing into the console window, and it looks like the list includes the CSI sequences the IFF specification refers to.

@ whoever is interested

I've done a bit of research, and I'll summarize my findings and opinions so that we have a point of departure:

1. Amiga applications (word processors, DTP programs, web browsers) never felt the need to support formatted text in clipboard data exchanges. They only cared about copying/pasting formatted text within the program: they stored plain text in the public clipboard and used an internal, proprietary mechanism to retain the formatting info. But you cannot copy formatted text from WordWorth and paste it into PageStream (or vice versa): the formatting will be gone.

2. What I am trying to say here in this thread is that such a feature (= inter-application formatted text exchange) would be nice to have - even more so in an age of the Internet, when there's so much text accessible from the browser. At the same time, I'm saying that such a feature can be introduced without changing the inner workings of the Amiga clipboard. The clipboard uses the IFF FTXT format to store text data, and the IFF specification (1985, updated 1988) quite clearly says that the FTXT format is meant to store
"text that has explicit formatting information (or “looks”) such as font family and size, typeface, etc. /.../ Character looks are stored as embedded control sequences within CHRS chunks."

3. Unfortunately, the specification admits to being somewhat incomplete on this:
"This document specifies which class of control sequences to use: the CSI group. This document does not yet specify their meanings, e.g., which one means “turn on italic face”. Consult ISO/ANSI."
Referring to the ISO/ANSI control codes list, we'll find that there indeed is a CSI sequence called SGR (Select Graphic Rendition) for text properties. About three dozen optional parameters are specified for SGR. Of these, the above-mentioned RKRM Chapter 4 specifies seventeen parameters that are relevant to the console output. These include parameters like "italic on", "underline off" etc.
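
As an illustration of what SGR sequences inside a CHRS chunk could look like, here is a small portable C sketch that writes the sentence "This is important." with the word "important" in italics. The SGR parameter numbers (3 = italic on, 23 = italic off, and so on) are the ISO/ANSI ones; whether Amiga programs would agree to interpret them this way in FTXT is exactly the open question, so treat this as an assumption. The helper names (`put_sgr`, `put_text`, `build_demo_chrs`) are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* A few ISO/ANSI SGR parameters (assumed to carry over to FTXT). */
enum {
    SGR_RESET = 0, SGR_BOLD = 1, SGR_ITALIC = 3, SGR_UNDERLINE = 4,
    SGR_BOLD_OFF = 22, SGR_ITALIC_OFF = 23, SGR_UNDERLINE_OFF = 24
};

/* Append "CSI <param> m" to dst (needs up to 5 bytes); returns bytes written. */
size_t put_sgr(uint8_t *dst, unsigned param)
{
    dst[0] = 0x9B;                                   /* 8-bit CSI */
    int digits = snprintf((char *)dst + 1, 4, "%u", param % 1000u);
    dst[1 + digits] = 'm';                           /* overwrite the NUL with the final byte */
    return (size_t)digits + 2;
}

/* Append a plain string (without its NUL); returns bytes written. */
size_t put_text(uint8_t *dst, const char *s)
{
    size_t n = strlen(s);
    memcpy(dst, s, n);
    return n;
}

/* Build one CHRS chunk in dst (64 bytes is plenty); returns the chunk's total size. */
size_t build_demo_chrs(uint8_t *dst)
{
    uint8_t *body = dst + 8;                         /* leave room for ID + size */
    size_t n = 0;

    n += put_text(body + n, "This is ");
    n += put_sgr(body + n, SGR_ITALIC);
    n += put_text(body + n, "important");
    n += put_sgr(body + n, SGR_ITALIC_OFF);
    n += put_text(body + n, ".\n");

    memcpy(dst, "CHRS", 4);
    dst[4] = (uint8_t)(n >> 24);                     /* big-endian chunk size */
    dst[5] = (uint8_t)(n >> 16);
    dst[6] = (uint8_t)(n >> 8);
    dst[7] = (uint8_t)n;
    if (n & 1)
        body[n++] = 0;                               /* pad to even length */
    return 8 + n;
}
```

A reader that strips control sequences would paste "This is important." from this chunk; one that does not would show the stray 0x9B bytes and digits.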

4. The IFF FTXT specification also introduces the optional property chunk FONS, the font specifier, which
"assigns a font to a numbered “font register” so it can be referenced by number within subsequent CHRS chunks. /.../ The font specifier gives both a name and a description for the font so the recipient program can do font substitution. By default, CHRS text uses font 1 until it selects another font."
Although the IFF specification does not mention how font selection should be done for the text chunks, the ISO/ANSI control codes offer a logical solution. Indeed, the SGR sequence has parameters 10-19 to select from ten pre-defined fonts. This of course introduces an inherent limitation: only ten FONS chunks would be meaningful within an IFF FTXT file. But I don't see that as a major drawback. What represents a bigger problem is that neither the FONS data structure nor the ISO/ANSI control sequence codes offer a parameter for specifying text size.
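
A sketch of how a writer might emit a FONS chunk and pick the matching SGR parameter follows. Two things here are assumptions to verify against the FTXT document: the FONS body layout used below (one byte font id, one reserved byte, one byte "proportional" and one byte "serif", each 0 = unknown, 1 = no, 2 = yes, followed by the NUL-terminated font name), and the mapping of font register n to SGR parameter 10 + n. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Write a FONS chunk to dst and return its total size (header, body, pad).
 * Assumed body layout: id, reserved, proportional, serif, name + NUL.
 */
size_t put_fons(uint8_t *dst, uint8_t id, uint8_t proportional,
                uint8_t serif, const char *name)
{
    size_t name_len = strlen(name) + 1;          /* keep the terminating NUL */
    uint32_t size = (uint32_t)(4 + name_len);

    memcpy(dst, "FONS", 4);
    dst[4] = (uint8_t)(size >> 24);              /* big-endian chunk size */
    dst[5] = (uint8_t)(size >> 16);
    dst[6] = (uint8_t)(size >> 8);
    dst[7] = (uint8_t)size;
    dst[8] = id;
    dst[9] = 0;                                  /* reserved */
    dst[10] = proportional;
    dst[11] = serif;
    memcpy(dst + 12, name, name_len);

    size_t total = 8 + size;
    if (size & 1)
        dst[total++] = 0;                        /* pad to even length */
    return total;
}

/* SGR parameter selecting font register id (assumed 0-9 -> 10-19), or -1. */
int sgr_for_font(unsigned id)
{
    return id <= 9 ? (int)(10 + id) : -1;
}
```

With this mapping, text in font register 2 would be introduced inside CHRS by CSI "12m". Note that nothing in this sketch carries a point size, which is the gap described above.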

5. To sum up: in order to implement inter-application formatted-text exchange through the Amiga clipboard, we have most things already in place. The clipboard uses the IFF FTXT format, and the format specification does allow storing text properties: through the CSI sequences and the FONS chunk. Further, the specification clearly says that "new optional property chunks may be defined in the future to store additional formatting information."

What will need to be resolved is this:
  • IFF FTXT does not support storing text size.
  • If application programmers ever start using formatted text as per the FTXT specification, other programmers can no longer assume the clipboard to only include plain text data. Therefore, they'll need to implement proper control-code stripping (again, as per the FTXT specification) in their clipboard handling routines. (Let's note here that the assumption of the clipboard only including plain text has always been wrong and illegal.)
EDIT: Of course (taking into account that 1/ Commodore or EA can no longer update the IFF FTXT format for our needs, and 2/ formatted text has never really been used for clipboard data exchange on the Amiga), we could get rid of the entire CSI voodoo (which doesn't look very convenient anyway) and take a completely different approach. I think now that Hyperion has rights to the RKRM documentation and can update it for OS4, they can also update the IFF FTXT standard to something that would be more useful. To maintain maximum compatibility with older software (and to ease the life of developers), the best solution would be to separate the text from its formatting. What I mean: the CHRS chunk would only contain plain text, and a special chunk preceding or following the CHRS would contain respective formatting description. This way any application could store formatted text in the clipboard: applications not supporting/not interested in the formatting would read plain text data from the CHRS; and applications that do support formatted text would read from the formatting-description chunk.

I'd like to hear other developers' opinions on this.
chris
Posts: 562
Joined: Sat Jun 18, 2011 11:05 am

Re: Formatted text in IFF FTXT

Post by chris »

trixie wrote: To maintain maximum compatibility with older software (and to ease the life of developers), the best solution would be to separate the text from its formatting. What I mean: the CHRS chunk would only contain plain text, and a special chunk preceding or following the CHRS would contain respective formatting description. This way any application could store formatted text in the clipboard: applications not supporting/not interested in the formatting would read plain text data from the CHRS; and applications that do support formatted text would read from the formatting-description chunk.

I'd like to hear other developers' opinions on this.
Sounds good to me. I've just tested the one thing that was mentioned as supporting CSI codes (the Shell) and it doesn't write them to the clipboard. There's a close to zero chance that any other application does.

I'd absolutely agree with creating a new text format chunk (TXTF?) with a defined structure that probably contains a struct TextAttr and two ULONGs for 32-bit foreground and background colours. I'm not sure what else you'd need in it, although the ability to copy and paste tables would be very useful (copying and pasting from/to spreadsheets would finally be possible). That would probably require a different chunk though, eg:
struct TABL {
    int tabl_type;  // e.g. TABL_NEWTABLE, TABL_ENDTABLE, TABL_NEWCOL, TABL_NEWROW
    int tabl_span;  // span this number of columns or rows (see flags)
    int tabl_flags; // e.g. TABLF_SPANROWS, TABLF_SPANCOLS, TABLF_BORDERLESS
    // maybe you need a ULONG for cell background colour too
};

Legacy applications would ignore this new chunk and get untabled text the same as before.

We could even push it further and support embedding ILBMs into FTXT, I have a vague recollection that embedding a FORM into another FORM is within the IFF specification anyway.

The other thing that is an FTXT issue is charsets other than the current local charset. We have the CSET chunk, which is fine, but any application that doesn't recognise it doesn't know to convert CHRS if necessary. It could be that clipboard.device could be a bit intelligent and do dynamic charset conversion, but you'd need to be able to tell it to disable that when writing an application that needs to support UTF-8 text on the clipboard, and I don't think clipboard.device takes notice of what data it gets anyway.

Maybe I'm drifting off the point a bit here.
nbache
Beta Tester
Posts: 1714
Joined: Mon Dec 20, 2010 7:25 pm
Location: Copenhagen, Denmark

Re: Formatted text in IFF FTXT

Post by nbache »

trixie wrote:Although the IFF specification does not mention how font selection should be done for the text chunks, the ISO/ANSI control codes offer a logical solution. Indeed, the SGR sequence has parameters 10-19 to select from ten pre-defined fonts. This of course introduces an inherent limitation: only ten FONS chunks would be meaningful within an IFF FTXT file. But I don't see that as a major drawback.
Anybody using more than ten different fonts in one document deserves a slap anyway ;-). But seriously: Couldn't you just insert new FONS chunks, re-using the reference numbers, later in the file whenever a new font was needed, and another one not used any more? That would mean the file had to be always read and interpreted sequentially, but isn't that normally the case anyway?
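
The sequential reading this idea relies on could be sketched as below: the reader keeps a table of ten font registers, overwrites a slot whenever a FONS chunk with that number arrives, and switches the current register on SGR 10-19. Everything here is hypothetical (the `FontState` type, the function names, the 32-byte name limit), and it assumes a FONS body of id, reserved byte, two flag bytes and a NUL-terminated name, with register n selected by SGR parameter 10 + n.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FONT_REGS     10
#define FONT_NAME_MAX 32

typedef struct {
    char     name[FONT_REGS][FONT_NAME_MAX];  /* "" means: register not set */
    unsigned current;                         /* register used for following text */
} FontState;

void font_state_init(FontState *fs)
{
    memset(fs, 0, sizeof *fs);
    fs->current = 1;              /* CHRS text uses font 1 by default */
}

/* Call for each FONS body in file order; a repeated id replaces the old font. */
int font_state_on_fons(FontState *fs, const uint8_t *body, size_t size)
{
    if (size < 5 || body[0] >= FONT_REGS)
        return -1;
    size_t n = size - 4;                      /* bytes available for the name */
    if (n > FONT_NAME_MAX - 1)
        n = FONT_NAME_MAX - 1;                /* truncate over-long names */
    memcpy(fs->name[body[0]], body + 4, n);
    fs->name[body[0]][n] = '\0';
    return 0;
}

/* Call for each SGR parameter found while scanning a CHRS chunk. */
void font_state_on_sgr(FontState *fs, unsigned param)
{
    if (param >= 10 && param <= 19)
        fs->current = param - 10;
}

/* Name of the font that applies to the text being read right now. */
const char *font_state_current(const FontState *fs)
{
    return fs->name[fs->current];
}
```

The cost is the one already mentioned: a reader can no longer jump straight to a CHRS chunk in the middle of the file and know which font it refers to.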

Best regards,

Niels
tonyw
AmigaOS Core Developer
Posts: 1479
Joined: Wed Mar 09, 2011 1:36 pm
Location: Sydney, Australia

Re: Formatted text in IFF FTXT

Post by tonyw »

There has been a groundswell of opinion in recent years that we should get away from the IFF "chunk" format of storing data. These days we are being encouraged to use XML. New Prefs editors are storing their data as XML, for instance.

The question must be asked: What do we stand to gain by implementing the IFF as a data interchange format within AmigaOS? Only internal compatibility. If, on the other hand, we threw away IFF and went for XML, we would have compatibility not only internally, but with Windows, Linux and other platforms as well.

The RKRMs you found contain pretty old console documentation. The documentation released for V53.1 console was updated some years ago and contains more data. Of course, the new generation console supports a lot more ANSI compatibility. However, it does not attempt to keep formatting information in data it writes to the Clipboard. After all, it (the console device) is the only app that supports ANSI control characters anyway. Once the formatting codes have been used to format the text, they are discarded. Only the resulting character attributes are kept (and those only for the display). It would necessitate quite a large redesign of the console to keep incoming formatting information and spit it out again when something is "Copied" to the Clipboard.
cheers
tony
chris
Posts: 562
Joined: Sat Jun 18, 2011 11:05 am

Re: Formatted text in IFF FTXT

Post by chris »

tonyw wrote:There has been a groundswell of opinion in recent years that we should get away from the IFF "chunk" format of storing data. These days we are being encouraged to use XML. New Prefs editors are storing their data as XML, for instance.
Ugh, no. XML is horrid, and not convenient for storing binary data (or, I'd argue, any data at all, but that's just my opinion).

This article is fantastic, please read it: http://www.ibm.com/developerworks/power ... pa-spec16/
tonyw wrote: The question must be asked: What do we stand to gain by implementing the IFF as a data interchange format within AmigaOS? Only internal compatibility. If, on the other hand, we threw away IFF and went for XML, we would have compatibility not only internally, but with Windows, Linux and other platforms as well.
IFF is already a data interchange format within AmigaOS. When are we going to be sharing the clipboard data with Linux and Windows? Answer: we're not. And, if we were, there would need to be some additional layer of software required that could put the clipboard in the right format regardless.

What do we stand to lose by abandoning IFF for clipboard data? Complete interoperability not only with old software, but with new software written for OS3. There's nothing that can be solved by XML that couldn't more easily be solved by adding new IFF chunks to the existing formats or, in the worst case, creating a new FORM.
tonyw wrote: It would necessitate quite a large redesign of the console to keep incoming formatting information and spit it out again when something is "Copied" to the Clipboard.
Not really necessary, and if it was it would also be necessary whether we were using IFF or XML on the clipboard.