An a-bit-silly/futuristic idea I had today...

João Jerónimo · **Posted:** Mon Sep 18, 2006 4:46 pm

There is a guy on this forum who suggested a 3D GUI for an operating system (like a 3D game!)... I have a similar idea...
My idea could eliminate many of the current flexibility limitations of graphical user interfaces...

You know those Windows daemons that automatically launch applications when the microphone "hears" their name? The goal would be to extend this to control of the whole operating system...

Processes would communicate with the user through a terminal (just like in good old classic UNIX design), but (and this is where it differs from UNIX, I think) the stdin and stdout "pipes" would lead directly to the kernel...

The kernel would obtain user input by whatever means it wants/can (ideally by translating data coming from the microphone into text)... The user input would be normal sentences (yes, sentences!), and the kernel would parse them syntactically and pass structures (whose members would be, for example, sentence_type, verb, direct object, indirect object) to user apps... The user apps would then interpret the verb as a call to some function and the direct/indirect objects as its arguments!
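A minimal sketch of the structure described above, assuming Python and a toy notepad application. The `Sentence` fields follow the members listed in the post; `Notepad`, `handle` and the verb table are hypothetical names for illustration. The application looks the verb up in a table of handlers and passes the objects to the chosen function as its arguments:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Sentence:
    """Parsed sentence as the kernel might hand it to an application (hypothetical)."""
    sentence_type: str                     # e.g. "imperative", "question"
    verb: str                              # selects the function to call
    direct_object: Optional[str] = None    # first argument
    indirect_object: Optional[str] = None  # second argument


class Notepad:
    """Toy application: each verb it understands maps to one method."""

    def __init__(self) -> None:
        self.words: List[str] = []
        self.handlers: Dict[str, Callable[[Sentence], str]] = {
            "write": self.write,
            "delete": self.delete,
        }

    def handle(self, s: Sentence) -> str:
        # The verb picks the function; the objects become its arguments.
        handler = self.handlers.get(s.verb.lower())
        if handler is None:
            return f"Sorry, I can't {s.verb}."
        return handler(s)

    def write(self, s: Sentence) -> str:
        self.words.extend((s.direct_object or "").split())
        return "Ok"

    def delete(self, s: Sentence) -> str:
        # Only "the previous word" is understood in this sketch.
        if s.direct_object == "the previous word" and self.words:
            self.words.pop()
            return "Ok"
        return "Delete what?"


pad = Notepad()
pad.handle(Sentence("imperative", "write", "There's a sucker"))
pad.handle(Sentence("imperative", "delete", "the previous word"))
print(" ".join(pad.words))  # There's a
```

A table lookup keeps the application in charge of which verbs it understands; anything not in the table gets a polite refusal instead of a guess.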

The kernel would also be responsible for directing input to the right terminals, by interpreting vocatives...

Of course, the first versions of such an operating system could include just a keyboard interface to the terminals...

U: Notepad
NP: Yes...
U: Write this "There's a sucker".
U: Delete the previous word.
U: Write this in bold: "potential customer"
U: Write this: "born every minute."
U: Kernel!
K: Yes my lord...
U: Kill that stupid program!
K: The program is holding a resource at the moment and cann...
U: KILL THAT PROCESS RIGHT NOW!
K: Oh yes, of course my lord!

JJ

Candamir · **Posted:** Mon Sep 18, 2006 5:57 pm

Instead of the user directly communicating with the kernel, I would rather have some kind of task manager (like the one in Windows or Ubuntu). Also, I think that an OS needs a GUI even if it is speech-based... You could call a program and the window of that program would receive the focus... So, the kernel would interpret and convert every spoken message to text and then decide whether the message is meant for the kernel or whether it should pass the text on to the program.

Candamir

Candy · **Posted:** Mon Sep 18, 2006 11:21 pm

I can imagine that being an idea for a lot of people. I, however, can type more accurately than I can speak. You will include a keyboard interface as well, won't you?

Also, having a word fight with your kernel over what it should and should not do? kill -9 is clear to me.

Brendan · **Posted:** Tue Sep 19, 2006 12:42 am

Hi,

Another example:

U: Givus sum nopad
NP: Huh
U: I wanna write a letta
NP: A letta?
U: A letta for me, see?
NP: Let us format C? Ok, formatting..

Speech recognition alone is hard enough (even though there have been some impressive advances in this area - you can probably do a patent search to find details). AFAIK the current "state of the art" for Windows systems is a product called NaturallySpeaking. Here are someone else's comments...

The next problem is parsing the English language and forming sensible English responses. To date, this has been a major stumbling block. This is partly because English is unstructured and ambiguous, and partly because it relies heavily on context. For example:

U: Make the title bold
NP: Which title
U: The main title
NP: The main title?
U: Make it bold
NP: Make what bold?
U: Make the main title bold
NP: Ok - changing main title to "Bold"
U: Undo
NP: Ok
U: Use bold lettering for the main title
NP: Ok - changing main title to "Bold Lettering"

To solve these problems you need to make the language structured - like a programming language rather than normal speech:

U: Select main title
NP: Ok
U: Enable bold
NP: Ok

This would involve having unique names for everything that can be selected or changed (icons, scroll bars, buttons, etc.), which would mean forcing all applications to provide suitable names for everything.
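As a rough illustration of the structured approach, here is a sketch assuming every selectable object is registered under a unique spoken name; `Document`, `execute` and the object names are hypothetical. Because "Enable bold" can only mean switching on an attribute, it cannot be misread as renaming the title:

```python
from typing import Dict, Optional


class Document:
    """Every selectable object has a unique spoken name (an assumption)."""

    def __init__(self) -> None:
        self.objects: Dict[str, Dict[str, bool]] = {
            "main title": {"bold": False, "italic": False},
            "first paragraph": {"bold": False, "italic": False},
        }
        self.selection: Optional[str] = None

    def execute(self, command: str) -> str:
        # Grammar: "<verb> <rest>", with a small fixed set of verbs.
        verb, _, rest = command.strip().rstrip(".").partition(" ")
        verb, rest = verb.lower(), rest.strip().lower()
        if verb == "select":
            if rest not in self.objects:
                return f"No object named '{rest}'"
            self.selection = rest
            return "Ok"
        if verb in ("enable", "disable"):
            if self.selection is None:
                return "Nothing is selected"
            attributes = self.objects[self.selection]
            if rest not in attributes:
                return f"'{self.selection}' has no attribute '{rest}'"
            attributes[rest] = verb == "enable"
            return "Ok"
        return f"Unknown command '{verb}'"


doc = Document()
print(doc.execute("Select main title"))  # Ok
print(doc.execute("Enable bold"))        # Ok
```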

Lastly, if you get everything working perfectly, people will play with it for 10 minutes, say "That was cool" and then disable it. Why? Because it takes much longer to say something and get an audible response than it does to click or type; home users like to listen to music or TV (which would be picked up by the microphone); and office users would go insane with all the chatter.

It would be a useful tool for disabled people though (for example, blind people). Also, if you do create the necessary "GUI language" and make sure all applications respond to it correctly, then it would be useful as a typed scripting language (like DOS batch files that can tell applications what to do instead of just controlling the shell).

Cheers,

Brendan

Pype.Clicker · **Posted:** Tue Sep 19, 2006 1:41 am

Heh. Ever wondered why only Captain Kirk gives vocal orders on the bridge, while Scotty, Spock and the lady in the short skirt just type at their consoles?

There's a vocal interface on Mac OS X. You can get your mail by saying "get my mail, please". Note that you have to say it in English (they may support other languages by now), and that my friend had to pinch his nose a bit to find the appropriate tone.

There are a few environments where a vocal interface may be interesting, such as a hospital emergency department, where everyone is buzzing around and no one has a free hand for the computer. Yet it takes years of prototyping just to find the words that a practitioner could use to order a blood analysis.

On the other hand, I can hardly imagine an architect using a voice interface to work on house plans... or a coder speaking debugger statements...
If you like the idea of dictating your letters, you'll probably prefer a small device you can carry around with you: push a button and record something, push another button to hear it back, maybe clear it, then record the sentence again.

And then, maybe, you'd use a transcription program to retrieve those "wave" files and turn them into text you can paste and arrange in a word processor.

Combuster · **Posted:** Tue Sep 19, 2006 5:14 am

Voice recognition seems to be a form of input that's hardly used at all, but it has more advantages than just letting me code while lying back in my chair: you can communicate with it "wirelessly". For instance, I could be sitting at the other end of the room and have my computer respond when I insult him for playing "It's a Small World" again.

As for the debugger, it'd be a useful thing if you're debugging fullscreen apps.

Nevertheless, it'd be rather difficult to implement universal voice control on existing OSes, as they are not built to be controlled by voice. To allow decent and clean communication, it would have to be taken care of in all programs from the start, which would be an awful lot of work.
But of course, you can go write rootkits for Windows that do your bidding, if you like debugging.

João Jerónimo wrote:

There is a guy on this forum who suggested a 3D GUI for an operating system (like a 3D game!)...

I see you like the idea

Pype.Clicker · **Posted:** Tue Sep 19, 2006 5:24 am

Combuster wrote:

For instance, I could be sitting at the other end of the room and have my computer respond when I insult him for playing "It's a Small World" again.

Maybe that's something that could be fun at a home-automation expo/showroom, indeed. But honestly, when I want to skip a song on my hi-fi, I just use the remote control. I don't need to insult the hi-fi and have it reply "sorry, my lord".

And I'm not even sure I'd be happy with a computer that starts playing a tune I'm humming to myself, if you ask me (though that would definitely be über-geekly-cool).

Quote:

As for the debugger, it'd be a useful thing if you're debugging fullscreen apps.

You've never debugged over a serial line, have you? Would it _really_ be useful to hear Marvin's voice repeating "the values for eax, ebx, ecx and edx are 0xcafebabe 0xdeadbeef 0x12345678 0x87654321" and waiting for you to say "next .. next .. next .. next .. next ..." (the horror)

João Jerónimo · **Posted:** Tue Sep 19, 2006 12:27 pm

By Candamir:

Quote:

Instead of the user directly communicating with the kernel, I would rather have some kind of task manager (like the one in Windows or Ubuntu).

That's a possibility, in fact...
But if you can communicate directly with the kernel, why not?
But of course it's a design decision... just like being a microkernel or a monolithic kernel...

Quote:

Also, I think that an OS needs a GUI even if it is speech-based... You could call a program and the window of that program would receive the focus...

I had thought about exactly that issue of focus...
If the application wanted to display some text, it could simply create a "window" with a text-box widget...
Imagine I ask a make-like program to build some program:

U: Execute make, kernel.
K: Ok
(suppose the kernel automatically focuses the window)
U: Build /path/to/source/Makefile
M: Ok. Building

Then the program could have an append-only textbox in its window, because it's not a bad idea to keep compiler error/warning messages more persistently than what happens when someone says something...
The "Ok. Building" message could also be shown in that append-only textbox...
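A minimal sketch of such an append-only text box, assuming the application both speaks a message and keeps a copy on screen. `AppendOnlyTextBox` and `report` are hypothetical names, and `print` merely stands in for a real speech synthesizer:

```python
from typing import Callable, List, Tuple


class AppendOnlyTextBox:
    """Hypothetical widget: lines can be added but never edited or removed."""

    def __init__(self) -> None:
        self._lines: List[str] = []

    def append(self, line: str) -> None:
        self._lines.append(line)

    @property
    def lines(self) -> Tuple[str, ...]:
        # Hand out an immutable copy so callers cannot rewrite the history.
        return tuple(self._lines)


def report(message: str, box: AppendOnlyTextBox,
           speak: Callable[[str], None] = print) -> None:
    """Say the message once, but keep a persistent copy in the window."""
    speak(message)
    box.append(message)


box = AppendOnlyTextBox()
report("Ok. Building", box)
report("warning: unused variable 'x'", box)
```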

Quote:

So, the kernel would interpret and convert every spoken message to text and then decide whether the message is meant for the kernel or if it shall pass the text on to the program.

Yes... The kernel could define a default destination for sentences: the last vocative said on its own as a whole sentence. If you want to bypass it, you could just say/write your message with a vocative...
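A sketch of that routing rule, assuming sentences arrive as plain text and a vocative is a known program name, either said alone or set off by a comma at the start or end of the sentence. `VocativeRouter` and the names it knows are hypothetical, and commas inside quoted text would confuse this naive version:

```python
from typing import Iterable, Optional, Tuple


class VocativeRouter:
    """Decides which program a sentence is addressed to (hypothetical)."""

    def __init__(self, names: Iterable[str], default: str = "kernel") -> None:
        self.names = {n.lower() for n in names}
        self.default = default

    def route(self, sentence: str) -> Tuple[str, Optional[str]]:
        """Return (destination, message); message is None for a bare vocative."""
        text = sentence.strip().rstrip(".!?").strip()
        # A vocative on its own changes the default destination.
        if text.lower() in self.names:
            self.default = text.lower()
            return self.default, None
        # "Name, ..." or "..., name" addresses a single message only.
        if "," in text:
            first, rest = text.split(",", 1)
            if first.strip().lower() in self.names:
                return first.strip().lower(), rest.strip()
            body, last = text.rsplit(",", 1)
            if last.strip().lower() in self.names:
                return last.strip().lower(), body.strip()
        # Otherwise the sentence goes to the current default.
        return self.default, text


router = VocativeRouter(["kernel", "notepad", "make"])
router.route("Notepad")                # ('notepad', None): new default
router.route("Write this.")            # ('notepad', 'Write this')
router.route("Execute make, kernel.")  # ('kernel', 'Execute make'), default unchanged
```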

João Jerónimo · **Posted:** Tue Sep 19, 2006 12:30 pm

Quotes by Brendan:

Quote:

Another example:

U: Givus sum nopad
NP: Huh
U: I wanna write a letta
NP: A letta?
U: A letta for me, see?
NP: Let us format C? Ok, formatting...

LOL... Looks like a conversation between deaf people! I don't know if current speech recognition technologies make these kinds of mistakes, but it's funny!
If you are really afraid of that, an Esperanto speech recognizer could be implemented and everyone forced to use an Esperanto-like pronunciation (which is VERY regular and not difficult to use after some practice)...

Apart from that, I can only say that the code generating the text would do its *best* to work out how the user would have written their orders (including by comparing the sound of each word with that of common words), without worrying much about how long it takes...
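One classic way to compare how words sound is the Soundex algorithm. It is swapped in here purely as an illustration (the post doesn't name a technique), and it is tuned for English names rather than arbitrary speech. A minimal Python sketch of American Soundex:

```python
# American Soundex digit for each consonant; vowels, h, w and y get none.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word: str) -> str:
    """Four-character American Soundex code, e.g. "Robert" -> "R163"."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same digit
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeat after it is coded again
        if len(code) == 4:
            break
    return code.ljust(4, "0")


def sounds_like(a: str, b: str) -> bool:
    return soundex(a) == soundex(b)
```

For instance, `sounds_like("delete", "dileet")` is `True`, but Soundex also gives "bold" and "bolt" the same code, which is exactly the kind of confusion a real recognizer would need context to resolve.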

Quote:

Speech recognition alone is hard enough (even though there have been some impressive advances in this area - you can probably do a patent search to find details). AFAIK the current "state of the art" for Windows systems is a product called NaturallySpeaking. Here are someone else's comments...

I'll take a look at that...

Quote:

The next problem is parsing the English language and forming sensible English responses. To date, this has been a major stumbling block. This is partly because English is unstructured and ambiguous, and partly because it relies heavily on context.

It's the same in every language...

Quote:

For example:

U: Make the title bold
NP: Which title
U: The main title
NP: The main title?
U: Make it bold
NP: Make what bold?
U: Make the main title bold
NP: Ok - changing main title to "Bold"
U: Undo
NP: Ok
U: Use bold lettering for the main title
NP: Ok - changing main title to "Bold Lettering"

Either the user is not allowed to use this kind of ambiguous sentence, or the kernel does what human beings do: it stores the last noun explicitly specified and feeds the sentence to the program with the original pronoun replaced by that noun...
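A sketch of the second option, assuming the kernel knows the names of the objects that can be referred to and treats only "it" and "them" as pronouns; `ReferentTracker` is a hypothetical name. It remembers the last explicitly named object and substitutes it for a pronoun:

```python
import re
from typing import Iterable, Optional


class ReferentTracker:
    """Remembers the last explicitly named object and substitutes it for pronouns."""

    PRONOUNS = {"it", "them"}

    def __init__(self, known_nouns: Iterable[str]) -> None:
        # Longest names first, so "main title" wins over "title".
        self.known = sorted((n.lower() for n in known_nouns), key=len, reverse=True)
        self.last_noun: Optional[str] = None

    def resolve(self, sentence: str) -> str:
        lowered = sentence.lower()
        for noun in self.known:
            if re.search(r"\b" + re.escape(noun) + r"\b", lowered):
                self.last_noun = noun
                return sentence  # explicit noun: nothing to substitute
        if self.last_noun is None:
            return sentence      # nothing to refer back to yet

        def swap(match) -> str:
            word = match.group(0)
            return self.last_noun if word.lower() in self.PRONOUNS else word

        return re.sub(r"[A-Za-z]+", swap, sentence)
```

With Brendan's dialogue, "The main title" followed by "Make it bold" reaches the application as "Make main title bold".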

Quote:

To solve these problems you need to make the language structured - like a programming language rather than normal speech:

U: Select main title
NP: Ok
U: Enable bold
NP: Ok

You could use a less "technical" language, even if it's not free-form normal speech... You don't really need to select text to make it bold; just consider the following example:

U: Apply Bold to the main title.

Here the application, which is expected to identify nouns such as "main title" as objects, adjectives like "bold" as attributes, and the verb "apply" as an assignment verb, would make *all the words of the main title* bold...

It would know perfectly well that the main title is a set of words, and that words are sets of letters... so it would make all the words bold, and consequently all the letters bold...
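A sketch of that recursive behaviour, assuming the document is a tree of named nodes (the title, then its words, then their letters) and a single sentence pattern "apply <attribute> to [the] <object>"; `Node`, `make_text` and `execute` are hypothetical:

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    attributes: Set[str] = field(default_factory=set)

    def apply(self, attribute: str) -> None:
        # Applying an attribute to a whole applies it to every part of it.
        self.attributes.add(attribute)
        for child in self.children:
            child.apply(attribute)


def make_text(name: str, text: str) -> Node:
    """Build a tree: name -> words -> letters."""
    words = [Node(w, [Node(ch) for ch in w]) for w in text.split()]
    return Node(name, words)


def execute(sentence: str, objects: Dict[str, Node]) -> bool:
    """Handle 'apply <attribute> to [the] <object>'; return False if not understood."""
    m = re.fullmatch(r"apply (\w+) to (?:the )?(.+?)\.?", sentence.strip(), re.IGNORECASE)
    if m is None:
        return False
    attribute, target = m.group(1).lower(), m.group(2).lower()
    if target not in objects:
        return False
    objects[target].apply(attribute)
    return True


title = make_text("main title", "A letter for me")
execute("Apply Bold to the main title.", {"main title": title})
```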

Quote:

This would involve having unique names for everything that can be selected or changed (icons, scroll bars, buttons, etc.), which would mean forcing all applications to provide suitable names for everything.

Yes, of course... and if they had widget libraries, like any "normal" operating system, those libraries could provide functions to help identify the objects being referred to...

Quote:

Lastly, if you get everything working perfectly people will play with it for 10 minutes, say "That was cool" and then disable it.

Not if it's really useful...

Quote:

Why? Because it takes much longer to say something and get an audible response than it does to click or type,

No! I'm much faster when I speak than when I write... Aren't you?

And the response could be some GUI event or something like that... If you are dictating text, it would appear on the application's screen/window (which would not disappear as a UI object)...

Quote:

home users like to listen to music or TV (which would be picked up by the microphone)

The OS could also be designed to allow a "classic" GUI-only interface...
But the immediate solution would be typing in the sentences...

JJ

João Jerónimo · **Posted:** Tue Sep 19, 2006 12:31 pm

Quotes by Candy:

Quote:

I, however, can type more accurately than I can speak. You will include a keyboard interface as well, won't you?

Yes, of course...

Quote:

Also, having a word fight with your kernel over what it should and should not do? kill -9 is clear to me.

The word fight was a joke, of course!
But perhaps not entirely... If the converter detected that the user shouted, that information could be passed on to the application... For example, if we loudly asked the kernel to kill some application, it could be understood as forcing the process's termination rather than sending it a signal or letting it finish any system calls...
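A sketch of how the shout could be detected and mapped to a termination mode, assuming either audio samples normalised to [-1, 1] or, for typed input, ALL-CAPS text as the "shout". The -10 dBFS threshold and the function names are arbitrary assumptions. With typed input, "Kill that stupid program!" maps to SIGTERM and "KILL THAT PROCESS RIGHT NOW!" to SIGKILL; on POSIX, SIGTERM can be caught so the process may clean up, while SIGKILL cannot be caught or ignored:

```python
import math
from typing import Optional, Sequence

SHOUT_DBFS = -10.0  # arbitrary loudness threshold (assumption)


def rms_dbfs(samples: Sequence[float]) -> float:
    """Root-mean-square level of samples in [-1, 1], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def is_shouted(text: str, samples: Optional[Sequence[float]] = None) -> bool:
    if samples is not None:
        return rms_dbfs(samples) >= SHOUT_DBFS
    # Typed input: treat ALL CAPS as shouting.
    letters = [c for c in text if c.isalpha()]
    return bool(letters) and all(c.isupper() for c in letters)


def kill_signal(text: str, samples: Optional[Sequence[float]] = None) -> str:
    """Name of the signal to send: a polite SIGTERM, or SIGKILL when shouted."""
    return "SIGKILL" if is_shouted(text, samples) else "SIGTERM"
```

On a POSIX system the returned name could be turned into a real signal with `getattr(signal, name)` and passed to `os.kill`.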

JJ

João Jerónimo · **Posted:** Tue Sep 19, 2006 12:32 pm

Quotes by Combuster:

Quote:

Nevertheless, it'd be rather difficult to implement universal voice control on existing OSes, as they are not built to be controlled by voice. To allow decent and clean communication, it would have to be taken care of in all programs from the start, which would be an awful lot of work.

Of course... The graphical user interface could be redesigned to be much simpler, because you don't need its input components... only output...

Quote:

There is a guy on this forum who suggested a 3D GUI for an operating system (like a 3D game!)...

I see you like the idea :-)

Yes... I've recently read some threads about the lack of flexibility of graphical user interfaces and, while I was thinking about it, I concluded that the problem with GUIs is that they are graphical but they don't emulate the environment that humans normally use when they work graphically (that is, they emulate a desktop instead of a *world*, do you understand?)...

JJ

Pype.Clicker · **Posted:** Wed Sep 20, 2006 1:03 am

João Jerónimo wrote:

I concluded that the problem with GUIs is that they are graphical but they don't emulate the environment that humans normally use when they work graphically (that is, they emulate a desktop instead of a *world*, do you understand?)...

;D Reminds me of those sketches I made of "Clicker Junior" (in '95, in case you wonder) where you had several "places" (or rooms, or whatever) where you could store things and interact with them like in a point-and-click adventure game (you know: Monkey Island, Maniac Mansion, etc.)

Still,
1. In the real world, the shape of an object and its use are usually interdependent. Not in a computer. If I have a great code editor written, should I still worry about what it should look like in the 3D world and how much it bounces when it is dropped?
2. Locating an item in the real world is painful. It's much better to search for a document by typing keywords or tags than by staring at stacks of paper scattered all over the virtual room, if you ask me...

Kemp · **Posted:** Wed Sep 20, 2006 3:40 am

Agreed. In fact one of the most well-known problems in usability originates from developers attempting to provide real-world metaphors for things. The computer isn't the real world and a lot of the time the attempt to parallel it makes applications harder to use than they would be if their own function was considered before everything else.

Also, ACK to Pype on speaking vs. typing. I can do things much faster with a keyboard and mouse (or just a keyboard, depending on which app I'm in) than by speaking. For instance, hitting Ctrl+F and typing a word is much faster for me than saying "Find <word>".

João Jerónimo · **Posted:** Wed Sep 20, 2006 5:51 pm

Pype.Clicker wrote:

João Jerónimo wrote:

I concluded that the problem with GUIs is that they are graphical but they don't emulate the environment that humans normally use when they work graphically (that is, they emulate a desktop instead of a *world*, do you understand?)...

(...)

2. Locating an item in the real world is painful. It's much better to search for a document by typing keywords or tags than by staring at stacks of paper scattered all over the virtual room, if you ask me...

Yes... I agree!
Otherwise, no one would ever have converted paper databases to computerized ones...

Yes... about the 3D environment, I concluded that even if that type of environment could be more flexible than a desktop-emulation GUI (it really could), it is not a proper solution to the usability problem; and since every program needs to be easy to use (once the user has learned how to use it, of course), it fails to be a suitable solution...

Any interface can be flexible if it works the same way as something that is already flexible... and the same goes for usability!

The real world is flexible, right? So, if you want to make a graphical user interface flexible, one solution would be to make it emulate the real world, creating a 3D space where you could walk around, pick up things, etc.
It's possible to do *everything* in the real world... but many things are very difficult to do in the real world... so a 3D real-world-emulation GUI would not be easy to use...

One way to get things done both easily and flexibly is to give an order to a slave! Traditionally, the computer equivalent of that is a command-line interface which, because of my universal law :-), is *very* flexible and reasonably easy to use...
Why is it flexible and easy to use? Because the user acts like a master giving orders to a slave... the master can ask the slave for anything it wants, in whatever form it wants, and can tell him to hand the result to some other slave so that he can do something else with it!
Despite being easy to use, it has the disadvantage of not being very intuitive, because you need to know which commands to type in order to do something... there is no learning-while-using with a CLI tool, since you always have to consult the handbook, the man pages, or the --help option before knowing how to do something...
Also, the fact that text-shell languages are more or less interpreted programming languages makes CLI shells a bit harder to learn than a slave who speaks English or some other human language (like Portuguese, for example)! It also makes them impractical to speak aloud, so we can't simply pipe a speech-recognition program into the shell's stdin...

Now, about normal GUIs... they (and this applies to CLIs too) can be easy or hard to use, and more or less flexible, depending on how they are designed... but it's much simpler to make a CLI flexible than a GUI... Guess why...

JJ

Cheery · **Posted:** Thu Sep 21, 2006 1:24 am

Use four-directional views only for information where you must represent volumetric data! Watching those "3D" desktops has been horrible.

If you really want a proper UI, remove the desktop-and-windows concept completely and replace it with displays and viewports.

What do I think about input methods? Well, a multi-touch display would be cool. Properly working voice control with some artificial language would be a neat complement to it. Add a flexible pattern-matching library over the input data and we have the best of the best.

Keyboard and mouse... hmm. If you ask me, it is a working input method, but slow and clumsy.

OSDev.org
