A specific question about computer vision

I'm not a programmer. I became interested in computer vision, and when I started reading about it, I got the impression that computer vision does not really work yet, and that this seems to be the reason there are no robots.

I have an idea and I have tried to do something with it, but so far nothing has worked out. The information is scattered, and there is no overall picture. Could someone point me to a source with the most complete information on this topic? In particular, I am interested in the general algorithm by which an object is detected:

1. which kind of image is used (a 2D image or a 3D reconstruction),

2. whether the image is processed after it is captured from the environment and before recognition, and

3. how exactly the object is identified. So far I have only understood that it is matched against templates, using AI that is resource-intensive to train.
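To make the three points above concrete, here is a toy sketch of one classical approach: take a 2D grayscale image, preprocess it by thresholding, and locate an object by sliding a template over it. This is only an illustration under simplifying assumptions, not how any particular system works; the function names `binarize` and `match_template` and the threshold of 128 are made up for this example.

```python
# Toy sketch: 2D image -> preprocessing -> template matching.
# All names and values here are hypothetical and chosen for illustration.

def binarize(image, threshold=128):
    """Preprocessing step: turn a grayscale image into 0/1 pixels."""
    return [[1 if px >= threshold else 0 for px in row] for row in image]


def match_template(image, template):
    """Slide the template over the image and return (row, col, mismatches)
    for the position where the fewest pixels differ."""
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    best = None
    for r in range(img_h - tpl_h + 1):
        for c in range(img_w - tpl_w + 1):
            mismatches = sum(
                image[r + dr][c + dc] != template[dr][dc]
                for dr in range(tpl_h)
                for dc in range(tpl_w)
            )
            if best is None or mismatches < best[2]:
                best = (r, c, mismatches)
    return best


# A 5x5 grayscale "scene" with a bright 2x2 square at row 2, column 1.
scene = [
    [10, 12, 11, 9, 10],
    [11, 10, 12, 10, 9],
    [10, 200, 210, 11, 10],
    [9, 205, 198, 10, 12],
    [10, 11, 10, 9, 11],
]
template = [[1, 1], [1, 1]]  # the shape we are looking for

binary = binarize(scene)
row, col, mismatches = match_template(binary, template)
print(row, col, mismatches)
```

Exhaustive matching like this is slow and breaks down when the object is rotated, scaled, or lit differently, which is one reason trained models are commonly used instead of fixed templates.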

I would like to understand whether what I want to do fits with where the science currently stands.

I have an idea of what the input should look like: a scene (or it could be video) shot with a color filter. I have some idea about the colors, a rough idea of what is in the picture (the objects that need to be processed), and a rough idea of the processing method, but I can't figure out whether it will work or not.

As far as I understand, what I am interested in is pattern recognition. If there is a specialist here, I would like to talk to them.


The plan is to use a 2D scene: an image of an inverted figure of melting ice (icicles) on a blue background, with a yellow background with a white grid superimposed on the image.

Blue is the most complex color, yellow filters it for clarity, and black is a simple color.

Color recognition is supposed to work against a template: count the number of shades in the object's spot and calculate their weighted harmonic mean.
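As a rough sketch of that calculation, assuming each shade is a positive grayscale value and its weight is the number of pixels with that shade, the weighted harmonic mean is the total pixel count divided by the sum of count/shade. The names `shade_histogram`, `weighted_harmonic_mean`, and `matches_template`, and the tolerance value, are hypothetical and only for illustration.

```python
from collections import Counter


def shade_histogram(pixels):
    """Count how many pixels of the spot have each shade."""
    return Counter(pixels)


def weighted_harmonic_mean(histogram):
    """Weighted harmonic mean of the shades, weighted by pixel count:
    sum(counts) / sum(count / shade). Shades must be positive."""
    if any(shade <= 0 for shade in histogram):
        raise ValueError("harmonic mean needs strictly positive shades")
    total = sum(histogram.values())
    return total / sum(count / shade for shade, count in histogram.items())


def matches_template(value, template_value, tolerance=5.0):
    """Assumed comparison rule: accept if the value is close to the template's."""
    return abs(value - template_value) <= tolerance


# Example spot: three pixels of shade 100 and one pixel of shade 200.
spot = [100, 100, 100, 200]
histogram = shade_histogram(spot)
number_of_shades = len(histogram)
mean_shade = weighted_harmonic_mean(histogram)
print(number_of_shades, mean_shade, matches_template(mean_shade, 115.0))
```

The harmonic mean is pulled toward the darker (smaller) shades more strongly than the ordinary average, and it is undefined for a shade of 0, so pure black pixels would need special handling.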

I ran an experiment with a distorted blue inscription (on a tube). I took one photo without a yellow filter and one with it, and the two photos looked very similar. Then I uploaded both photos to an online translator. The photo taken with the filter was recognized and translated correctly; the photo taken without the filter was recognized incorrectly.
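One possible, unverified way to think about this result: a yellow filter passes red and green light and blocks most blue, so a blue inscription becomes darker relative to a light background. The sketch below models that with made-up numbers. The white background, the `blue_transmission` value of 0.1, and the crude brightness measure are all assumptions, and nothing here shows what the online translator actually does.

```python
# Hypothetical model of a yellow filter: it passes red and green and blocks
# most blue. blue_transmission is an assumed value, not a measured one.

def apply_yellow_filter(pixel, blue_transmission=0.1):
    r, g, b = pixel
    return (r, g, b * blue_transmission)


def brightness(pixel):
    """Plain average of the three channels (a deliberately crude gray value)."""
    return sum(pixel) / 3


def michelson_contrast(pixel_a, pixel_b):
    """|a - b| / (a + b) of the two brightness values, from 0 to 1."""
    a, b = brightness(pixel_a), brightness(pixel_b)
    if a + b == 0:
        return 0.0
    return abs(a - b) / (a + b)


inscription = (0, 0, 255)       # saturated blue lettering (assumed)
background = (255, 255, 255)    # white background (assumed)

contrast_plain = michelson_contrast(inscription, background)
contrast_filtered = michelson_contrast(
    apply_yellow_filter(inscription), apply_yellow_filter(background)
)
print(contrast_plain, contrast_filtered)
```

Under these assumptions the relative contrast between lettering and background goes up with the filter, even though the two photos may look similar to the eye.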

I can't post the file with the screenshots here, because the inscription would count as advertising, but I can send it privately.