
OpenDivX digest, Vol 1 #63 - 3 msgs



Send OpenDivX mailing list submissions to
	[email protected]

To subscribe or unsubscribe via the World Wide Web, visit
	http://lists.projectmayo.com/mailman/listinfo/opendivx
or, via email, send a message with subject or body 'help' to
	[email protected]

You can reach the person managing the list at
	[email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of OpenDivX digest..."


Today's Topics:

   1. yuv2rgb_565 error (Eric Felten)
   2. Please Remove ([email protected])
   3. color conversion and decoder's perfomance: how to increase on ~10-15% (Andrew Voznytsa)

--__--__--

Message: 1
Date: Thu, 26 Apr 2001 10:35:49 +0200
From: Eric Felten <[email protected]>
To: [email protected]
Subject: [OpenDivX] yuv2rgb_565 error

Hello,

I am working on my own player, which includes the OpenDivX decore, and
it's working fine :-)

Instead of doing the color conversion on my side, I've chosen to let
the decore do it, simply by selecting the output format that matches
the color depth of the screen. But the result wasn't acceptable in
RGB565: instead of the usual colors, I got a mostly black/blue
picture :-(

So I tried to correct it directly in the assembler code, after first
working through the C code. 565 means a 6-bit field for the green
component and 5 bits each for red and blue, so green must be masked
with 0xFC, not 0xF8.

So instead of :
#define _Ps565(r,g,b) ( ((r & 0xF8) >> 3) | (((g & 0xF8) << 3)) | (((b & 0xF8) << 8)) )

it should be :
#define _Ps565(r,g,b) ( ((r & 0xF8) >> 3) | (((g & 0xFc) << 3)) | (((b & 0xF8) << 8)) )

The same error was in the assembler code too!
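
The effect of the mask change can be checked with a small C sketch. It
assumes the conventional RGB565 layout, with red in bits 11-15, green in
bits 5-10 and blue in bits 0-4, which is what the assembler in this
message produces by shifting red left by 8 and green left by 3. The
function names are hypothetical and not part of the decore.

```c
#include <stdint.h>

/* Hypothetical helper, not from the decore: pack 8-bit R, G, B into RGB565
 * (assumed layout: red in bits 11-15, green in bits 5-10, blue in bits 0-4). */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r & 0xF8) << 8) |   /* top 5 bits of red   */
                      ((g & 0xFC) << 3) |   /* top 6 bits of green */
                      ((b & 0xF8) >> 3));   /* top 5 bits of blue  */
}

/* Same packing with the too-narrow 0xF8 green mask, for comparison only. */
static uint16_t pack_rgb565_narrow_green(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r & 0xF8) << 8) |
                      ((g & 0xF8) << 3) |   /* drops the lowest green bit */
                      ((b & 0xF8) >> 3));
}
```

With g = 4 the first function sets bit 5 of the result while the second
drops it, so the narrow mask can never set the lowest green bit.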

Some other odd things were also done, like shifting red instead of
blue, and finally this mask:
pand mm4, mask_blue    ; mask the blue again
was not necessary.

Here is the corrected version of the function (it works fine on
several different computers):

Best Regards

- Eric Felten -

void yuv2rgb_565(uint8_t *puc_y, int stride_y,
                uint8_t *puc_u, uint8_t *puc_v, int stride_uv,
                uint8_t *puc_out, int width_y, int height_y,
        unsigned int _stride_out)
{

 int y, horiz_count;
 unsigned short * pus_out;
 int stride_out = width_y * 2;

 if (height_y < 0) {
  /* we are flipping our output upside-down */
  height_y  = -height_y;
  puc_y     += (height_y   - 1) * stride_y ;
  puc_u     += (height_y/2 - 1) * stride_uv;
  puc_v     += (height_y/2 - 1) * stride_uv;
  stride_y  = -stride_y;
  stride_uv = -stride_uv;
 }
 pus_out = (unsigned short *) puc_out;

 horiz_count = -(width_y >> 3);

 for (y=0; y<height_y; y++)
 {
  /***
  for (x=0; x<width_y; x++)
  {
   signed int _r,_g,_b;
   signed int r, g, b;
   signed int y, u, v;

   y = puc_y[x] - 16;
   u = puc_u[x>>1] - 128;
   v = puc_v[x>>1] - 128;

   _r = _R(y,u,v);
   _g = _G(y,u,v);
   _b = _B(y,u,v);

   r = _S(_r);
   g = _S(_g);
   b = _S(_b);

   pus_out[0] = (unsigned short) _Ps565(r,g,b);

   pus_out++;
  }
  /***/

  _asm {
   push eax
   push ebx
   push ecx
   push edx
   push edi

   mov eax, puc_out
   mov ebx, puc_y
   mov ecx, puc_u
   mov edx, puc_v
   mov edi, horiz_count

  horiz_loop:

   // load data
   movd mm2, [ecx]    ; mm2 = ________u3u2u1u0
   movd mm3, [edx]    ; mm3 = ________v3v2v1v0
   movq mm0, [ebx]          ; mm0 = y7y6y5y4y3y2y1y0

   pxor mm7, mm7    ; zero mm7

   // convert chroma part
   punpcklbw mm2, mm7       ; mm2 = __u3__u2__u1__u0
   punpcklbw mm3, mm7       ; mm3 = __v3__v2__v1__v0
   psubw mm2, mmw_0x0080    ; mm2 -= 128
   psubw mm3, mmw_0x0080    ; mm3 -= 128
   psllw mm2, 3             ; mm2 *= 8
   psllw mm3, 3             ; mm3 *= 8
   movq mm4, mm2            ; mm4 = mm2 = u
   movq mm5, mm3            ; mm5 = mm3 = v
   pmulhw mm2, mmw_mult_U_G ; mm2 *= u green coeff
   pmulhw mm3, mmw_mult_V_G ; mm3 *= v green coeff
   pmulhw mm4, mmw_mult_U_B ; mm4 = blue chroma
   pmulhw mm5, mmw_mult_V_R ; mm5 = red chroma
   paddsw mm2, mm3    ; mm2 = green chroma

   // convert luma part
   psubusb mm0, mmb_0x10    ; mm0 -= 16
   movq mm1, mmw_0x00ff     ; mm1 = 00ff00ff00ff00ff
   pand mm1, mm0            ; mm1 = __y6__y4__y2__y0 luma even
   psrlw mm0, 8             ; mm0 = __y7__y5__y3__y1 luma odd
   psllw mm0, 3             ; mm0 *= 8
   psllw mm1, 3             ; mm1 *= 8
   pmulhw mm0, mmw_mult_Y   ; mm0 luma odd *= luma coeff
   pmulhw mm1, mmw_mult_Y   ; mm1 luma even *= luma coeff

   // complete the matrix calc with the additions
   movq mm3, mm4    ; copy blue chroma
   movq mm6, mm5    ; copy red chroma
   movq mm7, mm2    ; copy green chroma
   paddsw mm3, mm0    ; mm3 = luma odd + blue chroma
   paddsw mm4, mm1    ; mm4 = luma even + blue chroma
   paddsw mm6, mm0    ; mm6 = luma odd + red chroma
   paddsw mm5, mm1    ; mm5 = luma even + red chroma
   paddsw mm7, mm0    ; mm7 = luma odd + green chroma
   paddsw mm2, mm1    ; mm2 = luma even + green chroma
   // clipping
   packuswb mm3, mm3
   packuswb mm4, mm4
   packuswb mm6, mm6
   packuswb mm5, mm5
   packuswb mm7, mm7
   packuswb mm2, mm2
   // interleave odd and even parts
   punpcklbw mm4, mm3   ; mm4 = b7b6b5b4b3b2b1b0 blue
   punpcklbw mm5, mm6   ; mm5 = r7r6r5r4r3r2r1r0 red
   punpcklbw mm2, mm7   ; mm2 = g7g6g5g4g3g2g1g0 green

   // mask not needed bits (using 565)
   pand mm5, mask_5   ; red & 0xF8
   pand mm2, mask_6   ; green & 0xFc
   pand mm4, mask_5   ; blue & 0xF8

   // mix colors and write
   psrlw mm4, 3    ; mm4 = blue shifted
   pxor mm7, mm7    ; zero mm7

   movq mm3, mm5    ; mm3 = copy red
   movq mm6, mm2    ; mm6 = copy green
   movq mm1, mm4    ; mm1 = copy blue

   punpckhbw mm3, mm7
   punpckhbw mm6, mm7
   punpckhbw mm1, mm7

   psllw mm6, 3    ; shift green
   psllw mm3, 8    ; shift red
   por mm6, mm3
   por mm6, mm1
   movq 8[eax], mm6

   punpcklbw mm2, mm7
   punpcklbw mm5, mm7
   punpcklbw mm4, mm7
   psllw mm2, 3    ; shift green
   psllw mm5, 8    ; shift red
   por mm2, mm5
   por mm2, mm4
   movq [eax], mm2

   add ebx, 8               ; puc_y   += 8;
   add ecx, 4               ; puc_u   += 4;
   add edx, 4               ; puc_v   += 4;
   add eax, 16              ; puc_out += 16 // wrote 16 bytes

   inc edi
   jne horiz_loop

   pop edi
   pop edx
   pop ecx
   pop ebx
   pop eax

   emms

  }
  /***/

  puc_y   += stride_y;
  if (y%2) {
   puc_u   += stride_uv;
   puc_v   += stride_uv;
  }
  puc_out += stride_out;
 }
}

/***/
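
For checking the MMX path against plain C, a scalar version of one
pixel's conversion could look like the sketch below. The _R, _G, _B and
_S macros used in the commented-out loop are not shown in this message,
so the sketch substitutes a widely published integer approximation of
the BT.601 video-range matrix. The decore's own coefficients may differ,
and the function names are hypothetical.

```c
#include <stdint.h>

/* Clamp a value scaled by 256 to 0..255. Negative values are caught before
 * the shift, so no negative number is ever shifted. */
static uint8_t clamp_shift8(int v)
{
    if (v < 0)
        return 0;
    v >>= 8;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Hypothetical reference: convert one video-range YUV sample to 8-bit RGB.
 * Coefficients are an assumed BT.601 approximation (1.164, 1.596, 0.813,
 * 0.391, 2.018 scaled by 256), not taken from the decore. */
static void yuv_to_rgb_ref(uint8_t y, uint8_t u, uint8_t v,
                           uint8_t *r, uint8_t *g, uint8_t *b)
{
    int c = y - 16;    /* luma with the video-range offset removed */
    int d = u - 128;   /* centred U */
    int e = v - 128;   /* centred V */

    *r = clamp_shift8(298 * c + 409 * e + 128);
    *g = clamp_shift8(298 * c - 100 * d - 208 * e + 128);
    *b = clamp_shift8(298 * c + 516 * d + 128);
}
```

Feeding the same Y, U and V samples to such a reference and to the MMX
routine, then comparing the top 5/6/5 bits per channel, is one way to
spot swapped channels or wrong masks.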



--__--__--

Message: 2
From: [email protected]
Date: Fri, 27 Apr 2001 19:23:07 EDT
To: [email protected]
Subject: [OpenDivX] Please Remove


Please remove me from the list.  Thanks.

Vice President Business Development
NewZing.com Inc
2465 E. Bayshore Rd
Suite 400
Palo Alto, CA 94301
610.212.6099 (mobile) 
[email protected]



--__--__--

Message: 3
Date: Sat, 28 Apr 2001 04:04:59 -0700 (PDT)
From: Andrew Voznytsa <[email protected]>
To: [email protected]
Subject: [OpenDivX] color conversion and decoder's perfomance: how to increase on ~10-15%

Hi All,

As far as I can see, the decoder filter now supports IOverlay, and the
Overlay Mixer supports the Y41P pixel format on some video cards
(Radeon, for example).
Y41P is a packed 4:1:1 format with 12 bits per pixel, the same size as
planar YV12/I420.

Below is an explanation of the format (from MSDN):

MEDIASUBTYPE_Y41P

Y41P format data. A packed YUV format. A Y sample at every pixel, a U
and V sample at every fourth pixel horizontally on each line; every
vertical line sampled. Byte ordering (lowest first) is U0, Y0, V0, Y1,
U4, Y2, V4, Y3, Y4, Y5, Y6, Y7, where the suffix 0 is the leftmost
pixel and increasing numbers are pixels increasing left to right. Each
12-byte block is 8 image pixels.

As anyone can see, conversion from YUV 4:2:0 to Y41P is very simple:
the luma is copied as-is and only the chroma needs resampling. This is
a good chance to improve the performance of the decore.
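
To make the byte ordering concrete, here is a minimal C sketch that
packs one row of planar 4:2:0 data into Y41P. It is only an
illustration: the function name is hypothetical, the width is assumed to
be a multiple of 8, and averaging neighbouring chroma samples is just
one possible way to go from one chroma sample per 2 pixels to one per 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack one row of planar 4:2:0 data into Y41P.
 * y points to `width` luma samples; u and v point to width/2 chroma
 * samples (the chroma row shared by this luma row).
 * width must be a multiple of 8; out receives width * 12 / 8 bytes. */
static void pack_row_y41p(const uint8_t *y, const uint8_t *u, const uint8_t *v,
                          uint8_t *out, size_t width)
{
    for (size_t x = 0; x < width; x += 8) {
        const uint8_t *cu = u + x / 2;   /* 4 chroma samples for 8 pixels */
        const uint8_t *cv = v + x / 2;

        /* Assumption: average chroma pairs (with rounding) to halve the
         * horizontal chroma resolution. */
        uint8_t u0 = (uint8_t)((cu[0] + cu[1] + 1) >> 1);
        uint8_t u4 = (uint8_t)((cu[2] + cu[3] + 1) >> 1);
        uint8_t v0 = (uint8_t)((cv[0] + cv[1] + 1) >> 1);
        uint8_t v4 = (uint8_t)((cv[2] + cv[3] + 1) >> 1);

        /* Byte order from the MSDN text:
         * U0 Y0 V0 Y1 U4 Y2 V4 Y3 Y4 Y5 Y6 Y7 */
        out[0]  = u0;       out[1]  = y[x];
        out[2]  = v0;       out[3]  = y[x + 1];
        out[4]  = u4;       out[5]  = y[x + 2];
        out[6]  = v4;       out[7]  = y[x + 3];
        out[8]  = y[x + 4]; out[9]  = y[x + 5];
        out[10] = y[x + 6]; out[11] = y[x + 7];
        out += 12;
    }
}
```

In a full frame, each chroma row of the 4:2:0 source would be passed for
two consecutive luma rows, since Y41P carries chroma on every line.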

AV










--__--__--



End of OpenDivX Digest

