#9 Unicode issues

closed-fixed
None
5
2003-06-29
2003-06-25
Roman Suzi
No

While trying to format some koi8-r encoded docs using
docutils (specifically html.py -i koi8-r -o cp1251
f.txt f.html)
I got tracebacks of this nature:

1) When I have an unresolved reference, `with national characters inside`_,
I get a traceback in utils.py at the msg.astext() calls
(repr(msg.astext()) helps, but makes the output unreadable).
The error prevents the document from being processed to the end.

2) I keep getting this:

UnicodeEncodeError: 'ascii' codec can't encode
characters in position 0-6: ordinal not in range(128)
in html4css1 module here:
parts.append('%s="%s"' % (name.lower(),

(this happens when processing :alt: attributes with
national characters inside)

Coercing the second element of the tuple to unicode helps
(in both branches).
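
Under Python 2, mixing byte strings and unicode objects triggers an implicit conversion through the ASCII codec, which is the class of error reported above. The following is a minimal sketch of the same idea in present-day Python 3, not the actual html4css1 code: `format_attribute` is a hypothetical helper, and the sample text is made up for illustration. It keeps the attribute as text throughout and encodes only once, at output time.

```python
def format_attribute(name, value):
    """Hypothetical helper: build name="value" entirely as text (no bytes)."""
    return '%s="%s"' % (name.lower(), str(value))


# Non-ASCII :alt: text, as it would look after decoding a koi8-r source.
alt_text = "Картинка"

# Forcing the text through the ASCII codec reproduces the reported error class.
try:
    alt_text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Keeping the attribute as text and encoding once, at output time, works.
attribute = format_attribute("ALT", alt_text)
output_bytes = attribute.encode("cp1251")
```

The design choice here is to decode at the input boundary, work on text only, and encode at the output boundary, so no implicit codec is ever involved.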

I am using Python 2.3b1 and docutils 0.3 on Linux
RedHat 7.3 system.

I think the docutils tests must include unit tests with
non-ASCII characters in all possible places
(like in ALT tags).
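
A test of the kind suggested above might look like the sketch below. It is written for present-day Python 3 and uses only the standard library; `format_attribute` is a hypothetical stand-in for the writer code that builds HTML attributes, not a docutils function, and the sample word is made up.

```python
import unittest


def format_attribute(name, value):
    """Hypothetical stand-in for the writer code that builds HTML attributes."""
    return '%s="%s"' % (name.lower(), value)


class NonAsciiAttributeTests(unittest.TestCase):
    def test_alt_roundtrip_koi8r_to_cp1251(self):
        # Bytes as they would be read from a koi8-r encoded input file.
        source_bytes = "Схема".encode("koi8-r")
        # Decode at the input boundary.
        alt_text = source_bytes.decode("koi8-r")
        html = format_attribute("ALT", alt_text)
        # Encode at the output boundary.
        encoded = html.encode("cp1251")
        self.assertEqual(encoded.decode("cp1251"), 'alt="Схема"')
```

Saved in a module, the test case can be run with `python -m unittest`.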

Discussion

  • David Goodger

    David Goodger - 2003-06-26
    • assigned_to: nobody --> goodger
     
  • David Goodger

    David Goodger - 2003-06-26

    Logged In: YES
    user_id=7733

    > 1) When I have an unresolved reference, `with national characters inside`_,
    > I get a traceback in utils.py at the msg.astext() calls
    > (repr(msg.astext()) helps, but makes the output unreadable).
    > The error prevents the document from being processed to the end.

    Could you provide (as file attachments) a minimal offending
    document and the actual traceback? Also, please attach the
    output of
    "html.py -i koi8-r -o cp1251 --dump-settings good.txt good.html",
    where "good.txt" is a file that doesn't cause a traceback.
    Thank you.

    > 2) I keep getting this:
    >
    > UnicodeEncodeError: 'ascii' codec can't encode
    > characters in position 0-6: ordinal not in range(128)
    > in html4css1 module here:
    > parts.append('%s="%s"' % (name.lower(),

    I have just checked in a fix to the html4css1 module; please
    try it (from CVS or the snapshot:
    http://docutils.sf.net/docutils-snapshot.tgz).

    > I think the docutils tests must include unit tests with
    > non-ASCII characters in all possible places
    > (like in ALT tags).

    Patches are welcome!

     
  • Roman Suzi

    Roman Suzi - 2003-06-26

    Nearly minimal offending file

     
  • Roman Suzi

    Roman Suzi - 2003-06-27
    • status: open --> closed
     
  • David Goodger

    David Goodger - 2003-06-27

    Logged In: YES
    user_id=7733

    Did the fix resolve problem 2?

    What was the resolution of problem 1?

     
  • Roman Suzi

    Roman Suzi - 2003-06-29
    • labels: 369283 -->
    • milestone: 156138 -->
     
  • Roman Suzi

    Roman Suzi - 2003-06-29

    Logged In: YES
    user_id=287815

    This bug no longer shows up in current docutils, which is why I
    closed it.

    I am deleting the file erroneously attached here.

    I have attached the --dump-settings output, but the settings are
    probably not the same as when I observed the bug, as I now have
    the current version of docutils.

    This is what I am getting if there is no input file (does it
    really need to spew out a traceback?)

    Traceback (most recent call last):
      File "/usr/local/bin/html.py", line 25, in ?
        publish_cmdline(writer_name='html', description=description)
      File "/usr/local/lib/python2.3/site-packages/docutils/core.py", line 239, in publish_cmdline
        enable_exit=enable_exit)
      File "/usr/local/lib/python2.3/site-packages/docutils/core.py", line 167, in publish
        self.set_io()
      File "/usr/local/lib/python2.3/site-packages/docutils/core.py", line 125, in set_io
        self.set_source(source_path=source_path)
      File "/usr/local/lib/python2.3/site-packages/docutils/core.py", line 136, in set_source
        encoding=self.settings.input_encoding)
      File "/usr/local/lib/python2.3/site-packages/docutils/io.py", line 150, in __init__
        self.source = open(source_path)
    IOError: [Errno 2] No such file or directory: 'good.txt'

    And here are settings:

    html.py -i koi8-r -o cp1251 --dump-settings good.txt good.html

    ::: Runtime settings:
    {'_destination': 'good.html',
    '_disable_config': None,
    '_source': 'good.txt',
    'attribution': 'dash',
    'compact_lists': 1,
    'datestamp': None,
    'debug': None,
    'docinfo_xform': 1,
    'doctitle_xform': 1,
    'dump_internals': None,
    'dump_pseudo_xml': None,
    'dump_settings': True,
    'dump_transforms': None,
    'embed_stylesheet': None,
    'error_encoding': 'ascii',
    'error_encoding_error_handler': 'backslashreplace',
    'exit_level': 5,
    'expose_internals': None,
    'footnote_backlinks': 1,
    'footnote_references': 'superscript',
    'generator': None,
    'halt_level': 4,
    'input_encoding': 'koi8-r',
    'language_code': 'en',
    'output_encoding': 'cp1251',
    'output_encoding_error_handler': 'strict',
    'pep_references': None,
    'report_level': 2,
    'rfc_references': None,
    'source_link': None,
    'source_url': None,
    'stylesheet': 'default.css',
    'stylesheet_path': None,
    'tab_width': 8,
    'toc_backlinks': 'entry',
    'trim_footnote_reference_space': None,
    'warning_stream': None,
    'xml_declaration': 1}
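
The dump above pairs the 'ascii' error encoding with the 'backslashreplace' handler, and the 'cp1251' output encoding with 'strict'. As a rough illustration of what these two standard Python codec error handlers do (shown in present-day Python 3; the message text is made up and is not a docutils message):

```python
# Illustrative message containing non-ASCII text (hypothetical wording).
message = 'Unresolved reference: "ссылка"'

# 'backslashreplace' never raises: non-ASCII characters become \uXXXX escapes.
escaped = message.encode("ascii", "backslashreplace")

# 'strict' raises as soon as a character has no mapping in the target codec.
try:
    "日本".encode("cp1251", "strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

With 'backslashreplace' the message stays printable on an ASCII-only terminal, at the cost of readability; with 'strict' an unmappable character stops the encode step.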

     
  • David Goodger

    David Goodger - 2003-06-29
    • status: closed --> closed-fixed
     
  • David Goodger

    David Goodger - 2003-06-29

    Logged In: YES
    user_id=7733

    > This is what I am getting if there is no input file

    Of course; if there's no such file, it will generate
    an exception. I asked you to *make* a small file
    that *doesn't* cause a traceback.

    > (does it really need to spew out a traceback?)

    I suppose it ought to be caught. Patches welcome!
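
One way the missing-file case could be caught is sketched below, in present-day Python 3. This is only an illustration under stated assumptions, not the docutils implementation: `read_source` is a hypothetical helper, and the file name is made up so that the open call fails.

```python
import sys


def read_source(source_path, encoding):
    """Hypothetical helper: return decoded text, or None if the file is unreadable."""
    try:
        with open(source_path, encoding=encoding) as source:
            return source.read()
    except OSError as error:  # IOError is an alias of OSError in Python 3
        # Report a one-line message instead of letting the traceback escape.
        sys.stderr.write("Unable to open source file: %s\n" % error)
        return None


# The file is assumed not to exist, so the helper reports and returns None.
text = read_source("definitely-missing-input.txt", "koi8-r")
```

A command-line front end would typically follow the message with a non-zero exit status rather than continue.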

     
