|  | extract text from ods TableCell using odfpy |  | |
| | | frankentux |  |
| Posted: Mon Aug 25, 2008 8:29 pm Post subject: extract text from ods TableCell using odfpy |  |
| |  | |
Hi there,
I'm losing hair trying to figure out how I can actually get the text out of an existing .ods file. Currently I have:

    #!/usr/bin/python
    from odf.opendocument import Spreadsheet
    from odf.opendocument import load
    from odf.table import TableRow,TableCell
    from odf import text

    doc = load("/tmp/match_data.ods")
    d = doc.spreadsheet
    rows = d.getElementsByType(TableRow)
    for row in rows:
        cells = row.getElementsByType(TableCell)
        for cell in cells:
            print dir(cell.getElementsByType(text.P))
This is a spreadsheet containing 200 rows, each with 4 cells containing strings. What I'd like to be able to do is something like:

    for row in rows:
        cells = row.getElementsByType(TableCell)
        users.append((cells[0].value,cells[1].value,cells[2].value,cells[3].value))
Thus, what I'd like to know is how to actually get the value out of the cell. I've read through the odfpy api documentation (which is almost completely focused on writing, not reading) and googled for info, but I still haven't found anything. |
| |
| | | frankentux |  |
| Posted: Tue Aug 26, 2008 8:08 am Post subject: Re: extract text from ods TableCell using odfpy |  |
Ok. Sorted it out, but only after taking a round trip over xml.dom.minidom. Here's the working code:

    #!/usr/bin/python
    from odf.opendocument import Spreadsheet
    from odf.opendocument import load
    from odf.table import TableRow,TableCell
    from odf.text import P

    doc = load("/tmp/match_data.ods")
    d = doc.spreadsheet
    rows = d.getElementsByType(TableRow)
    for row in rows[:2]:
        cells = row.getElementsByType(TableCell)
        for cell in cells:
            tps = cell.getElementsByType(P)
            if len(tps) > 0:
                for x in tps:
                    print x.firstChild
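For anyone who wants the cell strings as plain Python values, here is a minimal sketch that does not use odfpy at all. It relies only on the fact that an .ods file is a zip archive whose spreadsheet data lives in content.xml. The function name read_ods_rows is made up for illustration, and the sketch assumes a simple sheet: it ignores repeated-column and repeated-row attributes, so runs of identical or empty cells may be collapsed. It is written for current Python 3, not the Python 2.5 used in this thread.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OpenDocument format.
NS = {
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def read_ods_rows(path):
    """Return a list of rows, each a list of cell strings.

    Hypothetical helper: repeated-cell attributes are not expanded.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("content.xml"))

    rows = []
    for row in root.iter("{%s}table-row" % NS["table"]):
        cells = []
        for cell in row.findall("table:table-cell", NS):
            # A cell holds zero or more text:p paragraphs; itertext()
            # also collects text nested in child elements such as spans.
            paragraphs = cell.findall("text:p", NS)
            cells.append("\n".join("".join(p.itertext()) for p in paragraphs))
        rows.append(cells)
    return rows
```

With that in place, the list of 4-tuples asked for in the first post could be built as `users = [tuple(r[:4]) for r in read_ods_rows("/tmp/match_data.ods")]`.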
| |
| | | norseman |  |
| Posted: Tue Aug 26, 2008 3:04 pm Post subject: Re: extract text from ods TableCell using odfpy |  |
frankentux wrote:
| Quote: | Ok. Sorted it out, but only after taking a round trip over xml.dom.minidom. Here's the working code:

    #!/usr/bin/python
    from odf.opendocument import Spreadsheet
    from odf.opendocument import load
    from odf.table import TableRow,TableCell
    from odf.text import P

    doc = load("/tmp/match_data.ods")
    d = doc.spreadsheet
    rows = d.getElementsByType(TableRow)
    for row in rows[:2]:
        cells = row.getElementsByType(TableCell)
        for cell in cells:
            tps = cell.getElementsByType(P)
            if len(tps) > 0:
                for x in tps:
                    print x.firstChild

-- LINK
========================= |
    cd /opt
    find . -name "*odf*" -print
    (empty)
    cd /usr/local/lib/python2.5
    find . -name "*odf*" -print
    (empty)
OK - where is it? :)
Steve norseman@hughes.net |
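As a side note on locating a module: rather than searching the filesystem with find, the interpreter itself can be asked where it would import a name from. The sketch below uses the standard importlib machinery of current Python 3 (it is not available in the Python 2.5 used in this thread); the helper name module_location is made up for illustration.

```python
import importlib.util


def module_location(name):
    """Return where the interpreter would load `name` from, or None.

    Hypothetical helper. For ordinary modules the result is a file path;
    for modules compiled into the interpreter it is a label such as
    "built-in" instead.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        # Nothing on sys.path provides this top-level name.
        return None
    return spec.origin
```

Calling `module_location("odf")` returns None when odfpy is not installed for that interpreter, which matches the empty find results above.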
| |
| | | John Machin |  |
| Posted: Tue Aug 26, 2008 8:52 pm Post subject: Re: extract text from ods TableCell using odfpy |  |
On Aug 27, 3:04 am, norseman <norse...@hughes.net> wrote:
| Quote: | frankentux wrote:
Ok. Sorted it out, but only after taking a round trip over xml.dom.minidom. Here's the working code:

    #!/usr/bin/python
    from odf.opendocument import Spreadsheet
    from odf.opendocument import load
    from odf.table import TableRow,TableCell
    from odf.text import P

    doc = load("/tmp/match_data.ods")
    d = doc.spreadsheet
    rows = d.getElementsByType(TableRow)
    for row in rows[:2]:
        cells = row.getElementsByType(TableCell)
        for cell in cells:
            tps = cell.getElementsByType(P)
            if len(tps) > 0:
                for x in tps:
                    print x.firstChild

-- LINK
=========================

    cd /opt
    find . -name "*odf*" -print
    (empty)
    cd /usr/local/lib/python2.5
    find . -name "*odf*" -print
    (empty)
OK - where is it? :)
|
Consider using: find --http --google "odfpy"
 |
| |
| | | norseman |  |
| Posted: Tue Aug 26, 2008 9:55 pm Post subject: Re: extract text from ods TableCell using odfpy |  |
| |  | |
Ciaran Farrell wrote:
| Quote: | 2008/8/26 norseman <norseman@hughes.net>:
frankentux wrote:
Ok. Sorted it out, but only after taking a round trip over xml.dom.minidom. Here's the working code:

    #!/usr/bin/python
    from odf.opendocument import Spreadsheet
    from odf.opendocument import load
    from odf.table import TableRow,TableCell
    from odf.text import P

    doc = load("/tmp/match_data.ods")
    d = doc.spreadsheet
    rows = d.getElementsByType(TableRow)
    for row in rows[:2]:
        cells = row.getElementsByType(TableCell)
        for cell in cells:
            tps = cell.getElementsByType(P)
            if len(tps) > 0:
                for x in tps:
                    print x.firstChild

-- LINK
=========================

    cd /opt
    find . -name "*odf*" -print
    (empty)
    cd /usr/local/lib/python2.5
    find . -name "*odf*" -print
    (empty)
OK - where is it? :)
Sorry. Stupid of me. The module is not part of the standard library. It's at LINK
Ciaran
============== |
I got the download and all went pretty well. Setup.py compiled OK and install put it where it belongs.
As a test I went to try odflint and keep getting a zlib not found error. It is installed (/usr/local/lib) and the python zlib things .py, .pyc and .pyo all seem present. Not sure what is happening.
I took a look at Python 2.5.2's zipfile.py and changed the statement

    import zlib

to

    import libz as zlib

(ALL libs are prefixed with lib... by convention.) The problem shown below the test happens with or without my change.
Test I ran:
    python
    (sign on yah de yah yah)
    >>> import zipfile
    >>> zipfile.is_zipfile("zx")
    False
    >>> zipfile.is_zipfile("zz.zip")
    True
    >>> zipfile.is_zipfile("zx.zip")
    False

(zx.zip does not exist: no error is generated, but the answer is correct.)
Thus all returned correct answers. Distro Python code runs as expected.
However:
    odflint OOstuf2.odt
    python /usr/local/bin/odflint OOstuf2.odt

Both return the following:

    Traceback (most recent call last):
      File "/usr/local/bin/odflint", line 213, in <module>
        lint(sys.argv[1])
      File "/usr/local/bin/odflint", line 197, in lint
        content = zfd.read(zi.filename)
      File "/usr/local/lib/python2.5/zipfile.py", line 498, in read
        "De-compression requires the (missing) zlib module"
    RuntimeError: De-compression requires the (missing) zlib module
Anybody: What did I miss correcting? It seems odflint only uses zipfile references.
System: Slackware 10.2 on a 2.4GHz laptop
Steve norseman@hughes.net |
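A note on the traceback: zipfile raises that RuntimeError when its own attempt to import zlib failed, so the question is whether the interpreter itself can import the module. Python's zlib is a compiled extension module rather than a .py file, and it is separate from the system library in /usr/local/lib, so renaming the import to libz would not be expected to help. One possible cause, stated here as an assumption and not a diagnosis, is that this Python was built without the zlib headers available. The sketch below is a quick probe written for current Python 3; the function name zlib_status is made up for illustration.

```python
import io
import zipfile


def zlib_status():
    """Report whether this interpreter can import zlib and inflate a zip member.

    Hypothetical helper: returns (ok, detail).
    """
    try:
        import zlib
    except ImportError as exc:
        return False, "zlib extension module not importable: %s" % exc

    # Round-trip one deflated member in memory to confirm that zipfile
    # can actually compress and decompress with it.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("probe.txt", "hello")
    with zipfile.ZipFile(buffer) as archive:
        data = archive.read("probe.txt")
    return data == b"hello", "zlib %s" % zlib.ZLIB_VERSION
```

The is_zipfile test above passes even without zlib because it only inspects the archive's directory record. Reading a compressed member is what needs the module.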
| |