Unable to correctly handle the situation where different parts of text within a single cell have different styles #1653

Open
uganh opened this issue Oct 23, 2024 · 2 comments
Labels
bug Something isn't working

uganh commented Oct 23, 2024

EPPlus usage

Noncommercial use

Environment

Windows

Epplus version

7.4.0

Spreadsheet application

Excel

Description

hero_improve.xlsx

I have some strange files that need to be processed, where different parts of text in some cells have different styles.

In hero_improve.xlsx/xl/sharedStrings.xml, it appears as a si element mixed with multiple t and r elements.

Reference code lines:

XmlReaderHelper.ReadUntil(xr, "t", "r", "rPh", "phoneticPr");

It seems there is no way to get the full content of the cell.

using System.Diagnostics;
using OfficeOpenXml;

using (var package = new ExcelPackage("hero_improve.xlsx"))
{
	ExcelWorkbook workbook = package.Workbook;
	ExcelWorksheet worksheet = workbook.Worksheets[0];

	// C17 refers to a shared string that mixes an empty <t/> with several <r> runs.
	var cell = worksheet.Cells["C17"];

	// Assertion fails: `cell.Text` and `cell.Value` are both empty strings
	Debug.Assert(cell.Text == "{\"any_orange_frag\" : 700, \"any_purple_frag\" : 800,\"any_blue_frag\" : 1200, \"any_green_frag\":2500}");
}
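
One possible way to recover the full text outside of EPPlus is to read the `si` element directly and join its text parts. The following is a minimal sketch, not EPPlus API: `SharedStringText` and `GetPlainText` are hypothetical names, and it assumes the `si` element has already been loaded from xl/sharedStrings.xml with System.Xml.Linq.

```csharp
using System.Linq;
using System.Xml.Linq;

static class SharedStringText
{
    // Main SpreadsheetML namespace, used by xl/sharedStrings.xml.
    static readonly XNamespace Ns =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Joins the direct <t> child (if any) and the <t> of every <r> run,
    // in document order. Phonetic runs (<rPh>) and <phoneticPr> are skipped.
    public static string GetPlainText(XElement si)
    {
        return string.Concat(
            si.Elements()
              .Where(e => e.Name == Ns + "t" || e.Name == Ns + "r")
              .Select(e => e.Name == Ns + "t"
                  ? e.Value
                  : ((string)e.Element(Ns + "t") ?? string.Empty)));
    }
}
```

Mapping a cell such as C17 to its `si` entry would still require reading the shared string index from the worksheet XML, which this sketch leaves out.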
uganh added the bug label Oct 23, 2024
@JanKallman
Contributor

You have an empty element in the beginning of all your si-elements in the shared strings table.
For example:

<si>
  <t/>
  <r>
    <rPr>
      <sz val="9.75"/>
      <color rgb="FF000000"/>
      <rFont val="Calibri"/>
      <family val="2"/>
    </rPr>
    <t>{"any_</t>
  </r>
  <r>
    <rPr>
      <sz val="10.5"/>
      <color rgb="FF000000"/>
      <rFont val="Calibri"/>
      <family val="2"/>
    </rPr>
    <t xml:space="preserve">purple_frag":200, "any_blue_frag" : 200, "any_green_frag" : 500}</t>
  </r>
</si>
<si>
  <t/>
  <r>
    <rPr>
      <sz val="9.75"/>
      <color rgb="FF000000"/>
      <rFont val="Calibri"/>
      <family val="2"/>
    </rPr>
    <t>{"any</t>
  </r>
  <r>
    <rPr>
      <sz val="10.5"/>
      <color rgb="FF000000"/>
      <rFont val="Calibri"/>
      <family val="2"/>
    </rPr>
    <t>_orange_frag</t>
  </r>
  <r>
    <rPr>
      <sz val="9.75"/>
      <color rgb="FF000000"/>
      <rFont val="Calibri"/>
      <family val="2"/>
    </rPr>
    <t xml:space="preserve">" : 700</t>
  </r>
  <r>
    <rPr>
      <sz val="10.5"/>
      <color rgb="FF000000"/>
      <rFont val="Calibri"/>
      <family val="2"/>
    </rPr>
    <t xml:space="preserve">, "any_purple_frag" : 800,"any_blue_frag" : 1200, "any_green_frag":2500</t>
  </r>
  <r>
    <rPr>
      <sz val="9.75"/>
      <color rgb="FF000000"/>
      <rFont val="Calibri"/>
      <family val="2"/>
    </rPr>
    <t>}</t>
  </r>
</si>

Resaving this file in Excel removes the empty t-element.
EPPlus currently reads either the t-element (Text) or the list of r-elements (Rich Text), but not both.
The standard seems to allow a t-element followed by a list of r-elements, but I have never seen this before.
We will look at a fix for this in a coming version. For now, if possible, remove the empty t-element.
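
The workaround above could be scripted roughly as follows. This is only a sketch under assumptions: `SharedStringsCleaner` and `RemoveEmptyLeadingText` are hypothetical names (not part of EPPlus), and the input is assumed to be the content of xl/sharedStrings.xml already read into a string. It drops an empty t-element only when the same si-element also contains r-elements.

```csharp
using System.Linq;
using System.Xml.Linq;

static class SharedStringsCleaner
{
    // Main SpreadsheetML namespace, used by xl/sharedStrings.xml.
    static readonly XNamespace Ns =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Removes empty <t/> elements that are direct children of an <si>
    // which also holds <r> runs. Plain-text entries are left untouched.
    public static string RemoveEmptyLeadingText(string sharedStringsXml)
    {
        // PreserveWhitespace keeps spaces inside <t xml:space="preserve">.
        XDocument doc = XDocument.Parse(
            sharedStringsXml, LoadOptions.PreserveWhitespace);

        var emptyTexts = doc.Descendants(Ns + "si")
            .Where(si => si.Elements(Ns + "r").Any())
            .SelectMany(si => si.Elements(Ns + "t"))
            .Where(t => string.IsNullOrEmpty(t.Value))
            .ToList(); // materialize before mutating the tree

        foreach (XElement t in emptyTexts)
            t.Remove();

        // ToString omits the XML declaration; add it back when writing the part.
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```

The cleaned string would then need to be written back into the package's xl/sharedStrings.xml entry (for example via System.IO.Compression) before opening the file with EPPlus.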

JanKallman added a commit that referenced this issue Nov 18, 2024
* Delete method for ole objects.

* Update Microsoft.Extensions.Configuration.Json to 8.0.1 - #1623

* Feature/new default font (#1621)

* Tests for new default font

* wip

* Added scale factors for Aptos Display and Aptos Narrow

* Updated test project file

* Removed invalid test

---------

Co-authored-by: swmal

* EPPlus version 7.4.1

* Removed fixed references

* Added version information in history list

* Copy ole object and tests

* #1624 - fix for preserving rich data on save (#1625)

* #1624 - fix for preserving rich data on save

* Removed uncommented code

---------

Co-authored-by: swmal

* Fixed some issues with copy ole object.

* Copy ole object complete.

* Fixed issue with opening ms office documents.

* Fixed issue opening copies of ms office docs.

* Using Picture store for ole emf images

* Copying ole objects through worksheet copy

* ole worksheet copy progress

* Bug/i1626 (#1636)

* Added fix for #1626

* Ensured two-anchor and other images are copied properly

* Added fixed issues

* Added minor feature for #1632 (#1637)

* Bug/s745 (#1644)

* Located one problem area

* Added comparisons and fixed incorrect flags

* Added fixed issues

* Fixed some issues with linked ole objects and finished worksheet copy.

* Fixes issue #1645 (#1648)

* Fixes issue #1646 (#1647)

* Added fix for #1628. Corrected ids on deleting tables (#1634)

modified:   EPPlusTest/Issues/LegacyTests/Issues.cs

* Bug/i1631 Row/ColumnOffsets (#1633)

* Fix for #1631 column/row offset in shared formulas

* Added fixed issues note

* add testmethod

* added Ole Object xml comments, stream, fileinfo, name, arg object

* Stream and fileinfo tests.

* Fixes array formula calculation for single cells containing range operations (#1657)

* Fixes array formula calculation for single cell arrays. #1649

* Range operations using conditional operators are now identified as dynamic array formulas

* Replaced ApplyWithDynamicResult method with Apply

* Minor changes to Update of table array property

* Fixed issue #1653 and enabled update of the Workbook.MaxFontWidth property. (#1654)

* EPPlus version 7.4.2

* Many small changes.

* Fixed some tests

* moved Emf check

* Fixed errors and fixed correct TextMetrics

---------

Co-authored-by: JanKallman
Co-authored-by: swmal
Co-authored-by: OssianEPPlus