Changelog
All notable changes to Dangl.TextConverter are documented here.
v3.0.4:
- The System.Text.Encoding.CodePages dependency was downgraded again to 4.5.0 to fix dependency incompatibilities in earlier .NET versions
v3.0.3:
- Dropped tests for .NET 6 and added tests for .NET 8
- When exporting plain text to RTF, there's now a space after the initial {\rtf1 control word
v3.0.2:
- Updated internal dependencies
v3.0.1:
- Support \line as a line break control word when reading RTF
v3.0.0:
- The ANTLR dependencies were updated from Antlr.Runtime to Antlr.Runtime.Standard. The new package is now the official ANTLR runtime package and includes many performance improvements
- Compatibility for net40 and netstandard1.3 was dropped. The lowest supported frameworks are now net45 and netstandard2.0
v2.1.0:
- Dropped tests for net5.0 and added tests for net7.0
- Whitespace in span texts is now better preserved when converting from Html to plain text
v2.0.0:
- RTF texts now support images. They can be read from RTF and also exported. Conversion between RTF and Html will also preserve images. For some operations, images can only be preserved if they are converted to another format, e.g. importing DIB (Device Independent Bitmaps) from RTF requires you to supply an IRtfImageConverter, and exporting a GIF image to RTF likewise requires a conversion. To extract images from RTF texts, you must use the RtfToText.ConvertRtfToSegmentedText method
- A new package Dangl.TextConverter.RtfImageConversion has been published, which contains converters for both GIF and DIB formats
- HtmlAgilityPack was updated from 1.11.7 to 1.11.43, which now treats sequences like <br></br> as two instances of the br tag instead of a single one. This behavior is actually correct, but customers that rely on the old behavior can check the source code examples for how to configure it
- Segmented RTF text now may also return a new type RtfImageSegment. Client code should be updated to expect this new type
- The TextByClass type now also has a property ImageNode, which will be populated if Html text is split by class name and an image node is encountered. Please be aware that image nodes will always lead to a split node and not have any other properties set
- Dropped tests for .NET Core 3.1 and added tests for .NET 6.0
v1.3.9:
- Fixed a bug where escape sequences terminated with a semicolon ; were not properly converted from RTF to plain text
- Dropped tests for netcoreapp2.1 and added tests for net5.0
v1.3.8:
- Fixed a bug where HtmlToText.ConvertHtmlToPlaintextAndSplitByClassname removed empty segments at the beginning or the end, even if the segments were matching a class name
v1.3.7:
- Fixed a bug where the runtime complexity of the HtmlTableConverter grew very quickly for deeply nested tables
v1.3.6:
- Add net40 as target framework
v1.3.5:
- Deserializing RTF text now recognizes escape sequences in decimal format, e.g. \u252 is deserialized as ü
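
To illustrate the decimal escape format mentioned above, here is a minimal sketch that is not taken from the library. The class RtfUnicodeEscapeSketch and its Decode method are hypothetical names, and the sketch deliberately ignores RTF fallback characters and the \uc control word, which a real reader has to handle.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only, not part of Dangl.TextConverter.
public static class RtfUnicodeEscapeSketch
{
    // Matches \u followed by a signed decimal number, e.g. \u252 or \u-3913.
    private static readonly Regex Escape = new Regex(@"\\u(-?[0-9]{1,5})");

    public static string Decode(string rtfFragment) =>
        Escape.Replace(rtfFragment, match =>
        {
            var value = int.Parse(match.Groups[1].Value, CultureInfo.InvariantCulture);
            // Leave anything outside the 16-bit range untouched.
            if (value < -32768 || value > 65535) return match.Value;
            // RTF writes the value as a signed 16-bit number, so negative values wrap around.
            if (value < 0) value += 65536;
            return ((char)value).ToString();
        });
}

public static class Program
{
    public static void Main()
    {
        // Decimal 252 is U+00FC, the letter ü.
        var decoded = RtfUnicodeEscapeSketch.Decode(@"Gr\u252n");
        Console.WriteLine(decoded == "Gr\u00fcn"); // True
    }
}
```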
v1.3.4:
- The NuGet package now specifies the MIT license
- Dropped tests for netcoreapp2.2 and added tests for netcoreapp3.1
- The PackageIcon element is now used for the NuGet package instead of the deprecated PackageIconUrl
v1.3.3:
- Deserializing escaped unicode strings now ignores non-printable characters
v1.3.2:
- Bugfix when deserializing incomplete RTF texts. When an incomplete RTF string was given, this could sometimes lead to a NullReferenceException
v1.3.1:
- Bugfix in HtmlToText.ConvertHtmlToPlaintextAndSplitByClassname which could throw an exception if the Html had no plain text representation, e.g. for image only Html
v1.3.0:
- The generated assemblies now have a strong name. This is a breaking change of the binary API and will require recompilation on all systems that consume this package. The strong name of the generated assembly allows compatibility with other, signed tools. Please note that this does not increase security or provide tamper-proof binaries, as the key is available in the source code per Microsoft guidelines
v1.2.9:
- The class Dangl.TextConverter.Html.HtmlTableConverter is now public
- The InternalsVisibleTo attribute for the assembly was removed to prevent conflicts in applications that sign their binaries
v1.2.8:
- Bugfix where bookmark elements in Rtf texts with missing closing elements were throwing a System.InvalidOperationException
- CI tests are now also run on Linux
- Bugfix where line endings were sometimes not correctly trimmed when running on .NET Core on Linux
v1.2.7:
- Non-printable unicode escape sequences (0x00-0x08, 0x0b and 0x0e-0x1f) are now ignored when converting RTF text to plain text
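
The ranges listed above can be expressed as a simple character filter. The following is a minimal sketch under the assumption that the filter works on single characters. ControlCharacterFilter is a hypothetical name and not the library's actual implementation.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only, not part of Dangl.TextConverter.
public static class ControlCharacterFilter
{
    // True for the ranges listed in the changelog entry: 0x00-0x08, 0x0b and 0x0e-0x1f.
    public static bool IsIgnored(char c) =>
        c <= '\u0008' || c == '\u000b' || (c >= '\u000e' && c <= '\u001f');

    // Returns the input without the ignored characters.
    public static string Strip(string input) =>
        new string(input.Where(c => !IsIgnored(c)).ToArray());
}

public static class Program
{
    public static void Main()
    {
        // Tab (0x09), line feed (0x0a), form feed (0x0c) and carriage return (0x0d)
        // are outside the listed ranges and are therefore kept.
        var cleaned = ControlCharacterFilter.Strip("A\u0001B\tC\u001fD\r\n");
        Console.WriteLine(cleaned == "AB\tCD\r\n"); // True
    }
}
```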
v1.2.6:
- Add the HtmlAgilityPackLegacyBehaviorHelper utility class and fix an issue where enabling the legacy behavior in HtmlAgilityPack was not thread safe and could fail when simultaneously accessed
v1.2.5:
- Update HtmlAgilityPack dependency. The previously referenced version 1.9.2 was pulled from NuGet due to an unintended, breaking API change. Please see https://github.com/zzzprojects/html-agility-pack/issues/125 for more information about the change
- The HtmlToText class now sets the static property HtmlAgilityPack.HtmlDocument.DisableBehavaiorTagP = false in its static method calls to ensure compatible behavior. If your own code relies on different behavior, please ensure that this property is always set to its original value after invoking one of the methods on HtmlToText. See the README for further details
v1.2.4:
- Dependencies update
v1.2.3:
- Small internal refactoring
- Dependencies update
v1.2.2:
- Add bool keepWhitespaceAtLineEnds parameter to TextToHtml.TransformPlaintextToHtml() overload which defaults to false
- Bugfix where Rtf text was sometimes incorrectly read and output when segments between groups started with whitespace
v1.2.1:
- Add StringLineStartNormalizationExtensions
v1.2.0:
- Update HtmlAgilityPack for huge (about 10x) performance improvements in netstandard targets, see https://github.com/zzzprojects/html-agility-pack/releases/tag/v1.8.11
- When converting Html to plain text, it's now possible to have the result split by Html class names. Please see the README or https://docs.dangl-it.com/Projects/Dangl.TextConverter for further details
- public static SegmentedRtf ConvertRtfToSegmentedText(string rtfInput) was added to RtfToText. This will return text segments that contain plain text representations of the texts as well as tags to indicate the opening and closing of bookmarks. This is used, for example, in the GAEB & AVA .Net Libraries by DanglIT to work with text additions in GAEB 2000 files. Please see the README or https://docs.dangl-it.com/Projects/Dangl.TextConverter for further details
- Added TextToRtf.ConvertPlainTextToRtf(SegmentedRtf segmentedRtf) to convert back to Rtf from segmented texts while preserving bookmarks
- Dropped tests for netcoreapp2.0, added tests for netcoreapp2.2
v1.1.5:
- Bugfix: Some empty tables caused a NullReferenceException when converting them to plaintext via the HtmlToText class
- Update of HtmlAgilityPack and System.Text.Encoding.CodePages (the latter only for netstandard targets)
v1.1.4:
- Dependencies update
v1.1.3:
- Update of HtmlAgilityPack dependency to include latest bugfixes
- Internal refactoring of the CI/CD pipeline
v1.1.2:
- Small performance improvements for parsing Rtf text
v1.1.1:
- Update HtmlAgilityPack to latest stable version 1.8.4
v1.1.0:
- Switch to HtmlAgilityPack. The HtmlAgilityPack.Core fork is no longer required since the original now supports netstandard
v1.0.8:
- Add netstandard2.0 target
- Switch build system to NUKE
v1.0.7:
- Fix bug in RtfToText where Rtf annotations were read as plain text
v1.0.6:
- Performance improvements for Rtf texts that contain pictures
v1.0.5:
- Update Html encoding/decoding to preserve correctness in roundtrip scenarios
v1.0.4:
- Downgrade to netstandard1.3 and net45 for broader compatibility
v1.0.3:
- Performance improvements
v1.0.2:
- Update ANTLR4 dependencies to latest stable version
v1.0.1:
- Target NETStandard 1.3