atdgen cannot print large json files on 32bit architectures #56

Open
mjambon opened this issue May 27, 2018 · 5 comments

mjambon commented May 27, 2018

From @mjambon on May 27, 2018 1:36

From @josch on March 20, 2015 11:11

Hi,

When I run my code on 32-bit architectures, large outputs produce the following traceback:

Fatal error: exception Failure("oops")
Raised at file "pervasives.ml", line 20, characters 22-33
Called from file "write.ml", line 52, characters 2-24
Called from file "datatypes_j.ml", line 52, characters 4-47
Called from file "ag_oj_run.ml", line 19, characters 1-6
Called from file "ag_oj_run.ml", line 43, characters 2-39
Called from file "ag_oj_run.ml", line 19, characters 1-6
Called from file "ag_oj_run.ml", line 43, characters 2-39
Called from file "ag_oj_run.ml", line 19, characters 1-6
Called from file "ag_oj_run.ml", line 43, characters 2-39
Called from file "ag_oj_run.ml", line 25, characters 1-6
Called from file "ag_oj_run.ml", line 43, characters 2-39
Called from file "datatypes_j.ml", line 2324, characters 4-62
Called from file "datatypes_j.ml", line 2332, characters 2-18

I am running Debian unstable and have the following versions installed:

  • libatdgen-ocaml-dev 1.3.1-1+b1
  • libyojson-ocaml-dev 1.1.8-1
  • libbiniou-ocaml-dev 1.0.9-1

I was told in the #debian-ocaml IRC channel that the problem might be that 32-bit architectures do not support very large strings (>= 8MB). I observe it for instances that would produce a 53MB JSON file on a 64-bit platform.

I'm also unsure where exactly the problem should be fixed. Maybe this is actually a biniou problem, since that library uses a mutable string as its internal buffer?

Copied from original issue: mjambon/atdgen#29

Copied from original issue: ahrefs/atd#103

mjambon commented May 27, 2018

What are the values for start and len printed on stderr before that?
See https://github.com/mjambon/yojson/blob/master/write.ml#L31

It's probably a size-limit problem: your string may not fit into the buffer, which is itself a string. On 32-bit platforms the maximum length of any string is about 16MB (Sys.max_string_length). The usual workarounds are to use a 64-bit platform, to break large data into chunks (using a wrapper, see http://stackoverflow.com/a/28997459/597517), or to avoid JSON for large data in favor of a streaming protocol of some kind (which breaks the data into chunks for you).
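To see the limit on a given machine, the standard library exposes it directly. The following is a minimal sketch using only the stdlib; the helper `fits` is a hypothetical name introduced here for illustration and is not part of yojson or biniou. On a 32-bit OCaml runtime, `Sys.max_string_length` is 16777211 bytes.

```ocaml
(* Report the word size and the largest string this runtime can allocate. *)
let () =
  Printf.printf "word size: %d bits, max string length: %d bytes\n"
    Sys.word_size Sys.max_string_length

(* Hypothetical helper: can [n] more bytes be appended to a string-backed
   buffer that already holds [used] bytes without exceeding the limit?
   Written as a subtraction so the comparison cannot overflow. *)
let fits ~used n = n >= 0 && used <= Sys.max_string_length - n
```

A writer that checks `fits` before growing its buffer can fail early with a clear message instead of failing somewhere deeper in the call stack.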

mjambon commented May 27, 2018

From @josch on March 20, 2015 18:46

Hi,
Indeed, there is something else above the exception message. I did not notice it because it contained part of the output that I expected to end up in my JSON file.

Could you make this information part of the raised exception? Being told only Fatal error: exception Failure("oops"), with a separate print above it that is not obviously connected to the exception, is confusing rather than helpful.

In my case the message had start=0 len=13, but there are others. The package failed to build on eight architectures, and you can see the start/len values for all of them here: https://buildd.debian.org/status/logs.php?pkg=botch&ver=0.7-1~experimental1&suite=experimental (click the Maybe-Failed links to open the build log for each architecture).

Do you consider this worth fixing by using a data structure other than a string for the buffer?

Breaking the input data into chunks would be an ugly workaround, because it is hard, if not impossible, to predict how much space a given piece of input data will take as JSON before doing the actual conversion.

Using a 64-bit platform could be a workaround, but so would using a buffer data structure that is not an OCaml string. Then every platform could work with large data.

mjambon commented May 27, 2018

I'm not sure what the problem is; it could be a bug in yojson. I'll take a look a bit later.

mjambon commented May 27, 2018

From @josch on March 20, 2015 23:32

To clarify the problem, let me give you a minimal example.

  • Makefile:
test.native: datatypes_j.ml datatypes_t.ml test.ml
    ocamlbuild -classic-display -use-ocamlfind -package atdgen test.native

%_j.ml: %.atd
    atdgen -j -j-std $<

%_t.ml: %.atd
    atdgen -t $<

.PHONY: clean
clean:
    rm -f datatypes_j.ml datatypes_j.mli datatypes.ml datatypes.mli datatypes_t.ml datatypes_t.mli test.native
    rm -rf _build
  • test.ml
let main () =
  let s = String.make (int_of_string Sys.argv.(1)) 'c' in
  print_endline (Datatypes_j.string_of_output (s, s));
;;

main ();;
  • datatypes.atd
type output = (string * string)

Compile the whole thing by running make and then execute:

$ ./test.native 8388602

This will work fine. But then:

$ ./test.native 8388603
Fatal error: exception Invalid_argument("Buf.extend: reached Sys.max_string_length")

Nice readable error message. The problem starts here:

$ ./test.native 8388604
[lots of output]
Fatal error: exception Failure("oops")

This error message is not really helpful, and it would be great if this did not happen on 32-bit platforms.
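The first threshold above lines up with the 32-bit string limit. The sketch below assumes the tuple (s, s) is serialized compactly as ["s","s"], that is, two copies of s plus 7 bytes of punctuation; that layout is an assumption about the generated writer, not something confirmed in this thread. The constant is the value of Sys.max_string_length on 32-bit OCaml.

```ocaml
(* Assumed compact layout for the tuple (s, s):  ["s","s"]
   = 2 copies of s + 7 punctuation bytes ( [ " " , " " ] ). *)
let json_length n = 2 * n + 7

(* Sys.max_string_length on a 32-bit OCaml runtime. *)
let max_string_length_32 = 16_777_211

(* Does the whole serialized output fit in a single 32-bit OCaml string? *)
let fits n = json_length n <= max_string_length_32

let () =
  List.iter
    (fun n ->
      Printf.printf "%d -> %d bytes, fits: %b\n" n (json_length n) (fits n))
    [ 8_388_602; 8_388_603 ]
```

Under that assumption, 8388602 produces exactly 16777211 bytes, the largest string a 32-bit runtime can hold, and 8388603 is the first argument that exceeds it.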

Another solution could be to not buffer the result at all, and instead allow passing an output channel to which the generated JSON is written directly.
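A rough sketch of that idea, using only the standard library: the writer below is hand-written for the (string * string) type from the example and is not the code atdgen generates. The names `write_json_string` and `write_pair` and the `emit` callback are hypothetical. Output goes through `emit` piece by piece, so the complete JSON text is never held in one string.

```ocaml
(* [emit s pos len] consumes a slice of [s]; both [output_substring oc]
   and [Buffer.add_substring buf] have this shape. *)
let write_json_string emit s =
  let lit x = emit x 0 (String.length x) in
  lit "\"";
  let start = ref 0 in
  (* Emit the pending run of characters that need no escaping. *)
  let flush_upto i = if i > !start then emit s !start (i - !start) in
  String.iteri
    (fun i c ->
      let esc =
        match c with
        | '"' -> "\\\""
        | '\\' -> "\\\\"
        | '\n' -> "\\n"
        | '\r' -> "\\r"
        | '\t' -> "\\t"
        | c when Char.code c < 0x20 -> Printf.sprintf "\\u%04x" (Char.code c)
        | _ -> ""
      in
      if esc <> "" then begin
        flush_upto i;
        lit esc;
        start := i + 1
      end)
    s;
  flush_upto (String.length s);
  lit "\""

(* Write the tuple as a compact JSON array. *)
let write_pair emit (a, b) =
  let lit x = emit x 0 (String.length x) in
  lit "[";
  write_json_string emit a;
  lit ",";
  write_json_string emit b;
  lit "]"

(* Stream straight to stdout; no intermediate result string is built. *)
let () =
  write_pair (output_substring stdout) ("first", "second");
  print_newline ()
```

Because `emit` is a parameter, the same writer can target a channel, a `Buffer.t`, or a socket wrapper, and memory use stays bounded by the input strings rather than by the size of the output.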

mjambon commented May 27, 2018

Proposed resolution: include all useful information in the exception and don't print anything to stderr.
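One way this could look, as a sketch only: the function name `check_substring`, the `caller` label, and the message format are illustrative and are not taken from yojson's write.ml. The point is that start, len, and the string length travel inside the exception instead of being printed separately.

```ocaml
(* Hypothetical bounds check: raise Failure with every value needed to
   diagnose the problem, and print nothing to stderr. *)
let check_substring ~caller s start len =
  if start < 0 || len < 0 || start > String.length s - len then
    failwith
      (Printf.sprintf "%s: invalid range start=%d len=%d (string length %d)"
         caller start len (String.length s))

(* The caller decides what to do with the message. *)
let () =
  match check_substring ~caller:"write_string" "hello" 3 13 with
  | () -> print_endline "range ok"
  | exception Failure msg -> print_endline ("caught: " ^ msg)
```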
