Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weird kernel panic #15

Open
cycl0ne opened this issue Nov 22, 2024 · 16 comments
Open

Weird kernel panic #15

cycl0ne opened this issue Nov 22, 2024 · 16 comments
Assignees

Comments

@cycl0ne
Copy link

cycl0ne commented Nov 22, 2024

First of all thanks for your code. Its really nice and i hope you finde time to dig more into it :-)

What i noticed while studying your code + initialisation of GDT/IDT/TSS is, correct me if wrong, but GDT is shared over all CPUs. But you GDT only holds one TSS. So when you call:
cpu_init() in your SMP code you just fill out the struct for the cpu, but the original tss_load() in the gdt_load_tss() only initalised one on the BSP.
Maybe this could come to an Panic later since all cpu share the same tss = same IRQ Stack, same Kernel Stack, same io permission.

Since im new, i could be wrong, just using your code (and others) to understand x86-64 kernel coding.

@cycl0ne
Copy link
Author

cycl0ne commented Nov 22, 2024

oh and a suggestion for the fix, you have this cpu struct, why dont you put the tss things here:
old:

typedef struct
{
    uint8_t id;
    uint8_t lapicId;
    uint64_t trapDepth;
    uint64_t prevFlags;
    uint64_t cliAmount;
    tss_t tss;
    sched_context_t sched;
    ipi_queue_t queue;
    uint8_t idleStack[CPU_IDLE_STACK_SIZE];
} cpu_t;

new:

typedef struct
{
    uint8_t id;
    uint8_t lapicId;
    uint64_t trapDepth;
    uint64_t prevFlags;
    uint64_t cliAmount;
    tss_t tss;
    sched_context_t sched;
    ipi_queue_t queue;
    uint8_t idleStack[CPU_IDLE_STACK_SIZE];
    uint8_t     kernel_stack[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
    uint8_t     ist_stacks[IST_STACK_COUNT][PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
} cpu_t;

and later init it per cpu like this:

    tss_t *cpu_tss = &cpu_data->cpu_tss;
    tss_descriptor_t *tss_desc;
    uint64_t tss_base = (uint64_t)cpu_tss;

    // Setup IO permission bitmap
    cpu_tss->iopb_offset = sizeof(tss_t);
    // Setup kernel stack
    cpu_tss->rsp0 = (uint64_t)cpu_data->kernel_stack[0] + PAGE_SIZE;

    // Setup ISTs
    cpu_tss->ist[0] = (uint64_t)cpu_data->ist_stacks[0] + PAGE_SIZE;
    cpu_tss->ist[1] = (uint64_t)cpu_data->ist_stacks[1] + PAGE_SIZE;
    cpu_tss->ist[2] = (uint64_t)cpu_data->ist_stacks[2] + PAGE_SIZE;
    cpu_tss->ist[3] = (uint64_t)cpu_data->ist_stacks[3] + PAGE_SIZE;
    // for (int i = 0; i < IST_STACK_COUNT; i++) {
    //     printf("IST stack %d: %lx\n", i, &cpu_data->ist_stacks[i]);
    // }
    tss_desc = (tss_descriptor_t *)((char*)gdt + (GDT_TSS_LOW + cpu * 16));

    // Setup TSS descriptor
    tss_desc->limit_low = sizeof(struct tss) & 0xFFFF;
    tss_desc->base_low = tss_base & 0xFFFF;
    tss_desc->base_middle = (tss_base >> 16) & 0xFF;
    tss_desc->flags1 = 0x89;   // Present, Type (TSS)
    tss_desc->flags2 = 0x00;   // Long-mode TSS
    tss_desc->base_high = (tss_base >> 24) & 0xFF;
    tss_desc->base_upper = (tss_base >> 32) & 0xFFFFFFFF;
    tss_desc->reserved = 0;    
    load_tss(GDT_TSS_LOW + cpu * 16);

for this to work, you need to allocate of course more GDTP space to hold all CPU TSS ..

@KaiNorberg
Copy link
Owner

Thank you for reporting this. However, I think there maybe some confusion that comes from the statement, "But you GDT only holds one TSS." The GDT is a very weird structure in many ways, so the confusion is understandable.

Let me explain. The GDT does not actually hold the TSS instead it holds a descriptor of the TSS bassicly just a little bit of info saying where the TSS is and some other stuff about it, it does not actually hold the TSS itself (I might rename the GDT member in order to highlight this). So when you load a TSS for a certain CPU you first make sure that CPU has loaded the GDT then you fill in the TSS descriptor in the GDT with information about that CPU's TSS you then call tss_load, the CPU at that point reads the data in the TSS descriptor located in the GDT and loads its TSS, after this the information stored in the GDT is useless and can be overwritten by the next CPU. So you don't need a unique TSS entry for each CPU, instead each CPU can use the same one as it's only needed when loading the actual TSS and not after.

Now it wouldent be the first time I'm wrong, however you are correct in saying that if all the CPUs actually were using the same TSS then that could cause a panic. Most likely it would cause a panic pretty much immediately and unavoidably, so the fact that the operating system (mostly) works should mean that the CPUs are using different TSSs.

I hope that helps. I'd gladly answer more questions if you have them.

@KaiNorberg KaiNorberg self-assigned this Nov 22, 2024
@cycl0ne
Copy link
Author

cycl0ne commented Nov 22, 2024

ah thanks ok.. i thought since all cpus share the GDT, i implemented it like this in my 2nd comment and added for each cpu a tss entry in the gdt.
i compiled and tryed your code. after about 1min i got a kernel panic. not doing anthing except pushing the start button (which was hard since it seems y is reversed. mouse down = cursor up).
This is the panic coming after ~20sec (pressed ctrl-c for the debug loop):
[ 29.837] cr0: 0x0000000080000033, cr2: 0x0000000000000009, cr3: 0x000000003FC90000, cr4: 0x00000000000006A0
[ 29.839] Call Stack:
[ 29.840] 0xFFFFFFFF8000986F
[ 29.840] 0xFFFFFFFF80011C9D
[ 29.841] !!! KERNEL PANIC - Exception !!!
[ 29.842] Occured on cpu 1 in process 0 thread 16
[ 29.843] pmm: free 253488, reserved 74608
[ 29.844] ss 0x0000000000000010, rsp 0xFFFFFFFF8086E500, rflags 0x0000000000000083, cs 0x0000000000000008, rip 0xFFFFFFFF800032BE
[ 29.845] error code 0x0000000000000000, vector 0x000000000000000E
[ 29.847] rax 0xFFFFFFFF80029040, rbx 0x0000000000000001, rcx 0x00000000C0000103, rdx 0x0000000000000001, rsi 0xFFFF8000C0294000, rdi 0x0000000000000000, rbp 0xFFFFFFFF8086E670
[ 29.849] r8 0xFFFF8000C0292200, r9 0xFFFF8000C0000000, r10 0xFFFFFFFF8086E3D0, r11 0x0000000000000008, r12 0x0000000000000002, r13 0xFFFFFFFF80002F50, r14 0xFFFFFFFF80011C9D, r15 0x0000000000000001

Thought maybe this could be of the stack being corrupt.

@KaiNorberg
Copy link
Owner

Hmm well that is a rather weird kernel panic, it seems to be a page fault happening in process 0, on thread 16, but process 0 should never create 16 threads it should have a total of 2 threads at most. It's definitely something that needs to get looked into.

The mouse behavior is also very weird, might have something to do with qemu, as while it's possible I find it unlikely that a code error could cause that. The only way I could see it is if some part of the PS2 standard is incorrectly implemented.

However I am not able to replicate either issue, most likely these issues only happen with a specific combination of hardware, virtual machine and compiler, which makes it very difficult to diagnose from my side.

@cycl0ne
Copy link
Author

cycl0ne commented Nov 22, 2024

i know this feeling ;-) and you dont want to see my combination where i test it ;-)
i run windows 11 with WSL (ubuntu) and in WSL the Qemu + vscode-server

shouldnt the panic stop btw?

@KaiNorberg
Copy link
Owner

Haha its a very familiar feeling. Oh boy yeah sounds like that might introduce alot of additional points of failure. Tho of course, a proper OS should run no matter how you run it.

What do you mean by stop? The panic will halt the entire OS and make the computer wait to be restarted as it can no longer safely continue.

@cycl0ne
Copy link
Author

cycl0ne commented Nov 22, 2024

nope your panic runs and runs and runs.. this is why i pressed CTRL-C to stop qemu.. debug was in screen and on console..
tried it again to get it, but now it worked.. no error.. still x and y are reversed.
i will try again tomorrow.. ;-)

@KaiNorberg
Copy link
Owner

Oh boy more weird issues. Anyway, that sounds like a good plan. I will keep this issue open, but rename it to something more representing these newly found issues.

@KaiNorberg KaiNorberg changed the title TSS maybe not correctly initalised Weird kernel panic Nov 22, 2024
@KaiNorberg
Copy link
Owner

The panic running on and on should now be fixed. Haven't taken a look at the other issues yet.

@cycl0ne
Copy link
Author

cycl0ne commented Nov 26, 2024

couldnt reproduce it.. trying it 2-3 times a day :)

but what i noticed:
/root/git/PatchworkOS/lib/OVMFbin/OVMF_VARS-pure-efi.fd

you should ignore this file in .git otherwise it wants to checkin everytime ;-)

@KaiNorberg
Copy link
Owner

That's one issue down. As it's bassicly impossible for me to diagnose and fix the other issues, as they are probably caused by a specific combination of hardware and software and there are almost certainly many more similar issues; I am making the decision to start accepting contributions. I will shortly edit the readme to reflect this. That means that (only if you want to of course :) ) you can try to fix the issues on your side and il try to help out if I can. I don't really have the time for it right now, unfortunately, but I might eventually write some proper documentation, to make contributions easier. I have also appended the gitignore.

@cycl0ne
Copy link
Author

cycl0ne commented Nov 27, 2024

Have to see if i have time. Im also on my own OS im coding since the last 20 Years and trying to get it to 64bit + 3 Kids & 3 Jobs, time is precious :-) But i will check, when i get my scheduler running, and dig into your code.

@KaiNorberg
Copy link
Owner

Yeah, of course, I don't quite have that much, mostly just my studies. But I totally understand the feeling, don't feel any pressure if you don't want to, either way good luck!

@cycl0ne
Copy link
Author

cycl0ne commented Nov 27, 2024

i cant say no. i promise i will look into it.. maybe in 2-3 weeks.

@KaiNorberg
Copy link
Owner

Gotcha, well in that case there is no hurry.

@cycl0ne
Copy link
Author

cycl0ne commented Nov 27, 2024

of course there is no hurry.. os development is for ages ... i use it to relax and relieve stress :-) Noone is interessted in OSs anymore..you have out there perfected android, macos, linux,... so noone cares about us ^^

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants