| Time | Nickname | Message |
| 00:10 | | BlueMax has joined #archiveteam-ot |
| 01:12 | | kiska has joined #archiveteam-ot |
| 01:14 | | Polylith has quit IRC (Read error: Operation timed out) |
| 01:15 | | Polylith has joined #archiveteam-ot |
| 01:29 | | ColdIce has quit IRC (Read error: Operation timed out) |
| 01:35 | | ColdIce has joined #archiveteam-ot |
| 01:38 | | ColdIce has quit IRC (Read error: Connection reset by peer) |
| 01:39 | | w0rmhole has joined #archiveteam-ot |
| 02:12 | | adinbied has quit IRC (Quit: Left Channel.) |
| 02:25 | | adinbied has joined #archiveteam-ot |
| 02:30 | | adinbied has quit IRC (Quit: Left Channel.) |
| 02:44 | | adinbied has joined #archiveteam-ot |
| 03:42 | | Odd0002 has quit IRC (Quit: ZNC - http://znc.in) |
| 03:44 | | ivan has quit IRC (Read error: Operation timed out) |
| 03:45 | | JAA has quit IRC (Read error: Operation timed out) |
| 03:45 | | jspiros has quit IRC (Read error: Operation timed out) |
| 03:45 | | ivan has joined #archiveteam-ot |
| 03:46 | | svchfoo1 sets mode: +o ivan |
| 03:48 | | wp494 has quit IRC (Ping timeout: 492 seconds) |
| 03:51 | | wp494 has joined #archiveteam-ot |
| 04:02 | | Odd0002 has joined #archiveteam-ot |
| 04:08 | | Mateon1 has quit IRC (Ping timeout: 268 seconds) |
| 04:09 | | Mateon1 has joined #archiveteam-ot |
| 04:45 | | JAA has joined #archiveteam-ot |
| 04:45 | | svchfoo3 sets mode: +o JAA |
| 04:46 | | bakJAA sets mode: +o JAA |
| 04:50 | | jspiros has joined #archiveteam-ot |
| 04:57 | | dxrt- is now known as dxrt |
| 04:58 | | dxrt_ sets mode: +o dxrt |
| 06:15 | w0rmhole | ivan: you know the ins and outs of grab-site, right? |
| 06:15 | Flashfire | he wrote it |
| 06:15 | Flashfire | ...... |
| 06:16 | w0rmhole | oh ok, i was going to ask him a question about it |
| 06:25 | ivan | w0rmhole: I'm here |
| 06:28 | w0rmhole | ivan: okay, so im using grab-site and i adjusted the delay while a crawl was running from 0ms to 250ms by editing the delay file. |
| 06:28 | w0rmhole | doing that froze up grab-site. it's not moving at all. i dont really want to break it. |
| 06:28 | w0rmhole | even setting the delay back to 0 didn't make a difference |
| 06:28 | w0rmhole | for the record, this is the command i ran: $ grab-site https://www.exxoshost.co.uk/forum?archiveteam --igsets forums |
| 06:28 | ivan | w0rmhole: you can look at the terminal to see which URLs it's currently grabbing, or using gs-dump-urls with in_progress |
| 06:29 | ivan | changing a delay to 250ms doesn't freeze crawls, probably a coincidence |
| 06:29 | w0rmhole | oh of course, right when you typed that it started working |
| 06:30 | w0rmhole | i think so |
| 06:31 | w0rmhole | said something about dns resolution errors when it continued , but i think that might just be an issue with the site and not grab-site |
| 06:33 | w0rmhole | one other question i have if you don't mind |
| 06:34 | ivan | I'm still here |
| 06:34 | w0rmhole | that forum keeps putting in that stupid phpsessid garbage in the url |
| 06:34 | w0rmhole | is there a way for grab-site to not capture those urls, and only the actual url? |
| 06:35 | ivan | wpull has a URLRewriter that should be handling that |
| 06:35 | w0rmhole | i.e. https://www.exxoshost.co.uk/forum/viewtopic.php?f=14&t=1196 as opposed to https://www.exxoshost.co.uk/forum/viewtopic.php?f=14&t=1196?sid=0befb2c2dc4ac8d45b88f1fe7cce2b71 |
| 06:35 | w0rmhole | is that enabled by default? |
| 06:35 | w0rmhole | in grab-site |
| 06:35 | ivan | re.compile("^(.*)(?:sid=[0-9a-zA-Z]{32})(?:&(.*))?$", re.I), |
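The pattern ivan pasted can be tried in isolation. The following is a minimal sketch of what such a pattern could be used for, not wpull's actual URLRewriter: the function name `strip_sid` and the way the two captured groups are rejoined are assumptions made for illustration.

```python
import re

# The pattern exactly as quoted in the chat: a 32-character "sid=" value,
# optionally followed by "&" and the rest of the query string.
SID_PATTERN = re.compile(r"^(.*)(?:sid=[0-9a-zA-Z]{32})(?:&(.*))?$", re.I)


def strip_sid(url: str) -> str:
    """Return url without its sid parameter (hypothetical helper)."""
    match = SID_PATTERN.match(url)
    if match is None:
        return url  # no session id found, leave the URL untouched
    head = match.group(1)        # everything before "sid="
    tail = match.group(2) or ""  # everything after the "&" that follows the sid
    # Drop a dangling "?" or "&" left behind when the sid was the last parameter.
    return (head + tail).rstrip("?&")


url = ("https://www.exxoshost.co.uk/forum/viewtopic.php"
       "?f=14&t=1196&sid=0befb2c2dc4ac8d45b88f1fe7cce2b71")
print(strip_sid(url))
```

Applied to the forum URL above, this prints the `viewtopic.php?f=14&t=1196` form that w0rmhole wanted to capture.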
    
    
| 06:36 | * | ivan looks |
| 06:36 | w0rmhole | ... |
| 06:36 | w0rmhole | i dont know what to do with that x_x |
| 06:37 | ivan | yes |
| 06:37 | ivan | libgrabsite/main.py |
| 06:37 | ivan | 253: "--strip-session-id", |
| 06:37 | w0rmhole | ohh nvm i know what you mean |
| 06:38 | ivan | so it's enabled but I don't know the details of the implementation |
| 06:38 | ivan | is grab-site grabbing URLs with the session id? |
| 06:38 | w0rmhole | in one situation yes |
| 06:38 | w0rmhole | i'll try to find the original command i used |
| 06:39 | w0rmhole | btw, do i need to use that ?archiveteam thing in the url like i did up there? |
| 06:40 | w0rmhole | to keep the session id out |
| 06:40 | ivan | probably not |
| 06:40 | w0rmhole | $ grab-site --1 --ua="Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.13; ) Gecko/20101203" "http://ataristeven.exxoshost.co.uk/" "https://www.exxoshost.co.uk/forum/viewtopic.php?f=13&t=513" "https://www.exxoshost.co.uk/forum/viewtopic.php?f=13&t=513&start=10" "https://www.exxoshost.co.uk/forum/viewtopic.php?f=13&t=513&start=20" |
| 06:40 | w0rmhole | ok so that's the command i ran |
| 06:41 | w0rmhole | and while browsing the warc with webrecorder player |
| 06:41 | w0rmhole | in the url field, i saw the session id appear on the 2nd forum page |
| 06:41 | w0rmhole | aka start=10 |
| 06:46 | ivan | please file a bug with details because my crawl of the site hangs on something pretty quickly |
| 06:46 | ivan | maybe it's some weird page requisite behavior, I don't know |
| 06:47 | w0rmhole | okay, will do |
| 06:48 | w0rmhole | i had to manually specify the user agent to that to allow it to grab those |
| 06:48 | w0rmhole | with ua specified=~30s |
| 06:48 | w0rmhole | w/o ua specified=~4min |
| 06:48 | w0rmhole | iirc |
| 06:59 | w0rmhole | ivan: https://github.com/ludios/grab-site/issues/132 |
| 06:59 | w0rmhole | hope my english skills aren't too shitty |
| 07:06 | ivan | replied there |
| 07:08 | ivan | we don't fabricate responses in WARCs, that would be bad |
| 07:11 | w0rmhole | sorry |
| 07:12 | w0rmhole | im still new to grab-site so i am still adjusting to how it works |
| 07:12 | ivan | w0rmhole: you can start the crawl with a sid, see the README for the cookie stuff |
| 07:12 | ivan | does it work? don't set your hopes too high |
| 07:12 | w0rmhole | i did add a cookie file later on |
| 07:13 | w0rmhole | which didnt make much a difference |
| 07:14 | ivan | try setting the cookie expiration time to the distant future |
| 07:14 | ivan | 2147483647 |
| 07:14 | ivan | paste me your cookie file with the sid |
| 07:14 | w0rmhole | 1 minute pls |
| 07:15 | w0rmhole | .blogspot.com	TRUE	/	FALSE	2147483647	NCR	1 |
| 07:15 | w0rmhole | .exxoshost.co.uk	TRUE	/	TRUE	1568875520	phpbbexxos_k	 |
| 07:15 | w0rmhole | .exxoshost.co.uk	TRUE	/	TRUE	1568875520	phpbbexxos_sid	b2fbc6f704098f6e4a6711a8eb508b98 |
| 07:15 | w0rmhole | .exxoshost.co.uk	TRUE	/	TRUE	1568875520	phpbbexxos_u	1 |
| 07:15 | w0rmhole | .reddit.com	TRUE	/	FALSE	2147483647	over18	1 |
| 07:15 | w0rmhole | store.steampowered.com	FALSE	/	FALSE	2147483647	birthtime	0 |
| 07:15 | w0rmhole | store.steampowered.com	FALSE	/	FALSE	2147483647	lastagecheckage	1-January-1970 |
| 07:15 | w0rmhole | store.steampowered.com	FALSE	/	FALSE	2147483647	mature_content	1 |
| 07:15 | w0rmhole | oh sorry bad formatting |
| 07:15 | w0rmhole | i will use link |
| 07:16 | ivan | I think your tabs got lost yeah |
| 07:16 | w0rmhole | http://pasted.co/6b3de39e |
| 07:18 | ivan | yeah try changing the 1568875520 expiration to 2147483647 |
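The change ivan suggests could also be scripted. This is a minimal sketch, assuming the cookie file uses the Netscape cookies.txt layout (seven tab-separated fields with the expiry in the fifth), which is what the pasted lines look like. `set_expiry` and `FAR_FUTURE` are names made up for this example.

```python
FAR_FUTURE = 2147483647  # the expiry value suggested in the chat


def set_expiry(line: str, expiry: int = FAR_FUTURE) -> str:
    """Rewrite the expiry field of one cookies.txt line (hypothetical helper)."""
    # Blank lines and "#" lines are passed through unchanged. Note that this
    # sketch therefore also leaves curl-style "#HttpOnly_" cookie lines alone.
    if not line.strip() or line.startswith("#"):
        return line
    fields = line.split("\t")
    # Assumed field order:
    # domain, subdomain flag, path, secure flag, expiry, name, value
    if len(fields) != 7:
        return line  # not a cookie line in the assumed layout
    fields[4] = str(expiry)
    return "\t".join(fields)


before = ".exxoshost.co.uk\tTRUE\t/\tTRUE\t1568875520\tphpbbexxos_u\t1"
print(set_expiry(before))
```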
    
    
| 07:18 | w0rmhole | ok let me try |
| 07:18 | ivan | and maybe make sure the session is fresh enough for the server to still know about it? |
| 07:18 | ivan | if that doesn't work there might not be much you can do about the forum software giving you sid links |
| 07:18 | w0rmhole | sorry that last part confuses me |
| 07:19 | ivan | there's also a `secure` flag after the path set to TRUE but I assume you're grabbing https:// forum pages |
| 07:19 | w0rmhole | (english is my second language btw) |
| 07:19 | w0rmhole | yes |
| 07:19 | ivan | if the forum forgot about the session it might give you a ?sid= link with a new session, but I'm just guessing how it works |
| 07:19 | w0rmhole | so possible solution would be to get new sessid? |
| 07:20 | ivan | they probably expire in a reasonably short period |
| 07:20 | ivan | yes |
| 07:20 | w0rmhole | ok |
| 07:20 | w0rmhole | i should still specify user agent, correct? |
| 07:20 | ivan | I guess |
| 07:21 | w0rmhole | if i dont some images do not load |
| 07:21 | ivan | oh heh never mind 1568875520 is this date next year |
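ivan's correction is easy to check by decoding the two expiry values as Unix timestamps. A small sketch using only the standard library; the helper name `expiry_to_utc` is made up here.

```python
from datetime import datetime, timezone


def expiry_to_utc(epoch_seconds: int) -> str:
    """Format a cookie expiry (seconds since the Unix epoch) as an ISO 8601 UTC time."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


print(expiry_to_utc(1568875520))  # 2019-09-19T06:45:20+00:00
print(expiry_to_utc(2147483647))  # 2038-01-19T03:14:07+00:00
```

So the forum's cookies were already set to expire a year out, and an expired cookie was probably not the cause of the sid links.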
    
    
| 07:22 | ivan | Forum Software, man |
| 07:23 | ivan | does the WARC player fail to find the page when you click on a ?sid= link? |
| 07:25 | ivan | and which one are you using? |
| 07:28 | w0rmhole | no it finds it |
| 07:28 | w0rmhole | using the same player mentioned on github |
| 08:08 | ivan | ok, that sounds like a decent outcome despite the sid= crap |
| 08:22 | w0rmhole | ivan: one other thing, does grab-site support delays like: 250ms-350ms instead of just one number? |
| 08:24 | ivan | w0rmhole: yeah, just write 250-350 to the file |
| 08:24 | ivan | or give that to --delay= |
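Writing the range to the file can be sketched as below. Assumptions: the file is named `delay` and sits in the crawl's directory (the "delay file" from earlier in the chat), values are milliseconds written without a unit, and a trailing newline is tolerated as it would be after editing by hand. `write_delay` and the directory argument are hypothetical.

```python
import re
from pathlib import Path

# Either a single number ("250") or a range ("250-350"), no "ms" suffix.
DELAY_SPEC = re.compile(r"\d+(-\d+)?")


def write_delay(crawl_dir: str, spec: str) -> None:
    """Write a delay value or range to the crawl's delay file (hypothetical helper)."""
    if not DELAY_SPEC.fullmatch(spec):
        raise ValueError(f"expected N or N-M in milliseconds, got {spec!r}")
    (Path(crawl_dir) / "delay").write_text(spec + "\n")
```

For example, `write_delay("path/to/crawl-dir", "250-350")` would request the range asked about above, while `"250ms-350ms"` is rejected by the check.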
    
    
| 08:25 | w0rmhole | thanks! :) |
| 08:25 | w0rmhole | i really like grab-site, good work! |
| 08:27 | ivan | it's mostly chfoo's work in wpull but thanks |
| 08:34 | w0rmhole | both of you, i give my thanks to |
| 08:34 | | C4K3 has quit IRC (leaving) |
| 09:13 | | faolingfa has quit IRC (Leaving) |
| 10:22 | | BlueMax has quit IRC (Quit: Leaving) |
| 10:42 | | kiska has quit IRC (Read error: Connection reset by peer) |
| 10:44 | | w0rmhole has quit IRC (Ping timeout: 252 seconds) |
| 10:44 | | Flashfire has quit IRC (Ping timeout: 252 seconds) |
| 10:52 | | kiska has joined #archiveteam-ot |
| 10:52 | | kiska has quit IRC (se.hub irc.underworld.no) |
| 12:37 | JAA | Underworld pls |
| 13:10 | | sknebel has quit IRC (Ping timeout: 268 seconds) |
| 13:11 | | kiska has joined #archiveteam-ot |
| 13:33 | | faolingfa has joined #archiveteam-ot |
| 13:34 | | wp494 has quit IRC (Read error: Operation timed out) |
| 13:35 | | sknebel has joined #archiveteam-ot |
| 13:35 | | wp494 has joined #archiveteam-ot |
| 15:12 | | w0rmhole has joined #archiveteam-ot |
| 15:19 | w0rmhole | ivan: is there a way to set the number of retries if grab-site fails to get something the first few times? |
| 16:50 | | schbirid has joined #archiveteam-ot |
| 17:30 | ivan | w0rmhole: --wpull-args=--tries=N |
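Put together with the delay range from earlier, the invocation could be assembled like this. It is only a sketch: it uses just the options quoted in this log (`--delay=` and `--wpull-args=--tries=N`), the retry count of 5 is an arbitrary example, and `build_command` is a made-up helper that builds the argument list without running anything.

```python
import shlex
from typing import List, Optional


def build_command(url: str, tries: int, delay: Optional[str] = None) -> List[str]:
    """Build a grab-site argument list (hypothetical helper, does not run it)."""
    argv = ["grab-site"]
    if delay is not None:
        argv.append(f"--delay={delay}")  # a single number or a range such as 250-350
    # Pass the retry count through to wpull, in the form ivan gives above.
    argv.append(f"--wpull-args=--tries={tries}")
    argv.append(url)
    return argv


# shlex.join quotes the URL so the "?" survives being pasted into a shell.
print(shlex.join(build_command(
    "https://www.exxoshost.co.uk/forum?archiveteam", tries=5, delay="250-350")))
```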
    
    
| 18:47 | | schbirid has quit IRC (Read error: Operation timed out) |
| 18:48 | | schbirid has joined #archiveteam-ot |
| 18:59 | | schbirid has quit IRC (Read error: Operation timed out) |
| 21:49 | | Flashfire has joined #archiveteam-ot |
| 23:08 | | BlueMax has joined #archiveteam-ot |
| 23:10 | | Jens has quit IRC (Remote host closed the connection) |
| 23:11 | | Jens has joined #archiveteam-ot |