[00:02] *** dashcloud has joined #archiveteam
[00:11] *** RichardG has quit IRC (Read error: Connection reset by peer)
[00:12] *** RichardG has joined #archiveteam
[00:43] *** dashcloud has quit IRC (Read error: Operation timed out)
[00:48] *** dashcloud has joined #archiveteam
[01:07] *** WubTheCap has joined #archiveteam
[01:09] *** muramasa has joined #archiveteam
[02:15] *** bitBaron has quit IRC (Quit: Bye!)
[02:19] *** Odd0002 has quit IRC (Ping timeout: 260 seconds)
[02:25] *** Odd0002 has joined #archiveteam
[02:46] *** yuitimoth has quit IRC (Ping timeout: 250 seconds)
[03:40] *** nightpool has joined #archiveteam
[03:58] *** muramasa has quit IRC (Read error: Connection reset by peer)
[03:58] *** muramasa has joined #archiveteam
[04:02] *** espes__ has quit IRC (Ping timeout: 252 seconds)
[04:02] *** espes__ has joined #archiveteam
[04:06] *** djbeadle has joined #archiveteam
[04:20] *** muramasa has quit IRC (Read error: Connection reset by peer)
[04:38] *** qw3rty111 has joined #archiveteam
[04:44] *** qw3rty119 has quit IRC (Read error: Operation timed out)
[05:03] *** octothorp has quit IRC (Remote host closed the connection)
[05:04] *** octothorp has joined #archiveteam
[05:11] *** muramasa has joined #archiveteam
[05:31] *** ivan has quit IRC (Leaving)
[05:33] *** ivan has joined #archiveteam
[05:37] *** djbeadle has quit IRC (My MacBook has gone to sleep.)
[05:41] *** Jens has quit IRC (Remote host closed the connection)
[05:42] *** Jens has joined #archiveteam
[05:46] *** rsznick has joined #archiveteam
[05:47] *** rsznik has quit IRC (Read error: Operation timed out)
[05:47] *** fie has joined #archiveteam
[06:18] *** RichardG has quit IRC (Ping timeout: 250 seconds)
[06:33] *** bithippo has joined #archiveteam
[07:01] *** zyphlar_ has joined #archiveteam
[07:06] *** bithippo has quit IRC (Quit: Page closed)
[07:37] *** bwn has quit IRC (Read error: Operation timed out)
[08:04] https://github.com/N0taN3rd/Squidwarc
[08:04] Anyone want to check out, report back
[08:06] *** bwn has joined #archiveteam
[08:15] Uh, there’s chromebot in #archivebot.
[08:19] Yes
[08:19] And does this have different features, etc
[08:25] chromebot currently lacks support for recursive crawls. squidwarc has no IRC bot (config file based) and no custom behavior scripts (i.e. JavaScript injected into the page to perform some action).
[08:28] Looks like they are using the same interface as chromebot to capture network requests (no proxy).
[08:32] *** Mateon1 has quit IRC (Ping timeout: 250 seconds)
[08:32] *** Mateon1 has joined #archiveteam
[08:32] *** Mateon1 has quit IRC (Connection closed)
[08:48] *** Ctrl has joined #archiveteam
[09:01] *** atomotic has joined #archiveteam
[09:09] *** schbirid has joined #archiveteam
[09:37] ... which I still think is not the right way to tackle it.
[10:17] *** z00nx has quit IRC (Ping timeout: 252 seconds)
[10:17] *** mr_archiv has quit IRC (Ping timeout: 252 seconds)
[10:23] *** mr_archiv has joined #archiveteam
[10:23] *** z00nx has joined #archiveteam
[10:58] *** atomotic has quit IRC (Quit: atomotic)
[10:59] *** zyphlar_ has quit IRC (Quit: Connection closed for inactivity)
[11:40] *** fie has quit IRC (Quit: Leaving)
[12:05] *** Ctrl has quit IRC (Read error: Operation timed out)
[12:39] *** atomotic has joined #archiveteam
[12:51] *** djbeadle has joined #archiveteam
[13:21] *** djbeadle has quit IRC (My MacBook has gone to sleep.)
[13:51] *** yuitimoth has joined #archiveteam
[14:12] *** RichardG has joined #archiveteam
[14:18] JAA: the thing is with "shitty" javascript webapps, rendered HTML might be best to scrape
[14:23] jrwr: In my opinion, we should capture both the raw network traffic of everything retrieved by the browser as well as the resulting DOM. I'm not entirely sure how the latter should be stored though. It doesn't fit into the WARC model of requests and responses; maybe a new record type for such "derived" resources would be needed.
[14:26] brozzler handles the network traffic side (and wpull + PhantomJS as well, I think). chromebot and Squidwarc are somewhere inbetween as far as I understand it (not the raw traffic because Chromium's API doesn't expose that -- transfer encoding is stripped, for example, and headers are in a standardised format rather than the raw bytes sent by the server -- but it's also not the final DOM).
[14:27] archive.is would be an example for storing only the resulting DOM (I believe).
[14:34] *** atomotic has quit IRC (Quit: atomotic)
[14:39] *** Feld0 has joined #archiveteam
[14:41] *** GLaDOS has joined #archiveteam
[14:44] *** djbeadle has joined #archiveteam
[15:20] *** Ctrl has joined #archiveteam
[15:33] *** Ctrl has quit IRC (Read error: Operation timed out)
[15:44] *** GLaDOS has quit IRC (Quit: Leaving)
[15:50] *** atomotic has joined #archiveteam
[16:07] *** atomotic has quit IRC (Quit: atomotic)
[16:22] *** Mateon1 has joined #archiveteam
[16:22] *** Mateon1 has quit IRC (Connection closed)
[16:52] *** Ctrl has joined #archiveteam
[17:19] https://www.twitch.tv/textfilesdotcom
[17:19] GET SOME LEARNING, FELLOWS
[18:08] *** atomicthu has quit IRC (Read error: Operation timed out)
[18:10] *** c4rc4s has quit IRC (Ping timeout: 600 seconds)
[18:48] *** c4rc4s has joined #archiveteam
[19:10] *** atomicthu has joined #archiveteam
[19:52] *** jschwart has joined #archiveteam
[20:17] *** n00b559 has joined #archiveteam
[20:19] Hello, I lost a video in yahoo videos back in 2011 or so, can I find out how to find it on here?
[20:23] *** RichardG has quit IRC (Read error: Connection reset by peer)
[20:24] *** RichardG has joined #archiveteam
[20:26] *** n00b559 has quit IRC (Ping timeout: 260 seconds)
[20:59] *** bsmith093 has quit IRC (Quit: Leaving.)
[21:02] *** bsmith093 has joined #archiveteam
[21:29] *** ats has quit IRC (Quit: it's new kernel time!)
[21:31] *** ats has joined #archiveteam
[21:45] *** sekolyn has joined #archiveteam
[21:45] *** octothorp has quit IRC (Read error: Connection reset by peer)
[21:52] *** sekolyn has quit IRC (Read error: Connection reset by peer)
[21:52] *** octothorp has joined #archiveteam
[21:59] *** Asparagir has joined #archiveteam
[22:27] *** schbirid has quit IRC (Quit: Leaving)
[22:30] *** dashcloud has quit IRC (Ping timeout: 260 seconds)
[22:34] *** dashcloud has joined #archiveteam
[22:37] *** dashcloud has quit IRC (Read error: Operation timed out)
[22:37] *** dashcloud has joined #archiveteam
[22:51] *** ola_norsk has joined #archiveteam
[22:52] *** macker has quit IRC (Quit: out of beer...)
[22:52] is there anyone at Internet Archive that is responsible for IA social media precense? I'd like to hand over https://www.minds.com/InternetArchive
[22:53] (i don't have faceboor or twitter :/ )
[22:53] facebook*
[22:54] *** macker- has joined #archiveteam
[22:54] e.g some @archive.org mail i could send the u/p to?
[22:54] *** macker- is now known as macker
[23:01] *** jschwart has quit IRC (Quit: Konversation terminated!)
[23:07] *** kitties has joined #archiveteam
[23:10] *** RichardG has quit IRC (Read error: Connection reset by peer)
[23:11] *** RichardG has joined #archiveteam
[23:15] * ola_norsk is just going to guess Alexis Rossi and CC the rest..
[23:22] https://www.emailsherlock.com/emailsearch/jscott@archive.org/
[23:24] I'd say just email info@archive.org, asking them if they're interested etc. You'll either get a reply from the person you're looking for or instructions how to contact them.
[23:28] holy crap, how that get pasted into the window??
[23:28] JAA: ok
[23:29] does irssi do drag-n-drop? :/
[23:30] JAA: i'm just going to CC though..these day's one email address is usually not enough.
[23:32] *** ola_norsk has quit IRC (Wanting people to listen, you can't just tap them on the shoulder anymore. You have to hit them with a sledgehammer)
[23:40] *** BobJonkma has joined #archiveteam
[23:49] *** djbeadle has quit IRC (My MacBook has gone to sleep.)
[23:55] *** Stiletto has quit IRC (Ping timeout: 246 seconds)
[23:59] *** Stilett0 has joined #archiveteam