[00:09] *** SmileyG has joined #archiveteam-bs
[00:09] *** Smiley has quit IRC (Read error: Connection reset by peer)
[00:10] *** DragonMon has quit IRC (Quit: Leaving)
[00:10] *** JH88 has quit IRC (Read error: Connection reset by peer)
[00:10] *** BlueMaxim has joined #archiveteam-bs
[00:10] *** JH88 has joined #archiveteam-bs
[00:11] *** JH88 has quit IRC (Read error: Connection reset by peer)
[00:12] *** JH88 has joined #archiveteam-bs
[00:12] *** vectr0n_ has joined #archiveteam-bs
[00:12] *** m007a83__ has joined #archiveteam-bs
[00:13] *** c4rc4s has quit IRC (Ping timeout: 600 seconds)
[00:14] *** c4rc4s has joined #archiveteam-bs
[00:14] *** BlueMax has quit IRC (Read error: Operation timed out)
[00:16] *** m007a83 has quit IRC (Read error: Operation timed out)
[00:20] *** wp494 has quit IRC (Remote host closed the connection)
[00:20] *** vectr0n has quit IRC (Ping timeout: 600 seconds)
[00:20] *** vectr0n_ is now known as vectr0n
[00:20] *** wp494 has joined #archiveteam-bs
[00:21] *** kiska has quit IRC (Read error: Connection reset by peer)
[00:21] *** squires has quit IRC (Read error: Connection reset by peer)
[00:22] *** tyzoid has quit IRC (Ping timeout: 600 seconds)
[00:24] *** tyzoid has joined #archiveteam-bs
[00:24] *** godane has quit IRC (Read error: Operation timed out)
[00:25] *** godane has joined #archiveteam-bs
[00:26] *** ivan has quit IRC (Read error: Connection reset by peer)
[00:26] *** svchfoo1 sets mode: +o godane
[00:26] *** JAA has quit IRC (Read error: Operation timed out)
[00:27] *** jspiros has quit IRC (Read error: Operation timed out)
[00:27] *** zyphlar has quit IRC (Read error: Operation timed out)
[00:27] *** wabu has quit IRC (Read error: Operation timed out)
[00:27] *** Petri152 has quit IRC (Read error: Operation timed out)
[00:28] *** chfoo has quit IRC (Read error: Operation timed out)
[00:28] *** beardicus has quit IRC (Read error: Operation timed out)
[00:28] *** ivan has joined #archiveteam-bs
[00:29] *** kiska has joined #archiveteam-bs
[00:29] *** beardicus has joined #archiveteam-bs
[00:29] *** odemg has quit IRC (Read error: Operation timed out)
[00:29] *** odemg has joined #archiveteam-bs
[00:32] *** wp494 has quit IRC (Read error: Operation timed out)
[00:33] *** adinbied has quit IRC (Ping timeout: 255 seconds)
[00:33] *** mundus20- has quit IRC (Ping timeout: 255 seconds)
[00:33] *** squires has joined #archiveteam-bs
[00:33] *** wp494 has joined #archiveteam-bs
[00:34] *** achip has quit IRC (west.us.hub irc.Prison.NET)
[00:34] *** robogoat has quit IRC (west.us.hub irc.Prison.NET)
[00:34] *** mundus201 has joined #archiveteam-bs
[00:35] *** tyzoid has quit IRC (Read error: Operation timed out)
[00:35] *** tyzoid has joined #archiveteam-bs
[00:36] *** chfoo has joined #archiveteam-bs
[00:36] *** adinbied has joined #archiveteam-bs
[00:38] *** beardicus has quit IRC (Read error: Operation timed out)
[00:39] *** beardicus has joined #archiveteam-bs
[00:41] *** ivan has quit IRC (Read error: Operation timed out)
[00:43] *** balrog has quit IRC (Read error: Operation timed out)
[00:44] *** ivan has joined #archiveteam-bs
[00:46] *** balrog has joined #archiveteam-bs
[00:46] *** swebb sets mode: +o balrog
[00:51] *** Dimtree has quit IRC (Read error: Operation timed out)
[00:53] *** beardicus has quit IRC (Read error: Operation timed out)
[00:55] *** achip has joined #archiveteam-bs
[00:55] *** robogoat has joined #archiveteam-bs
[00:56] *** beardicus has joined #archiveteam-bs
[00:56] *** eLbot has quit IRC (Ping timeout: 600 seconds)
[00:57] *** eLbot has joined #archiveteam-bs
[00:59] *** squires has quit IRC (Ping timeout: 600 seconds)
[01:01] *** beardicus has quit IRC (Read error: Operation timed out)
[01:02] *** adinbied has quit IRC (Read error: Operation timed out)
[01:04] *** beardicus has joined #archiveteam-bs
[01:04] *** adinbied has joined #archiveteam-bs
[01:05] *** PotcFdk has quit IRC (Ping timeout: 600 seconds)
[01:08] *** Dimtree has joined #archiveteam-bs
[01:09] *** squires has joined #archiveteam-bs
[01:10] *** REiN^ has quit IRC (Ping timeout: 600 seconds)
[01:10] *** ivan has quit IRC (Ping timeout: 600 seconds)
[01:12] about pixiv on the other channel
[01:12] URL is https://www.pixiv.net/member_illust.php?mode=medium&illust_id=ID_POST
[01:13] if space is a problem, nearly a third are PNG and could be losslessly compressed further
[01:13] *** tyzoid has quit IRC (Read error: Operation timed out)
[01:14] *** beardicus has quit IRC (Read error: Operation timed out)
[01:14] *** tyzoid has joined #archiveteam-bs
[01:15] From #archiveteam
[01:16] would save about 20% if recompression is 50% effective; this is full resolution + JSON of the post from the API
[01:16] [2018-08-11 01:08:50] I extrapolate that every year 1664k posts on pixiv are deleted; is this OK? That is 2% of its website every year (it grew 8.9% last year)
[01:16] [2018-08-11 01:08:50] according to my calculations there are 80 to 100TB; it should be a little smaller because I made generous assumptions: it considers only 20% are deleted and ignores that 10 years ago uploads were smaller in size
[01:16] [2018-08-11 01:08:50] is this too big?
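The size figures quoted between 01:13 and 01:16 can be sanity-checked with a couple of lines of arithmetic. This is a sketch, not the spreadsheet mentioned in the log: it assumes the recompression saving applies only to PNG bytes, and the function names are hypothetical. If PNGs were one third of the bytes, a 50% recompression would save about 17% overall; the quoted "about 20%" would correspond to PNGs making up roughly 40% of the bytes (plausible if PNGs are larger than JPEGs on average, but that is an inference). Likewise, 1664k deletions being 2% of the site implies roughly 83 million posts.

```python
def recompression_savings(png_byte_share: float, png_reduction: float) -> float:
    """Fraction of total archive bytes saved when only PNG files are recompressed.

    png_byte_share: share of all bytes that are PNG (an assumption, not measured here).
    png_reduction:  fraction by which lossless recompression shrinks a PNG.
    """
    return png_byte_share * png_reduction


def implied_total_posts(deleted_per_year: int, deleted_percent: float) -> float:
    """Total post count implied by a yearly deletion count and its percentage of the site."""
    return deleted_per_year * 100 / deleted_percent


if __name__ == "__main__":
    print(f"{recompression_savings(1 / 3, 0.5):.1%}")   # PNGs are one third of bytes
    print(f"{recompression_savings(0.4, 0.5):.1%}")     # PNGs are 40% of bytes
    print(f"{implied_total_posts(1_664_000, 2):,.0f}")  # posts implied by 2% per year
```

Run as a script, this prints 16.7%, 20.0% and 83,200,000.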
[01:16] [2018-08-11 01:11:49] Too big for archivebot; depending on how the URLs are formatted, a warrior project could be possible
[01:16] [2018-08-11 01:11:49] 100T is a fair chunk, we'd need signoff from archive.org that they want it
[01:17] I can share a spreadsheet with the math on the size if anyone cares
[01:18] When grabbing WARCs of a site, we tend to grab full-sized images and not recompress, since we don't modify WARCs in any way
[01:19] yeah, but you need to click to get the full resolution; otherwise you get a 1200 maximum height
[01:19] I don't know that much about WARC, sorry
[01:20] *** ivan has joined #archiveteam-bs
[01:23] there is a Python tool that uses the API and can download by ID; you just need to go ascending by ID from 0 to 70 000k, but from what I read you guys mostly do full pages
[01:23] ID ranges could be split
[01:24] Have a read of https://archiveteam.org/index.php?title=The_WARC_Ecosystem
[01:24] Since it looks like kiska is having connection issues, I'll be here
[01:26] thanks, I will read it; I am interested beyond using it for this
[01:26] *** squires has quit IRC (Ping timeout: 600 seconds)
[01:26] *** REiN^ has joined #archiveteam-bs
[01:26] *** kiska has quit IRC (Remote host closed the connection)
[01:28] *** wabu has joined #archiveteam-bs
[01:28] *** zyphlar has joined #archiveteam-bs
[01:28] *** Petri152 has joined #archiveteam-bs
[01:28] *** JAA has joined #archiveteam-bs
[01:28] *** swebb sets mode: +o JAA
[01:28] *** bakJAA sets mode: +o JAA
[01:29] We would need confirmation that IA wants this, but it would be a good warrior project
[01:30] *** ivan has quit IRC (Ping timeout: 600 seconds)
[01:31] *** ivan has joined #archiveteam-bs
[01:31] *** jspiros has joined #archiveteam-bs
[01:33] *** beardicus has joined #archiveteam-bs
[01:34] *** squires has joined #archiveteam-bs
[01:34] *** kiska has joined #archiveteam-bs
[01:39] I have a question: how well does WARC handle dynamically loaded content?
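The 01:23 suggestion that the ID range could be split is easy to picture in code. This is a hypothetical illustration only, not how the Archive Team tracker or warrior actually hands out items: the chunk size of 1,000 is arbitrary, the range is taken as half-open [0, 70,000,000) from the "0 to 70 000k" estimate, and `POST_URL` simply fills in the post-page template quoted at 01:12.

```python
from typing import Iterator, Tuple

# Post-page URL template quoted at 01:12; ID_POST is replaced by the numeric post ID.
POST_URL = "https://www.pixiv.net/member_illust.php?mode=medium&illust_id={post_id}"


def split_id_range(start: int, end: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open [lo, hi) ID ranges that together cover [start, end)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for lo in range(start, end, chunk_size):
        yield lo, min(lo + chunk_size, end)


def urls_for_range(lo: int, hi: int) -> Iterator[str]:
    """Expand one ID range into the post-page URLs it covers."""
    for post_id in range(lo, hi):
        yield POST_URL.format(post_id=post_id)


if __name__ == "__main__":
    chunks = list(split_id_range(0, 70_000_000, 1_000))  # 1,000 IDs per item is arbitrary
    print(len(chunks), chunks[0], chunks[-1])
    print(next(urls_for_range(*chunks[0])))
```

Each `(lo, hi)` pair could be handed out as one work item; a smaller chunk size gives finer-grained retries at the cost of more items to track.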
[01:39] comments load on click
[01:39] *** PotcFdk has joined #archiveteam-bs
[01:43] I'll redirect this to JAA
[01:43] thank you
[01:45] I will leave this here; this is what I think is the best tool for everything except the comments: https://github.com/Nandaka/PixivUtil2
[01:49] here is the extrapolation of the size based on my sample and on danbooru's sample: https://docs.google.com/spreadsheets/d/13zPMdRlBL_0IxK5nXteMmWqkkAIJxo8823iUaAAmw_k/
[01:55] I could write a comment scraper or dumper using the API; besides that, the JSON gives everything else that needs to be saved: full post image, name of the image, tool used, description, tags, and number of likes, views and favs
[02:01] *** m007a83__ is now known as m007a83
[03:02] latest scan: https://archive.org/details/star-trek-communicator-issue-133
[03:03] latest scan: https://archive.org/details/star-trek-communicator-issue-137
[03:04] latest scan: https://archive.org/details/star-trek-communicator-issue-133-postcards
[03:27] We already did something for pixiv before: https://archiveteam.org/index.php?title=Pixiv_Chat
[03:27] channel for pixiv: #savepixiv
[03:55] *** archodg_ has joined #archiveteam-bs
[03:57] *** archodg__ has quit IRC (Ping timeout: 252 seconds)
[03:58] *** odemg has quit IRC (Ping timeout: 260 seconds)
[04:10] *** odemg has joined #archiveteam-bs
[05:14] that is the chat thing, thankfully saved; the pics are too many to start saving only when needed
[05:19] *** archodg__ has joined #archiveteam-bs
[05:22] *** archodg_ has quit IRC (Ping timeout: 252 seconds)
[05:38] SketchCow: so I got an odd tape with 'spy cam' footage and a Brian Wilson documentary on Disney Channel
[05:39] the spy cam footage is from November 3, 2002
[05:39] the Disney Channel stuff is from 1990
[05:39] it's a very odd tape
[05:42] it was basically a guy washing his car
[05:44] I think this was a copy of something important
[05:44] anyway, I'm uploading it to FOS
[05:47] *** c4rc4s has quit IRC (Ping timeout: 360 seconds)
[06:03] *** c4rc4s has joined #archiveteam-bs
[06:37] *** superkuh has quit IRC (Quit: the neuronal action potential is an electrical manipulation of reversible abrupt phase changes in the lipid bilaye)
[06:47] *** superkuh has joined #archiveteam-bs
[07:03] *** aleph has quit IRC (Quit: WeeChat 1.6)
[07:22] Hm, my IA account @purplesymphony was locked. Can anyone check why?
[07:22] (Should be available here: https://archive.org/history/@purplesymphony)
[08:09] *** BartoCH has quit IRC (Quit: WeeChat 2.2)
[08:24] *** BartoCH has joined #archiveteam-bs
[08:53] *** signius has joined #archiveteam-bs
[08:56] *** signius has quit IRC (Client Quit)
[08:56] *** signius has joined #archiveteam-bs
[09:30] *** username1 has joined #archiveteam-bs
[09:33] *** schbirid2 has quit IRC (Read error: Operation timed out)
[09:33] *** BartoCH has quit IRC (Quit: WeeChat 2.2)
[09:48] *** BartoCH has joined #archiveteam-bs
[09:54] *** wp494 has quit IRC (Read error: Operation timed out)
[10:28] those Alex Jones videos I mentioned earlier
[10:29] I have 160GB of them. Anyone want to take them?
[10:36] *** wp494 has joined #archiveteam-bs
[11:31] *** SketchCow has quit IRC (Ping timeout: 260 seconds)
[12:01] Not with a 10-foot pole
[13:24] *** BlueMaxim has quit IRC (Quit: Leaving)
[13:56] *** bitBaron has joined #archiveteam-bs
[14:15] fredgido: WARC itself can handle anything. It's mostly the retrieval and playback that messes with this. Retrieval, since you typically need an interactive browser (or at least a way to simulate the same behaviour). Playback is difficult if URLs include non-deterministic values (e.g. timestamps, platform-dependent parameters, random strings) or if POST requests are used (because the Wayback Machine
[14:15] uses POST for its own purposes).
[14:16] But yeah, WARC used with HTTP is simply a collection of request/response pairs. The payload (HTTP headers and body) is completely irrelevant from WARC's perspective.
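To illustrate the 14:16 point that a WARC for HTTP is essentially a sequence of request/response records whose payload WARC treats as opaque bytes, here is a minimal stdlib-only sketch of WARC/1.0 record serialization. It is a toy under stated assumptions (no per-record gzip, no payload digests, no warcinfo record) rather than a replacement for real archiving tools, and the function names are hypothetical.

```python
import uuid
from datetime import datetime, timezone
from typing import Dict, Optional


def warc_record(warc_type: str, target_uri: str, block: bytes, record_id: str,
                date: str, extra: Optional[Dict[str, str]] = None) -> bytes:
    """Serialize one WARC/1.0 record: version line, header fields, blank line,
    the block copied verbatim, then the two CRLFs that end every record."""
    fields = {
        "WARC-Type": warc_type,
        "WARC-Record-ID": record_id,
        "WARC-Date": date,
        "WARC-Target-URI": target_uri,
        "Content-Type": f"application/http; msgtype={warc_type}",
        "Content-Length": str(len(block)),  # length of the block in bytes
    }
    fields.update(extra or {})
    header = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in fields.items()) + "\r\n"
    return header.encode("utf-8") + block + b"\r\n\r\n"


def request_response_pair(uri: str, raw_request: bytes, raw_response: bytes) -> bytes:
    """Wrap one captured HTTP exchange as a request record followed by a response record.

    The raw HTTP bytes (headers and body) are never parsed: from WARC's point of
    view they are simply the record block.
    """
    date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    response_id = f"<urn:uuid:{uuid.uuid4()}>"
    request_id = f"<urn:uuid:{uuid.uuid4()}>"
    # WARC-Concurrent-To links the request to the response from the same capture.
    request = warc_record("request", uri, raw_request, request_id, date,
                          {"WARC-Concurrent-To": response_id})
    response = warc_record("response", uri, raw_response, response_id, date)
    return request + response


if __name__ == "__main__":
    req = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
    resp = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: 2\r\n\r\nhi"
    print(request_response_pair("http://example.com/", req, resp).decode("utf-8"))
```

Because the block is copied byte for byte, the same function works whether the response is HTML, JSON from an API, or a full-size image; the hard parts JAA mentions (driving a browser for retrieval, replaying non-deterministic URLs) sit outside this layer.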
[14:16] (WARC at its core is even more versatile than that.)
[14:26] *** bitBaron has quit IRC (Quit: My computer has gone to sleep. ZZZzzz…)
[14:27] *** Stilett0 has joined #archiveteam-bs
[14:28] *** bitBaron has joined #archiveteam-bs
[14:30] *** bitBaron has quit IRC (Client Quit)
[15:24] *** Mateon1 has quit IRC (Ping timeout: 260 seconds)
[15:24] *** Mateon1 has joined #archiveteam-bs
[15:34] I was reading, and brozzler is better suited for it
[15:35] I think... how else would it save the
[15:35] stuff loaded lazily by JS?
[15:37] is there any way, after capturing the page, to make it point from the sample image to the full version and delete the sample? Because the samples would be a real waste of space
[15:41] the sample is ID_pX_master1200.jpg and the full is ID_pX.jpg, where ID is the post ID and X is the image index within the post
[15:41] X starting from 0
[15:44] *** Pixi has quit IRC (Read error: Operation timed out)
[15:46] *** Pixi has joined #archiveteam-bs
[16:20] *** bitBaron has joined #archiveteam-bs
[16:28] *** bitBaron has quit IRC (Ping timeout: 506 seconds)
[16:45] *** Stilett0 has quit IRC (Read error: Operation timed out)
[16:45] *** Stilett0 has joined #archiveteam-bs
[16:54] *** Stilett0 has quit IRC (Read error: Operation timed out)
[17:23] fredgido: if this is about pixiv, take it to #savepixiv
[18:37] *** JAA has quit IRC (Read error: Operation timed out)
[18:38] *** zyphlar has quit IRC (Read error: Operation timed out)
[18:38] *** wabu has quit IRC (Read error: Operation timed out)
[18:39] *** ivan has quit IRC (Read error: Operation timed out)
[18:39] *** Petri152 has quit IRC (Ping timeout: 246 seconds)
[18:41] *** jspiros has quit IRC (Read error: Operation timed out)
[18:45] *** squires has quit IRC (Ping timeout: 600 seconds)
[18:45] *** sep332 has quit IRC (Read error: Operation timed out)
[18:46] *** ivan has joined #archiveteam-bs
[18:50] *** Dimtree has quit IRC (Ping timeout: 840 seconds)
[18:51] *** Mateon1 has quit IRC (west.us.hub irc.Prison.NET)
[18:51] *** achip has quit IRC (west.us.hub irc.Prison.NET)
[18:51] *** robogoat has quit IRC (west.us.hub irc.Prison.NET)
[18:52] *** kiska has quit IRC (Ping timeout: 600 seconds)
[19:04] *** squires has joined #archiveteam-bs
[19:11] *** REiN^ has quit IRC (Ping timeout: 600 seconds)
[19:11] *** squires has quit IRC (Read error: Connection reset by peer)
[19:13] *** Mateon1 has joined #archiveteam-bs
[19:13] *** achip has joined #archiveteam-bs
[19:13] *** robogoat has joined #archiveteam-bs
[19:13] *** sep332 has joined #archiveteam-bs
[19:16] *** REiN^ has joined #archiveteam-bs
[19:24] *** squires has joined #archiveteam-bs
[19:29] *** Stilett0 has joined #archiveteam-bs
[19:29] *** beardicus has quit IRC (Read error: Operation timed out)
[19:29] *** tyzoid has quit IRC (Read error: Operation timed out)
[19:30] *** kiska has joined #archiveteam-bs
[19:31] *** tyzoid has joined #archiveteam-bs
[19:32] *** beardicus has joined #archiveteam-bs
[19:35] *** Dimtree has joined #archiveteam-bs
[19:38] *** jspiros has joined #archiveteam-bs
[19:39] *** zyphlar has joined #archiveteam-bs
[19:39] *** wabu has joined #archiveteam-bs
[19:40] *** Petri152 has joined #archiveteam-bs
[19:42] *** JAA has joined #archiveteam-bs
[19:42] *** swebb sets mode: +o JAA
[19:42] *** bakJAA sets mode: +o JAA
[21:05] *** SketchCow has joined #archiveteam-bs
[21:05] *** swebb sets mode: +o SketchCow
[21:12] *** Stiletto has joined #archiveteam-bs
[21:13] *** Stilett0 has quit IRC (Ping timeout: 360 seconds)
[21:19] *** Stilett0 has joined #archiveteam-bs
[21:22] *** BartoCH has quit IRC (Quit: WeeChat 2.2)
[21:23] *** Stiletto has quit IRC (Read error: Operation timed out)
[21:37] *** BartoCH has joined #archiveteam-bs
[22:20] *** bitBaron has joined #archiveteam-bs
[22:35] *** bitBaron has quit IRC (My computer has gone to sleep. ZZZzzz…)
[22:35] *** ivan has quit IRC (Read error: Connection reset by peer)
[22:37] *** Petri152 has quit IRC (Ping timeout: 246 seconds)
[22:38] *** PotcFdk has quit IRC (Read error: Operation timed out)
[22:39] *** beardicus has quit IRC (Read error: Operation timed out)
[22:40] *** JAA has quit IRC (Ping timeout: 246 seconds)
[22:40] *** wabu has quit IRC (Ping timeout: 246 seconds)
[22:40] *** zyphlar has quit IRC (Ping timeout: 246 seconds)
[22:40] *** tyzoid has quit IRC (Read error: Operation timed out)
[22:41] *** tyzoid has joined #archiveteam-bs
[22:42] *** ivan has joined #archiveteam-bs
[22:42] *** beardicus has joined #archiveteam-bs
[22:44] *** jspiros has quit IRC (Ping timeout: 492 seconds)
[22:44] *** Mateon1 has quit IRC (west.us.hub irc.Prison.NET)
[22:44] *** achip has quit IRC (west.us.hub irc.Prison.NET)
[22:44] *** robogoat has quit IRC (west.us.hub irc.Prison.NET)
[22:47] *** Dimtree has quit IRC (Ping timeout: 840 seconds)
[22:50] *** Dimtree has joined #archiveteam-bs
[22:50] *** beardicus has quit IRC (Read error: Operation timed out)
[22:51] *** beardicus has joined #archiveteam-bs
[22:53] *** kiska has quit IRC (Ping timeout: 600 seconds)
[22:54] *** tyzoid has quit IRC (Ping timeout: 600 seconds)
[22:56] *** kiska has joined #archiveteam-bs
[22:56] *** tyzoid has joined #archiveteam-bs
[22:59] *** REiN^ has quit IRC (Ping timeout: 600 seconds)
[23:01] *** Mateon1 has joined #archiveteam-bs
[23:01] *** achip has joined #archiveteam-bs
[23:01] *** robogoat has joined #archiveteam-bs
[23:03] *** Mateon1 has quit IRC (west.us.hub irc.Prison.NET)
[23:03] *** achip has quit IRC (west.us.hub irc.Prison.NET)
[23:03] *** robogoat has quit IRC (west.us.hub irc.Prison.NET)
[23:09] *** Mateon1 has joined #archiveteam-bs
[23:09] *** achip has joined #archiveteam-bs
[23:09] *** robogoat has joined #archiveteam-bs
[23:10] *** beardicus has quit IRC (Read error: Operation timed out)
[23:11] *** beardicus has joined #archiveteam-bs
[23:12] *** kiska has quit IRC (Ping timeout: 600 seconds)
[23:13] *** kiska has joined #archiveteam-bs
[23:14] *** Mateon1 has quit IRC (west.us.hub irc.Prison.NET)
[23:14] *** achip has quit IRC (west.us.hub irc.Prison.NET)
[23:14] *** robogoat has quit IRC (west.us.hub irc.Prison.NET)
[23:17] *** sep332 has quit IRC (Ping timeout: 601 seconds)
[23:17] *** REiN^ has joined #archiveteam-bs
[23:21] *** squires has quit IRC (Read error: Connection reset by peer)
[23:24] *** Mateon1 has joined #archiveteam-bs
[23:24] *** achip has joined #archiveteam-bs
[23:24] *** robogoat has joined #archiveteam-bs
[23:26] *** PotcFdk has joined #archiveteam-bs
[23:33] *** squires has joined #archiveteam-bs
[23:34] *** sep332 has joined #archiveteam-bs
[23:37] *** JAA has joined #archiveteam-bs
[23:37] *** swebb sets mode: +o JAA
[23:37] *** bakJAA sets mode: +o JAA
[23:37] *** zyphlar has joined #archiveteam-bs
[23:38] *** Petri152 has joined #archiveteam-bs
[23:38] *** jspiros has joined #archiveteam-bs
[23:40] *** wabu has joined #archiveteam-bs
[23:48] https://www.youtube.com/user/thegavin2000 Someone might wanna throw this into tubeup
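The pixiv filename scheme described at 15:41 (sample `ID_pX_master1200.jpg`, full `ID_pX.jpg`, with X counting from 0) can be turned into a small mapping, for example to decide which full-size images to fetch alongside each post page. This is a sketch based only on that description: the regex and function names are hypothetical, and the extension of the full image is a parameter in case an original is not a JPEG (nearly a third were said to be PNG at 01:13). It maps names only; it does not rewrite captured WARCs, which, as noted at 01:18, are not modified after capture.

```python
import re
from typing import List, Optional, Tuple

# Sample name as described at 15:41: <post ID>_p<image index>_master1200.jpg
SAMPLE_RE = re.compile(r"^(\d+)_p(\d+)_master1200\.jpg$")


def parse_sample(name: str) -> Optional[Tuple[int, int]]:
    """Return (post_id, image_index) for a sample filename, or None if it does not match."""
    m = SAMPLE_RE.match(name)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def full_name(post_id: int, page: int, ext: str = "jpg") -> str:
    """Build the full-size filename ID_pX.<ext>."""
    return f"{post_id}_p{page}.{ext}"


def sample_to_full(name: str, ext: str = "jpg") -> Optional[str]:
    """Map a sample filename to the corresponding full-size filename."""
    parsed = parse_sample(name)
    if parsed is None:
        return None
    post_id, page = parsed
    return full_name(post_id, page, ext)


def page_names(post_id: int, page_count: int, ext: str = "jpg") -> List[str]:
    """Full-size filenames for every image in a post; image indices start at 0."""
    return [full_name(post_id, p, ext) for p in range(page_count)]


if __name__ == "__main__":
    print(sample_to_full("70000000_p0_master1200.jpg"))
    print(page_names(42, 3))
```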