#archiveteam-bs 2018-08-11,Sat

Time Nickname Message
00:09 🔗 SmileyG has joined #archiveteam-bs
00:09 🔗 Smiley has quit IRC (Read error: Connection reset by peer)
00:10 🔗 DragonMon has quit IRC (Quit: Leaving)
00:10 🔗 JH88 has quit IRC (Read error: Connection reset by peer)
00:10 🔗 BlueMaxim has joined #archiveteam-bs
00:10 🔗 JH88 has joined #archiveteam-bs
00:11 🔗 JH88 has quit IRC (Read error: Connection reset by peer)
00:12 🔗 JH88 has joined #archiveteam-bs
00:12 🔗 vectr0n_ has joined #archiveteam-bs
00:12 🔗 m007a83__ has joined #archiveteam-bs
00:13 🔗 c4rc4s has quit IRC (Ping timeout: 600 seconds)
00:14 🔗 c4rc4s has joined #archiveteam-bs
00:14 🔗 BlueMax has quit IRC (Read error: Operation timed out)
00:16 🔗 m007a83 has quit IRC (Read error: Operation timed out)
00:20 🔗 wp494 has quit IRC (Remote host closed the connection)
00:20 🔗 vectr0n has quit IRC (Ping timeout: 600 seconds)
00:20 🔗 vectr0n_ is now known as vectr0n
00:20 🔗 wp494 has joined #archiveteam-bs
00:21 🔗 kiska has quit IRC (Read error: Connection reset by peer)
00:21 🔗 squires has quit IRC (Read error: Connection reset by peer)
00:22 🔗 tyzoid has quit IRC (Ping timeout: 600 seconds)
00:24 🔗 tyzoid has joined #archiveteam-bs
00:24 🔗 godane has quit IRC (Read error: Operation timed out)
00:25 🔗 godane has joined #archiveteam-bs
00:26 🔗 ivan has quit IRC (Read error: Connection reset by peer)
00:26 🔗 svchfoo1 sets mode: +o godane
00:26 🔗 JAA has quit IRC (Read error: Operation timed out)
00:27 🔗 jspiros has quit IRC (Read error: Operation timed out)
00:27 🔗 zyphlar has quit IRC (Read error: Operation timed out)
00:27 🔗 wabu has quit IRC (Read error: Operation timed out)
00:27 🔗 Petri152 has quit IRC (Read error: Operation timed out)
00:28 🔗 chfoo has quit IRC (Read error: Operation timed out)
00:28 🔗 beardicus has quit IRC (Read error: Operation timed out)
00:28 🔗 ivan has joined #archiveteam-bs
00:29 🔗 kiska has joined #archiveteam-bs
00:29 🔗 beardicus has joined #archiveteam-bs
00:29 🔗 odemg has quit IRC (Read error: Operation timed out)
00:29 🔗 odemg has joined #archiveteam-bs
00:32 🔗 wp494 has quit IRC (Read error: Operation timed out)
00:33 🔗 adinbied has quit IRC (Ping timeout: 255 seconds)
00:33 🔗 mundus20- has quit IRC (Ping timeout: 255 seconds)
00:33 🔗 squires has joined #archiveteam-bs
00:33 🔗 wp494 has joined #archiveteam-bs
00:34 🔗 achip has quit IRC (west.us.hub irc.Prison.NET)
00:34 🔗 robogoat has quit IRC (west.us.hub irc.Prison.NET)
00:34 🔗 mundus201 has joined #archiveteam-bs
00:35 🔗 tyzoid has quit IRC (Read error: Operation timed out)
00:35 🔗 tyzoid has joined #archiveteam-bs
00:36 🔗 chfoo has joined #archiveteam-bs
00:36 🔗 adinbied has joined #archiveteam-bs
00:38 🔗 beardicus has quit IRC (Read error: Operation timed out)
00:39 🔗 beardicus has joined #archiveteam-bs
00:41 🔗 ivan has quit IRC (Read error: Operation timed out)
00:43 🔗 balrog has quit IRC (Read error: Operation timed out)
00:44 🔗 ivan has joined #archiveteam-bs
00:46 🔗 balrog has joined #archiveteam-bs
00:46 🔗 swebb sets mode: +o balrog
00:51 🔗 Dimtree has quit IRC (Read error: Operation timed out)
00:53 🔗 beardicus has quit IRC (Read error: Operation timed out)
00:55 🔗 achip has joined #archiveteam-bs
00:55 🔗 robogoat has joined #archiveteam-bs
00:56 🔗 beardicus has joined #archiveteam-bs
00:56 🔗 eLbot has quit IRC (Ping timeout: 600 seconds)
00:57 🔗 eLbot has joined #archiveteam-bs
00:59 🔗 squires has quit IRC (Ping timeout: 600 seconds)
01:01 🔗 beardicus has quit IRC (Read error: Operation timed out)
01:02 🔗 adinbied has quit IRC (Read error: Operation timed out)
01:04 🔗 beardicus has joined #archiveteam-bs
01:04 🔗 adinbied has joined #archiveteam-bs
01:05 🔗 PotcFdk has quit IRC (Ping timeout: 600 seconds)
01:08 🔗 Dimtree has joined #archiveteam-bs
01:09 🔗 squires has joined #archiveteam-bs
01:10 🔗 REiN^ has quit IRC (Ping timeout: 600 seconds)
01:10 🔗 ivan has quit IRC (Ping timeout: 600 seconds)
01:12 🔗 fredgido about pixiv on the other channel
01:12 🔗 fredgido url is https://www.pixiv.net/member_illust.php?mode=medium&illust_id=ID_POST
01:13 🔗 fredgido if space is a problem, nearly a third are png and could be losslessly compressed further
01:13 🔗 tyzoid has quit IRC (Read error: Operation timed out)
01:14 🔗 beardicus has quit IRC (Read error: Operation timed out)
01:14 🔗 tyzoid has joined #archiveteam-bs
01:15 🔗 kiska From #archiveteam
01:16 🔗 fredgido would save about 20% if recompression is 50% effective, this is full resolution + json of the post from the api
01:16 🔗 kiska [2018-08-11 01:08:50] <fredgido> I extrapolate that every year 1664k posts on pixiv are deleted, is this OK? that is 2% of its website every year (it grew 8.9% last year)
01:16 🔗 kiska [2018-08-11 01:08:50] <fredgido> according to my calculations there are 80 to 100TB, it should be a little smaller because I made generous assumptions, it considers only 20% are deleted and ignores that 10 years ago uploads were smaller in size
01:16 🔗 kiska [2018-08-11 01:08:50] <fredgido> is this too big?
01:16 🔗 kiska [2018-08-11 01:11:49] <Flashfire> Too big for archivebot; depending on how the urls are formatted, a warrior project could be possible
01:16 🔗 kiska [2018-08-11 01:11:49] <astrid> 100T is a fair chunk, we'd need signoff from archive.org that they want it
01:17 🔗 fredgido I can share a spreadsheet with the math on the size if anyone cares
01:18 🔗 kiska When grabbing warcs of a site, we tend to grab full sized images, and not recompress since we don't modify warcs in any way
01:19 🔗 fredgido yeah but you need to click to get the full resolution, you get a 1200 maximum height if not
01:19 🔗 fredgido I don't know that much about warc sorry
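A quick check of the figures quoted above, as a rough sketch only. The two formulas are an editorial reading of the chat, not the contents of fredgido's spreadsheet: the first assumes "2%" means deleted posts divided by all posts, the second assumes the "about 20%" saving equals the PNG share of total bytes times the recompression gain. The function names are hypothetical.

```python
# Figures quoted in the chat; the formulas are assumptions, not the spreadsheet's math.
deleted_posts_per_year = 1_664_000  # "1664k posts on pixiv are deleted" per year
deleted_percent = 2                 # "that is 2% of its website every year"


def implied_total_posts(deleted, percent):
    """Total post count implied by 'deleted posts are `percent`% of the site'."""
    return deleted * 100 // percent


def implied_png_byte_share(total_saving, recompress_gain):
    """Share of bytes that must be PNG for the stated overall saving to hold."""
    return total_saving / recompress_gain


total_posts = implied_total_posts(deleted_posts_per_year, deleted_percent)
png_share = implied_png_byte_share(0.20, 0.50)

print(total_posts)  # implied number of posts on the site
print(png_share)    # implied fraction of bytes stored as PNG
```

Under these assumptions the 20% saving only holds if PNGs make up roughly 40% of the bytes, which is compatible with "nearly a third" of the files being PNG if PNGs are larger than average.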
01:20 🔗 ivan has joined #archiveteam-bs
01:23 🔗 fredgido there is a python tool that uses the api and can download by id, you just need to go ascending by ID from 0 to 70 000k, but from what I read you guys mostly do full pages
01:23 🔗 fredgido ID ranges could be split
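Splitting the ID space into work items could look roughly like the sketch below. The upper bound of 70,000,000 is the "70 000k" from the chat; the chunk size of 100,000 and the function name are hypothetical choices for illustration, not anything an existing warrior project uses.

```python
def split_id_range(start, stop, chunk_size):
    """Yield (first_id, last_id) pairs, both inclusive, covering start..stop-1."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for first in range(start, stop, chunk_size):
        # The last chunk may be shorter than chunk_size.
        yield first, min(first + chunk_size, stop) - 1


# Assumed values: the ID ceiling comes from the chat, the chunk size is arbitrary.
MAX_ID = 70_000_000
CHUNK_SIZE = 100_000

work_items = list(split_id_range(0, MAX_ID, CHUNK_SIZE))
print(len(work_items), work_items[0], work_items[-1])
```

Each pair can then be handed to a separate worker, which walks its IDs in ascending order.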
01:24 🔗 kiskaBak Have a read of https://archiveteam.org/index.php?title=The_WARC_Ecosystem
01:24 🔗 kiskaBak Since it looks like kiska is having connection issues I'll be here
01:26 🔗 fredgido thanks I will read as I am interested besides using it for this
01:26 🔗 squires has quit IRC (Ping timeout: 600 seconds)
01:26 🔗 REiN^ has joined #archiveteam-bs
01:26 🔗 kiska has quit IRC (Remote host closed the connection)
01:28 🔗 wabu has joined #archiveteam-bs
01:28 🔗 zyphlar has joined #archiveteam-bs
01:28 🔗 Petri152 has joined #archiveteam-bs
01:28 🔗 JAA has joined #archiveteam-bs
01:28 🔗 swebb sets mode: +o JAA
01:28 🔗 bakJAA sets mode: +o JAA
01:29 🔗 Flashfire We would need confirmation that IA wants this but it would be a good warrior project
01:30 🔗 ivan has quit IRC (Ping timeout: 600 seconds)
01:31 🔗 ivan has joined #archiveteam-bs
01:31 🔗 jspiros has joined #archiveteam-bs
01:33 🔗 beardicus has joined #archiveteam-bs
01:34 🔗 squires has joined #archiveteam-bs
01:34 🔗 kiska has joined #archiveteam-bs
01:39 🔗 fredgido I have a question, how well does WARC handle dynamically loaded content?
01:39 🔗 fredgido comments load on click
01:39 🔗 PotcFdk has joined #archiveteam-bs
01:43 🔗 kiska I'll redirect this to JAA
01:43 🔗 fredgido thank you
01:45 🔗 fredgido I will leave this here, it is what I think is the best tool for everything except the comments https://github.com/Nandaka/PixivUtil2
01:49 🔗 fredgido here is the extrapolation of the size based on my sample and on danbooru's sample https://docs.google.com/spreadsheets/d/13zPMdRlBL_0IxK5nXteMmWqkkAIJxo8823iUaAAmw_k/
01:55 🔗 fredgido I could write a comment scraper or dumper using the api, besides that the json gives whatever else needs to be saved: full post image, name of the image, tool needed, description, tags, number of likes, views and favs
02:01 🔗 m007a83__ is now known as m007a83
03:02 🔗 godane latest scan : https://archive.org/details/star-trek-communicator-issue-133
03:03 🔗 godane latest scan : https://archive.org/details/star-trek-communicator-issue-137
03:04 🔗 godane latest scan : https://archive.org/details/star-trek-communicator-issue-133-postcards
03:27 🔗 arkiver We already did something for pixiv before https://archiveteam.org/index.php?title=Pixiv_Chat
03:27 🔗 arkiver channel for pixiv: #savepixiv
03:55 🔗 archodg_ has joined #archiveteam-bs
03:57 🔗 archodg__ has quit IRC (Ping timeout: 252 seconds)
03:58 🔗 odemg has quit IRC (Ping timeout: 260 seconds)
04:10 🔗 odemg has joined #archiveteam-bs
05:14 🔗 fredgido that is the chat thing, thankfully saved; there are too many pics to only start saving them once it is needed
05:19 🔗 archodg__ has joined #archiveteam-bs
05:22 🔗 archodg_ has quit IRC (Ping timeout: 252 seconds)
05:38 🔗 godane SketchCow: so i got an odd tape with 'spy cam' footage and a brian wilson documentary on disney channel
05:39 🔗 godane the spy cam footage is from november 3, 2002
05:39 🔗 godane the disney channel stuff is from 1990
05:39 🔗 godane it's a very odd tape
05:42 🔗 godane it was basically a guy washing his car
05:44 🔗 godane i think this was a copy of something important
05:44 🔗 godane anyways i'm uploading it to FOS
05:47 🔗 c4rc4s has quit IRC (Ping timeout: 360 seconds)
06:03 🔗 c4rc4s has joined #archiveteam-bs
06:37 🔗 superkuh has quit IRC (Quit: the neuronal action potential is an electrical manipulation of reversible abrupt phase changes in the lipid bilaye)
06:47 🔗 superkuh has joined #archiveteam-bs
07:03 🔗 aleph has quit IRC (Quit: WeeChat 1.6)
07:22 🔗 PurpleSym Hm, my IA account @purplesymphony was locked. Can anyone check why?
07:22 🔗 PurpleSym (Should be available here: https://archive.org/history/@purplesymphony)
08:09 🔗 BartoCH has quit IRC (Quit: WeeChat 2.2)
08:24 🔗 BartoCH has joined #archiveteam-bs
08:53 🔗 signius has joined #archiveteam-bs
08:56 🔗 signius has quit IRC (Client Quit)
08:56 🔗 signius has joined #archiveteam-bs
09:30 🔗 username1 has joined #archiveteam-bs
09:33 🔗 schbirid2 has quit IRC (Read error: Operation timed out)
09:33 🔗 BartoCH has quit IRC (Quit: WeeChat 2.2)
09:48 🔗 BartoCH has joined #archiveteam-bs
09:54 🔗 wp494 has quit IRC (Read error: Operation timed out)
10:28 🔗 Aoede those alex jones videos I mentioned earlier
10:29 🔗 Aoede I have 160GB of them. anyone want to take them?
10:36 🔗 wp494 has joined #archiveteam-bs
11:31 🔗 SketchCow has quit IRC (Ping timeout: 260 seconds)
12:01 🔗 eientei95 Not with a 10 foot pole
13:24 🔗 BlueMaxim has quit IRC (Quit: Leaving)
13:56 🔗 bitBaron has joined #archiveteam-bs
14:15 🔗 JAA fredgido: WARC itself can handle anything. It's mostly the retrieval and playback that messes with this. Retrieval since you typically need an interactive browser (or at least a way to simulate the same behaviour). Playback is difficult if URLs include non-deterministic values (e.g. timestamps, platform-dependent parameters, random strings) or if POST requests are used (because the Wayback Machine
14:15 🔗 JAA uses POST for its own purposes).
14:16 🔗 JAA But yeah, WARC used with HTTP is simply a collection of request/response pairs. The payload (HTTP headers and body) is completely irrelevant from WARC's perspective.
14:16 🔗 JAA (WARC at its core is even more versatile than that.)
14:26 🔗 bitBaron has quit IRC (Quit: My computer has gone to sleep. ZZZzzz…)
14:27 🔗 Stilett0 has joined #archiveteam-bs
14:28 🔗 bitBaron has joined #archiveteam-bs
14:30 🔗 bitBaron has quit IRC (Client Quit)
15:24 🔗 Mateon1 has quit IRC (Ping timeout: 260 seconds)
15:24 🔗 Mateon1 has joined #archiveteam-bs
15:34 🔗 fredgido I was reading and brozzler is more fit for it
15:35 🔗 fredgido I think.. how else would it save the
15:35 🔗 fredgido stuff loaded lazily by js
15:37 🔗 fredgido is there any way after capturing the page to make it point the sample image to the full version and delete the sample? because the samples would be a real waste of space
15:41 🔗 fredgido the sample is ID_pX_master1200.jpg and the full is ID_pX.jpg where ID is post ID and X is image on the post
15:41 🔗 fredgido X starting from 0
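The naming scheme described above can be sketched as a small filename mapping. This only follows the pattern given in the chat (`ID_pX_master1200.jpg` for the sample, `ID_pX.jpg` for the full image); the function name is hypothetical, and the assumption that the full file always ends in `.jpg` may not hold, since the chat also says nearly a third of the originals are PNG.

```python
import re

# Pattern taken from the chat: ID_pX_master1200.jpg, with ID the post ID and
# X the zero-based index of the image within the post.
SAMPLE_NAME = re.compile(r"^(?P<post_id>\d+)_p(?P<index>\d+)_master1200\.jpg$")


def sample_to_full_name(sample_name):
    """Map a sample filename to the full-size filename, or None if it does not match.

    Assumes the full image keeps the .jpg extension, as stated in the chat.
    """
    match = SAMPLE_NAME.match(sample_name)
    if match is None:
        return None
    return f"{match.group('post_id')}_p{match.group('index')}.jpg"


print(sample_to_full_name("12345_p0_master1200.jpg"))
```

Note that this rewrites names only; it does not alter anything inside a WARC, which stays unmodified as kiska points out earlier.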
15:44 🔗 Pixi has quit IRC (Read error: Operation timed out)
15:46 🔗 Pixi has joined #archiveteam-bs
16:20 🔗 bitBaron has joined #archiveteam-bs
16:28 🔗 bitBaron has quit IRC (Ping timeout: 506 seconds)
16:45 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
16:45 🔗 Stilett0 has joined #archiveteam-bs
16:54 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
17:23 🔗 arkiver fredgido if this is about pixiv take it to #savepixiv
18:37 🔗 JAA has quit IRC (Read error: Operation timed out)
18:38 🔗 zyphlar has quit IRC (Read error: Operation timed out)
18:38 🔗 wabu has quit IRC (Read error: Operation timed out)
18:39 🔗 ivan has quit IRC (Read error: Operation timed out)
18:39 🔗 Petri152 has quit IRC (Ping timeout: 246 seconds)
18:41 🔗 jspiros has quit IRC (Read error: Operation timed out)
18:45 🔗 squires has quit IRC (Ping timeout: 600 seconds)
18:45 🔗 sep332 has quit IRC (Read error: Operation timed out)
18:46 🔗 ivan has joined #archiveteam-bs
18:50 🔗 Dimtree has quit IRC (Ping timeout: 840 seconds)
18:51 🔗 Mateon1 has quit IRC (west.us.hub irc.Prison.NET)
18:51 🔗 achip has quit IRC (west.us.hub irc.Prison.NET)
18:51 🔗 robogoat has quit IRC (west.us.hub irc.Prison.NET)
18:52 🔗 kiska has quit IRC (Ping timeout: 600 seconds)
19:04 🔗 squires has joined #archiveteam-bs
19:11 🔗 REiN^ has quit IRC (Ping timeout: 600 seconds)
19:11 🔗 squires has quit IRC (Read error: Connection reset by peer)
19:13 🔗 Mateon1 has joined #archiveteam-bs
19:13 🔗 achip has joined #archiveteam-bs
19:13 🔗 robogoat has joined #archiveteam-bs
19:13 🔗 sep332 has joined #archiveteam-bs
19:16 🔗 REiN^ has joined #archiveteam-bs
19:24 🔗 squires has joined #archiveteam-bs
19:29 🔗 Stilett0 has joined #archiveteam-bs
19:29 🔗 beardicus has quit IRC (Read error: Operation timed out)
19:29 🔗 tyzoid has quit IRC (Read error: Operation timed out)
19:30 🔗 kiska has joined #archiveteam-bs
19:31 🔗 tyzoid has joined #archiveteam-bs
19:32 🔗 beardicus has joined #archiveteam-bs
19:35 🔗 Dimtree has joined #archiveteam-bs
19:38 🔗 jspiros has joined #archiveteam-bs
19:39 🔗 zyphlar has joined #archiveteam-bs
19:39 🔗 wabu has joined #archiveteam-bs
19:40 🔗 Petri152 has joined #archiveteam-bs
19:42 🔗 JAA has joined #archiveteam-bs
19:42 🔗 swebb sets mode: +o JAA
19:42 🔗 bakJAA sets mode: +o JAA
21:05 🔗 SketchCow has joined #archiveteam-bs
21:05 🔗 swebb sets mode: +o SketchCow
21:12 🔗 Stiletto has joined #archiveteam-bs
21:13 🔗 Stilett0 has quit IRC (Ping timeout: 360 seconds)
21:19 🔗 Stilett0 has joined #archiveteam-bs
21:22 🔗 BartoCH has quit IRC (Quit: WeeChat 2.2)
21:23 🔗 Stiletto has quit IRC (Read error: Operation timed out)
21:37 🔗 BartoCH has joined #archiveteam-bs
22:20 🔗 bitBaron has joined #archiveteam-bs
22:35 🔗 bitBaron has quit IRC (My computer has gone to sleep. ZZZzzz…)
22:35 🔗 ivan has quit IRC (Read error: Connection reset by peer)
22:37 🔗 Petri152 has quit IRC (Ping timeout: 246 seconds)
22:38 🔗 PotcFdk has quit IRC (Read error: Operation timed out)
22:39 🔗 beardicus has quit IRC (Read error: Operation timed out)
22:40 🔗 JAA has quit IRC (Ping timeout: 246 seconds)
22:40 🔗 wabu has quit IRC (Ping timeout: 246 seconds)
22:40 🔗 zyphlar has quit IRC (Ping timeout: 246 seconds)
22:40 🔗 tyzoid has quit IRC (Read error: Operation timed out)
22:41 🔗 tyzoid has joined #archiveteam-bs
22:42 🔗 ivan has joined #archiveteam-bs
22:42 🔗 beardicus has joined #archiveteam-bs
22:44 🔗 jspiros has quit IRC (Ping timeout: 492 seconds)
22:44 🔗 Mateon1 has quit IRC (west.us.hub irc.Prison.NET)
22:44 🔗 achip has quit IRC (west.us.hub irc.Prison.NET)
22:44 🔗 robogoat has quit IRC (west.us.hub irc.Prison.NET)
22:47 🔗 Dimtree has quit IRC (Ping timeout: 840 seconds)
22:50 🔗 Dimtree has joined #archiveteam-bs
22:50 🔗 beardicus has quit IRC (Read error: Operation timed out)
22:51 🔗 beardicus has joined #archiveteam-bs
22:53 🔗 kiska has quit IRC (Ping timeout: 600 seconds)
22:54 🔗 tyzoid has quit IRC (Ping timeout: 600 seconds)
22:56 🔗 kiska has joined #archiveteam-bs
22:56 🔗 tyzoid has joined #archiveteam-bs
22:59 🔗 REiN^ has quit IRC (Ping timeout: 600 seconds)
23:01 🔗 Mateon1 has joined #archiveteam-bs
23:01 🔗 achip has joined #archiveteam-bs
23:01 🔗 robogoat has joined #archiveteam-bs
23:03 🔗 Mateon1 has quit IRC (west.us.hub irc.Prison.NET)
23:03 🔗 achip has quit IRC (west.us.hub irc.Prison.NET)
23:03 🔗 robogoat has quit IRC (west.us.hub irc.Prison.NET)
23:09 🔗 Mateon1 has joined #archiveteam-bs
23:09 🔗 achip has joined #archiveteam-bs
23:09 🔗 robogoat has joined #archiveteam-bs
23:10 🔗 beardicus has quit IRC (Read error: Operation timed out)
23:11 🔗 beardicus has joined #archiveteam-bs
23:12 🔗 kiska has quit IRC (Ping timeout: 600 seconds)
23:13 🔗 kiska has joined #archiveteam-bs
23:14 🔗 Mateon1 has quit IRC (west.us.hub irc.Prison.NET)
23:14 🔗 achip has quit IRC (west.us.hub irc.Prison.NET)
23:14 🔗 robogoat has quit IRC (west.us.hub irc.Prison.NET)
23:17 🔗 sep332 has quit IRC (Ping timeout: 601 seconds)
23:17 🔗 REiN^ has joined #archiveteam-bs
23:21 🔗 squires has quit IRC (Read error: Connection reset by peer)
23:24 🔗 Mateon1 has joined #archiveteam-bs
23:24 🔗 achip has joined #archiveteam-bs
23:24 🔗 robogoat has joined #archiveteam-bs
23:26 🔗 PotcFdk has joined #archiveteam-bs
23:33 🔗 squires has joined #archiveteam-bs
23:34 🔗 sep332 has joined #archiveteam-bs
23:37 🔗 JAA has joined #archiveteam-bs
23:37 🔗 swebb sets mode: +o JAA
23:37 🔗 bakJAA sets mode: +o JAA
23:37 🔗 zyphlar has joined #archiveteam-bs
23:38 🔗 Petri152 has joined #archiveteam-bs
23:38 🔗 jspiros has joined #archiveteam-bs
23:40 🔗 wabu has joined #archiveteam-bs
23:48 🔗 Flashfire https://www.youtube.com/user/thegavin2000 Someone might wanna throw this into tubeup
