#archiveteam-bs 2017-07-10,Mon

Time Nickname Message
00:27 🔗 BlueMaxim has joined #archiveteam-bs
00:32 🔗 bsmith093 has joined #archiveteam-bs
01:44 🔗 pizzaiolo has quit IRC (Read error: Operation timed out)
01:44 🔗 pizzaiolo has joined #archiveteam-bs
01:56 🔗 dashcloud has quit IRC (Read error: Operation timed out)
02:01 🔗 dashcloud has joined #archiveteam-bs
02:03 🔗 RichardG_ is now known as RichardG
02:08 🔗 BubuAnabe has joined #archiveteam-bs
02:11 🔗 bitBaron has quit IRC (Quit: My computer has gone to sleep. ZZZzzz…)
02:17 🔗 pizzaiolo has quit IRC (Ping timeout: 268 seconds)
02:17 🔗 pizzaiolo has joined #archiveteam-bs
02:55 🔗 tsp_ has quit IRC (Read error: Connection reset by peer)
02:59 🔗 dashcloud has quit IRC (Read error: Operation timed out)
03:00 🔗 dashcloud has joined #archiveteam-bs
03:22 🔗 qw3rty has joined #archiveteam-bs
03:22 🔗 ZexaronS has quit IRC (Leaving)
03:25 🔗 qw3rty2 has quit IRC (Read error: Operation timed out)
03:38 🔗 ZexaronS has joined #archiveteam-bs
04:06 🔗 Sk1d has quit IRC (Ping timeout: 194 seconds)
04:12 🔗 Sk1d has joined #archiveteam-bs
04:26 🔗 godane has quit IRC (Read error: Operation timed out)
04:30 🔗 godane has joined #archiveteam-bs
05:09 🔗 j08nY has joined #archiveteam-bs
05:34 🔗 Fletcher_ has joined #archiveteam-bs
05:36 🔗 j08nY has quit IRC (Read error: Operation timed out)
06:21 🔗 Yoshimura has quit IRC (Ping timeout: 260 seconds)
06:22 🔗 Yoshimura has joined #archiveteam-bs
06:27 🔗 godane code i could use to grab libgen.pw pdfs : curl -s 'https://libgen.pw/download.php?id=184042' | grep 'img src=' | sed 's|%3D|=|g' | sed 's|%3F|?|g' | sed 's|%2F|/|g' | sed 's|%3A|:|g' | sed 's|%26|\&|g' | sed 's|.*chl=||g' | sed 's|pdf.*|pdf|g'
06:36 🔗 godane better code: curl -s 'https://libgen.pw/download.php?id=184042' | grep 'img src=' | sed 's|.*chl=||g' | sed 's|.*libgen.pw|http://libgen.pw|g' | sed 's|pdf.*|pdf|g' | sed 's|%2F|/|g' | sed 's| |_|g' | sed 's|%3D|=|g' | sed 's|%3F|?|g' | sed 's|%26|\&|g'
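A possible alternative to the chained sed substitutions above, sketched in Python. It assumes, as the one-liners imply, that the wanted link sits percent-encoded in a `chl=` query parameter of an image URL. The function name `extract_chl` and the sample URL are hypothetical, and the real page markup was not checked.

```python
from urllib.parse import parse_qs, urlsplit


def extract_chl(img_src: str) -> str:
    """Return the decoded value of the chl= query parameter, or '' if absent.

    parse_qs splits the query on '&' first and only then decodes
    percent-escapes (and '+' as a space), so an encoded '&' inside the
    value stays part of the value.
    """
    query = urlsplit(img_src).query
    values = parse_qs(query).get("chl", [])
    return values[0] if values else ""


# Hypothetical image URL carrying an encoded download link in chl=.
sample = (
    "https://example.invalid/chart?cht=qr"
    "&chl=http%3A%2F%2Fexample.invalid%2Fget%3Fid%3D1%26name%3Dsome+file.pdf"
    "&chs=100"
)
print(extract_chl(sample))
```

Decoding with a query-string parser handles every percent-escape in one step, and any spaces in the result stay visible so they can be re-encoded instead of being replaced with underscores.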
07:39 🔗 brayden has quit IRC (Read error: Connection reset by peer)
07:39 🔗 brayden has joined #archiveteam-bs
07:39 🔗 swebb sets mode: +o brayden
07:50 🔗 Jonison has joined #archiveteam-bs
08:16 🔗 j08nY has joined #archiveteam-bs
08:46 🔗 icedice has joined #archiveteam-bs
09:53 🔗 SHODAN_UI has joined #archiveteam-bs
10:30 🔗 SHODAN_UI has quit IRC (Remote host closed the connection)
12:02 🔗 pizzaiolo has quit IRC (Remote host closed the connection)
12:28 🔗 BlueMaxim has quit IRC (Read error: Operation timed out)
12:35 🔗 SHODAN_UI has joined #archiveteam-bs
12:40 🔗 Mateon1 has joined #archiveteam-bs
12:42 🔗 pizzaiolo has joined #archiveteam-bs
12:44 🔗 j08nY has quit IRC (Read error: Operation timed out)
12:44 🔗 dashcloud has quit IRC (Remote host closed the connection)
12:46 🔗 dashcloud has joined #archiveteam-bs
13:25 🔗 eccfill has quit IRC (Read error: Connection reset by peer)
13:26 🔗 eccfill has joined #archiveteam-bs
13:31 🔗 Yurume has quit IRC (Read error: Operation timed out)
13:35 🔗 Yurume has joined #archiveteam-bs
13:43 🔗 bitBaron has joined #archiveteam-bs
13:55 🔗 icedice has quit IRC (Quit: Leaving)
14:13 🔗 bitBaron has quit IRC (Quit: My computer has gone to sleep. ZZZzzz…)
14:14 🔗 j08nY has joined #archiveteam-bs
14:40 🔗 SHODAN_UI has quit IRC (Remote host closed the connection)
15:33 🔗 ZexaronS has quit IRC (Quit: Leaving)
15:37 🔗 Jonison has quit IRC (Quit: Leaving)
15:56 🔗 dashcloud has quit IRC (Read error: Operation timed out)
16:00 🔗 dashcloud has joined #archiveteam-bs
16:03 🔗 odemg godane, the first line output a url with spaces, the second didn't get anything
16:32 🔗 bitBaron has joined #archiveteam-bs
16:32 🔗 bitBaron has quit IRC (Client Quit)
16:34 🔗 Stiletto has quit IRC (Read error: Operation timed out)
16:34 🔗 bitBaron has joined #archiveteam-bs
16:43 🔗 Famicoman has quit IRC (Ping timeout: 260 seconds)
16:53 🔗 eccfill godane: look into djvu for magazine compression
16:57 🔗 BubuAnabe has quit IRC (Ping timeout: 268 seconds)
16:58 🔗 BubuAnabe has joined #archiveteam-bs
17:16 🔗 Odd0002 has quit IRC (Read error: Operation timed out)
17:27 🔗 bitBaron has quit IRC (Quit: My computer has gone to sleep. ZZZzzz…)
17:28 🔗 bitBaron has joined #archiveteam-bs
17:30 🔗 bitBaron has quit IRC (Client Quit)
17:32 🔗 bitBaron has joined #archiveteam-bs
17:32 🔗 bitBaron has quit IRC (Read error: Connection reset by peer)
17:33 🔗 bitBaron has joined #archiveteam-bs
17:42 🔗 Sanqui CLICK for Photos 📷
17:48 🔗 Jonison has joined #archiveteam-bs
18:19 🔗 dashcloud has quit IRC (Read error: Operation timed out)
18:19 🔗 dashcloud has joined #archiveteam-bs
18:33 🔗 icedice has joined #archiveteam-bs
18:46 🔗 bitBaron has quit IRC (Read error: Connection reset by peer)
18:46 🔗 bitBaron_ has joined #archiveteam-bs
18:46 🔗 tuluu Hi, is 'grab-site' the appropriate way to back up a webpage and create a WARC file?
18:55 🔗 dashcloud has quit IRC (Read error: Operation timed out)
18:59 🔗 dashcloud has joined #archiveteam-bs
19:03 🔗 Meroje for one page webrecorder is probably easier to use
19:16 🔗 SHODAN_UI has joined #archiveteam-bs
19:34 🔗 ivan tuluu: I use it but then I wrote it so I could be biased
19:35 🔗 ivan (is there a better self-serve crawling setup you can do? heritrix3? httrack with a proxy?)
19:38 🔗 JAA Chromium/Firefox headless with warcprox, in theory. Can't wait to try that out when Firefox headless is released in September.
19:38 🔗 ivan why not headless Chrome now?
19:39 🔗 ivan I mean, that sounds cool but it's not a "crawl this entire site and linked pages" solution
19:39 🔗 ivan maybe there's something on github that does that though
19:39 🔗 JAA Because exactly that. I want to script it to follow links etc. And based on a quick glance at the documentation, Chromium's headless mode isn't exactly powerful.
19:40 🔗 JAA It really looks like a programmatic interface to the rendering engine, basically.
19:40 🔗 JAA But not a full browser which supports extensions (for the link following part etc.).
19:41 🔗 ivan what's missing? you can do pretty much anything you'd be able to do from the devtools
19:41 🔗 JAA You can?
19:41 🔗 ivan yes
19:41 🔗 JAA As I said, I just quickly skimmed the documentation.
19:41 🔗 JAA If you can, sweet.
19:43 🔗 TheLovina has quit IRC (Read error: Connection reset by peer)
19:48 🔗 TheLovina has joined #archiveteam-bs
19:59 🔗 tuluu ivan: thanks
19:59 🔗 eccfill JAA: yeah, there's a repl mode
20:00 🔗 eccfill and chrome headless speaks devtools protocol, so the same things that remote chrome debuggers use are available
20:05 🔗 tuluu grab-site seems to work well. I will use that for now.
20:29 🔗 JAA I guess I'll play around with headless Chromium in the next weeks then.
20:29 🔗 Stilett0 has joined #archiveteam-bs
20:33 🔗 JAA I'll still wait for headless Firefox before trying to build anything real though. I'm curious to see how they stack up against each other, in particular regarding resource usage.
21:01 🔗 icedice has quit IRC (Quit: Leaving)
21:21 🔗 logchfoo3 starts logging #archiveteam-bs at Mon Jul 10 21:21:32 2017
21:21 🔗 logchfoo3 has joined #archiveteam-bs
21:22 🔗 powerArch has quit IRC (Remote host closed the connection)
21:29 🔗 Mateon1 has quit IRC (Quit: Mateon1)
21:31 🔗 SHODAN_UI has quit IRC (Remote host closed the connection)
21:33 🔗 powerArch has joined #archiveteam-bs
21:34 🔗 bwn_ is now known as bwn
21:34 🔗 LeG0ax is now known as Ing3b0rg
21:44 🔗 Famicoman has joined #archiveteam-bs
21:57 🔗 Jonison has quit IRC (Read error: Connection reset by peer)
22:11 🔗 purplebot has joined #archiveteam-bs
22:12 🔗 voidsta has joined #archiveteam-bs
22:13 🔗 i0npulse has joined #archiveteam-bs
22:37 🔗 Hecatz has joined #archiveteam-bs
22:41 🔗 JAA Argh, I hate wix.com sites. They're very annoying to archive as well.
22:55 🔗 kimmer has joined #archiveteam-bs
22:56 🔗 kimmer so, have rate limits been implemented on the URL shortener services?
22:57 🔗 xmc most shorteners will tell scrapers to get stuffed if they ask for too many urls
22:57 🔗 xmc that's why more ip addresses is better
22:58 🔗 kimmer all right.. guess it was putting a bit of a load on their servers if we just hammered them.
22:58 🔗 Odd0002 has joined #archiveteam-bs
23:07 🔗 dashcloud Wanted to make sure people were aware of https://twitter.com/LinkArchiver - if you follow the account, anytime you tweet a link, it gets archived on IA, and if you @ the account, it'll tweet back the archive link
23:07 🔗 agrecasci has joined #archiveteam-bs
23:11 🔗 JAA Nice. Would be even better if it also archived the tweet.
23:11 🔗 JAA But I guess that's a separate project.
23:11 🔗 arkiver yeah, awesome project
23:12 🔗 arkiver also a great way I think to get more people to know the wayback machine
23:20 🔗 arkiver joepie91: finally fixed :) https://github.com/jjjake/internetarchive/issues/63
23:21 🔗 arkiver (it should be at least)
23:35 🔗 j08nY has quit IRC (Quit: Leaving)
23:47 🔗 DFJustin oh looks like they fixed the bookreader for special character filenames at some point too
23:47 🔗 mundus201 What is the process for extracting all of the files from a .megawarc.warc.gz?
23:51 🔗 arkiver https://github.com/chfoo/warcat
23:52 🔗 arkiver mundus201 ^
23:54 🔗 mundus201 Gracias arkiver
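For a sense of what extraction involves, here is a minimal standard-library sketch. It is not warcat's implementation or API: `iter_warc_records` is a hypothetical helper, and it assumes well-formed records that each carry a `Content-Length` header. A `.warc.gz` is normally a series of concatenated gzip members, which `gzip.open` reads as one continuous stream.

```python
import gzip


def iter_warc_records(stream):
    """Yield (headers, body) for each record in an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if line.strip() == b"":
            continue  # blank lines separate one record from the next
        # `line` is now the version line, e.g. b"WARC/1.0\r\n".
        headers = {}
        while True:
            header_line = stream.readline()
            if header_line in (b"\r\n", b"\n", b""):
                break  # an empty line ends the header block
            name, _, value = header_line.decode("utf-8", "replace").partition(":")
            headers[name.strip()] = value.strip()
        # The record body is exactly Content-Length bytes long.
        length = int(headers.get("Content-Length", "0"))
        yield headers, stream.read(length)


def list_targets(path):
    """Return (WARC-Type, WARC-Target-URI, body size) for every record in a .warc.gz file."""
    with gzip.open(path, "rb") as stream:
        return [
            (h.get("WARC-Type", ""), h.get("WARC-Target-URI", ""), len(body))
            for h, body in iter_warc_records(stream)
        ]
```

For `response` records the body still begins with the HTTP status line and headers, so a real extractor would also split off everything up to the first blank line before writing the payload to disk. The linked warcat tool is the better choice for actual archives.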
