#archiveteam-bs 2017-09-18,Mon


Time Nickname Message
00:02 🔗 etudier has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…)
00:41 🔗 BlueMaxim has joined #archiveteam-bs
01:22 🔗 Soni has joined #archiveteam-bs
02:40 🔗 dashcloud has quit IRC (Read error: Operation timed out)
02:41 🔗 dashcloud has joined #archiveteam-bs
02:45 🔗 astrid if IA doesn't have a copy of libgen then i'll eat my hat
02:46 🔗 astrid if IA has failed to snag a copy of libgen then we should really reconsider our life choices
02:51 🔗 jrwr libgen?
03:02 🔗 astrid there was some discussion a few hours ago
03:04 🔗 hook54321 jrwr: libgen = Library Genesis
03:04 🔗 SketchCow Woooo
03:06 🔗 hook54321 Pretty sure this is the official site: http://gen.lib.rus.ec/
03:06 🔗 hook54321 However this is the site listed on wikipedia: https://libgen.pw/
03:06 🔗 hook54321 oh wait, that's one of them. DuckDuckGo was only showing one.
03:08 🔗 hook54321 On a side note, whenever we email site owners asking them to cooperate with us, I recommend that we send it to another email address first to see if it ends up in spam. That happened with the owner of imgh.us.
03:12 🔗 second Did you guys archive the-eye.eu?
03:12 🔗 second It has a lot of data though...
03:21 🔗 hook54321 We did not
03:23 🔗 second Are there plans to do so?
03:24 🔗 hook54321 About how much space does it take up?
03:25 🔗 second Well just the MSDN dump is 2.7TB
03:25 🔗 hook54321 eh
03:25 🔗 second Another dump of comics from whenever (pretty much all the major studios) is about 3TB
03:25 🔗 second Rom collection, not sure, pretty big I assume
03:25 🔗 second Then there is the reddit rips they have
03:25 🔗 hook54321 There is much stopping someone from uploading it to archive.org, maybe a mirror.
03:25 🔗 hook54321 *isn't
03:26 🔗 second What happens if I upload stuff, would the archive just delete it?
03:26 🔗 second I think it should be archived but you'll need to wait til the copyright expires :/
03:27 🔗 mundus second, I have a copy of it
03:27 🔗 mundus It's onyl like 8TB
03:27 🔗 mundus But most is not legal content
03:27 🔗 second yes
03:27 🔗 mundus and if it was going to be mirrored, archivist would do it
03:27 🔗 second Only
03:27 🔗 second archivist is the one who owns it
03:28 🔗 hook54321 Disclaimer: Most of us are not employed by archive.org.
03:28 🔗 hook54321 From what I've heard however, they wait until a copyright holder sends them a notice.
03:28 🔗 mundus Yeah, if he wanted it on IA it would be on IA
03:28 🔗 astrid we don't talk about copyright in here, folks
03:28 🔗 astrid take it to #scared-shitless
03:28 🔗 hook54321 we don't?
03:28 🔗 Frogging haha
03:28 🔗 astrid or maybe /r/legaladvice
03:28 🔗 second #scared-shitless: Nick/channel is temporarily unavailable
03:29 🔗 astrid okay what's that tell you
03:29 🔗 hook54321 That there was a netsplit recently
03:31 🔗 hook54321 I searched for the word "copyright" in the logs: 475 matches in 213 files
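hook54321's "475 matches in 213 files" is the kind of summary a recursive grep prints. The channel doesn't say which tool produced it. Here is a minimal Python sketch of the same count. It assumes the logs are plain-text files under one directory and that matching is case-insensitive:

```python
import os
import re
from typing import Iterable, Iterator, Tuple


def count_in_texts(texts: Iterable[str], word: str) -> Tuple[int, int]:
    """Return (total matches, number of texts with at least one match)."""
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    total = 0
    files_hit = 0
    for text in texts:
        hits = len(pattern.findall(text))
        total += hits
        if hits:
            files_hit += 1
    return total, files_hit


def read_tree(root: str) -> Iterator[str]:
    """Yield the contents of every file under root (undecodable bytes replaced)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                yield fh.read()


def count_in_tree(root: str, word: str) -> Tuple[int, int]:
    """Count matches of word across all files below root."""
    return count_in_texts(read_tree(root), word)
```

Usage would look like `count_in_tree("logs/", "copyright")`. The `logs/` path is hypothetical.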
03:34 🔗 hook54321 Lots of the stuff in the-eye appears to be porn. Still doesn't stop someone from attempting to upload it though.
03:39 🔗 second Didn't know that
03:39 🔗 second How is it "only" 8TB?
03:44 🔗 arkhive has joined #archiveteam-bs
04:25 🔗 pizzaiolo has quit IRC (Quit: pizzaiolo)
04:42 🔗 kim__ has quit IRC (Ping timeout: 246 seconds)
04:46 🔗 Sk1d has quit IRC (Ping timeout: 250 seconds)
04:53 🔗 Sk1d has joined #archiveteam-bs
04:54 🔗 Fletcher| Worth noting that IA standard procedure seems to be to dark an item instead of deleting it when a copyright claim is received
05:03 🔗 jrwr Correct
05:09 🔗 Somebody2 darking an item *may* mean that it's entirely gone, however. Or it may not. What it definitively means is that IA has ceased to *distribute* the item.
05:36 🔗 Asparagir has quit IRC (Asparagir)
06:15 🔗 Mateon1 has quit IRC (Remote host closed the connection)
06:15 🔗 Mateon1 has joined #archiveteam-bs
06:28 🔗 schbirid has joined #archiveteam-bs
06:46 🔗 robink has quit IRC (Ping timeout: 246 seconds)
06:46 🔗 robink has joined #archiveteam-bs
06:49 🔗 schbirid i threw medium.com into wpull and it OOMd :D
07:32 🔗 Honno has joined #archiveteam-bs
08:06 🔗 Jonison has joined #archiveteam-bs
08:24 🔗 BartoCH has joined #archiveteam-bs
08:42 🔗 Mateon1 has quit IRC (Ping timeout: 260 seconds)
08:42 🔗 Mateon1 has joined #archiveteam-bs
09:02 🔗 Jonison has quit IRC (Read error: Connection reset by peer)
09:31 🔗 icedice has joined #archiveteam-bs
09:31 🔗 icedice has quit IRC (Remote host closed the connection)
09:31 🔗 etudier has joined #archiveteam-bs
09:39 🔗 Jonison has joined #archiveteam-bs
10:18 🔗 etudier has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…)
10:34 🔗 etudier has joined #archiveteam-bs
10:53 🔗 etudier has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…)
11:21 🔗 pizzaiolo has joined #archiveteam-bs
11:32 🔗 dashcloud has quit IRC (Read error: Operation timed out)
11:39 🔗 dashcloud has joined #archiveteam-bs
11:51 🔗 mls has quit IRC (Ping timeout: 250 seconds)
12:03 🔗 mls has joined #archiveteam-bs
12:38 🔗 BlueMaxim has quit IRC (Quit: Leaving)
12:46 🔗 etudier has joined #archiveteam-bs
12:52 🔗 plue has quit IRC (Quit: WeeChat 1.5)
13:01 🔗 Jonison has quit IRC (Ping timeout: 260 seconds)
13:01 🔗 etudier has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…)
13:04 🔗 etudier has joined #archiveteam-bs
13:09 🔗 mls has quit IRC (Ping timeout: 250 seconds)
13:29 🔗 etudier has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…)
13:30 🔗 dashcloud has quit IRC (Read error: Operation timed out)
13:40 🔗 SketchCow It is not gone if dark'd.
13:44 🔗 mls has joined #archiveteam-bs
13:45 🔗 etudier has joined #archiveteam-bs
13:50 🔗 dashcloud has joined #archiveteam-bs
13:59 🔗 etudier has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…)
14:10 🔗 etudier has joined #archiveteam-bs
14:17 🔗 dd0a13f37 has joined #archiveteam-bs
14:19 🔗 dd0a13f37 hook54321: The official site is libgen.io (or the IP, 94.something), gen.lib.rus.ec is an official mirror which only has the metadata
14:20 🔗 dd0a13f37 second: In technological trouble, yes, but the operators will be fine. They have good opsec, have been doing this for 20 years, and the only one who isn't anonymous is a literal fugitive. They all live in the former soviet union too, so copyright is not a big problem there.
14:20 🔗 dd0a13f37 libgen.pw, b-ok, and bookza are unofficial mirrors. sci-hub is a sister project run by the aforementioned fugitive using libgen as a storage backend and likely has backups too
14:21 🔗 dd0a13f37 The torrents are also decently seeded from various residential russian IPs, and there are probably more who aren't seeding the torrents since it's storage-bound
14:22 🔗 dd0a13f37 astrid: from what I can see, you're lacking a copy of sci-hub (aka sci-mag), which is really much more important than libgen
14:25 🔗 JAA TIL someone thought it'd be a good idea to name a parasitic wasp after Elbakyan.
14:25 🔗 dd0a13f37 bit rude
14:26 🔗 JAA Yeah, that's what she said as well.
14:26 🔗 JAA But regarding the urgency of backing up Sci-Hub: I thought it's just a frontend to libgen? What additional data is there on SciHub?
14:27 🔗 dd0a13f37 well, I doubt they'll all be arrested at the same time since they're different projects
14:27 🔗 dd0a13f37 It's not that simple
14:27 🔗 dd0a13f37 sci-hub uses libgen as a backend
14:27 🔗 dd0a13f37 they have tons of "donated" accounts, and they cycle through them
14:27 🔗 dd0a13f37 and download articles
14:27 🔗 dd0a13f37 Scihub's articles are not in the main libgen collection
14:28 🔗 dd0a13f37 libgen is separated into sci-tech (libgen), comics, paintings, russian fiction, foreign fiction, scimag
14:28 🔗 JAA Oh
14:28 🔗 dd0a13f37 only sci-tech (libgen) is backed up afaik
14:28 🔗 dd0a13f37 maybe foreignfiction/rus fict too
14:28 🔗 JAA Hm, I see.
14:29 🔗 dd0a13f37 look at the library genesis forum if you're curious about how it works
14:30 🔗 dd0a13f37 might be a good idea to use tor depending on where you live
14:33 🔗 dd0a13f37 and the libgen collection on IA is not complete from what I can see, https://archive.org/details/gen-lib&tab=about was last updated in 2016
14:33 🔗 drumstick has quit IRC (Read error: Operation timed out)
14:35 🔗 JAA Mar 2017 according to the graph, but keep in mind that this might not be the correct collection.
14:36 🔗 dd0a13f37 That's the only one with any amount of activity
14:36 🔗 dd0a13f37 Unless they store it under some other name
14:36 🔗 dd0a13f37 or don't make it public at all
14:37 🔗 Soni has quit IRC (Ping timeout: 250 seconds)
14:45 🔗 DFJustin that's not how you see if things have been uploaded to a collection
14:45 🔗 DFJustin there are items in that collection from 2 days ago
14:45 🔗 dd0a13f37 How do you?
14:46 🔗 DFJustin I don't know if there is a public way
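DFJustin wasn't sure there's a public way to list what was recently added to a collection. One option worth trying is archive.org's advanced search endpoint, sorted by `addeddate`. This is an assumption and wasn't confirmed in the channel. The sketch below only builds the query URL for the `gen-lib` collection linked above. Fetching the URL and parsing the JSON are left out:

```python
from urllib.parse import urlencode


def recent_items_url(collection: str, rows: int = 50) -> str:
    """Build an advancedsearch.php query listing a collection's newest items."""
    params = [
        ("q", f"collection:{collection}"),
        ("fl[]", "identifier"),        # fields to return for each item
        ("fl[]", "addeddate"),
        ("sort[]", "addeddate desc"),  # newest additions first
        ("rows", str(rows)),
        ("output", "json"),
    ]
    return "https://archive.org/advancedsearch.php?" + urlencode(params)


print(recent_items_url("gen-lib"))
```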
14:46 🔗 dd0a13f37 Can you see what the name is? Is it something like 2092000?
14:46 🔗 DFJustin r_1727000
14:47 🔗 dd0a13f37 sci-hub can probably afford backups, they currently have 67 btc (USD $270k) in their bitcoin wallet, and their expenses are around "a few thousand" a month
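The quoted numbers can be worked through directly. A balance of 67 BTC for about USD 270k implies roughly USD 4,030 per bitcoin. How long that lasts depends on what "a few thousand" a month means. The $3,000/month figure below is an assumption for illustration only:

```python
def implied_price(usd_total: float, btc: float) -> float:
    """USD per BTC implied by a wallet balance quoted in both units."""
    return usd_total / btc


def runway_months(usd_total: float, monthly_cost: float) -> float:
    """Months the balance lasts at a constant monthly cost (ignores price swings)."""
    return usd_total / monthly_cost


print(round(implied_price(270_000, 67)))  # 4030
print(runway_months(270_000, 3_000))      # 90.0, assuming $3,000/month
```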
14:47 🔗 dd0a13f37 that's an official torrent from 17-Aug-2017
14:47 🔗 dd0a13f37 http://libgen.io/libgen/repository_torrent/
14:47 🔗 SketchCow Hey, remember the good times when I'd be able to answer Internet Archive questions helpfully
14:47 🔗 SketchCow Before edsu implied that the Internet Archive banned him?
14:47 🔗 SketchCow Those were good times.
14:48 🔗 SketchCow How's that #internetarchive channel doing, anyway, now that I can't go in there
14:48 🔗 dd0a13f37 Would it be possible for you to add the later ones? It's as easy as downloading http://libgen.io/libgen/repository_torrent/r0-2092.ZIP and the last few ones, then deriving if I understand correctly
14:48 🔗 SketchCow Oh, and why does edsu have op on #archiveteam again
14:49 🔗 SketchCow When he wrote a whole essay with half-baked info about what the Internet Archive was going to do with robots.txt and got a wave of hatred?
14:49 🔗 SketchCow Not that I'm going to jeopardize my job and ban him, or anything
14:49 🔗 dd0a13f37 r_2093000-r_2105000 from the site I linked
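Several torrent names come up in this exchange: r_1727000, r0-2092.ZIP, and r_2093000 through r_2105000. Together they suggest the repository torrents are cut at 1000-ID boundaries. That pattern is inferred from this conversation, not documented. If it holds, the missing range can be enumerated as below. The file extension and download URL are deliberately left out:

```python
from typing import List


def torrent_names(first: int, last: int, step: int = 1000) -> List[str]:
    """Names r_<start> for every step-sized block from first to last inclusive."""
    if first % step or last % step:
        raise ValueError("first and last must be multiples of step")
    return [f"r_{start}" for start in range(first, last + 1, step)]


missing = torrent_names(2_093_000, 2_105_000)
print(len(missing), missing[0], missing[-1])  # 13 r_2093000 r_2105000
```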
14:49 🔗 dd0a13f37 Quick summary?
14:49 🔗 SketchCow Good times, good times
14:50 🔗 SketchCow Heyyyyyy the Ted Nelson scans are going beautifully, and the CD-ROM scanning has a faster workflow
14:51 🔗 SketchCow I paid $40 for a program that does nothing but crop
14:51 🔗 SketchCow But it crops well!
14:51 🔗 Frogging do one thing and do it well
14:51 🔗 Frogging :)
14:52 🔗 SketchCow This does the one thing very well.
14:53 🔗 SketchCow It's called "Batchcrop"
14:54 🔗 SketchCow I can say "OK, for the big pile of TIFFs I just scanned... crop away all the white part of the scan, with a X amount of pixels in all directions around the "content", and save it."
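SketchCow's description (find the non-white content, keep an X-pixel margin around it, crop) comes down to a bounding-box search. Batchcrop's actual algorithm isn't described here, so this is only a minimal sketch of the idea. It assumes a grayscale image given as rows of 0-255 values and treats anything at or above a threshold of 250 as "white":

```python
from typing import List, Optional, Tuple

# (left, top, right, bottom); right and bottom are exclusive, like slice bounds.
Box = Tuple[int, int, int, int]


def content_box(pixels: List[List[int]], threshold: int = 250, margin: int = 0) -> Optional[Box]:
    """Bounding box of non-white pixels, padded by margin and clamped to the image."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    rows = [y for y in range(height) if any(v < threshold for v in pixels[y])]
    if not rows:
        return None  # blank scan: nothing to keep
    cols = [x for x in range(width) if any(pixels[y][x] < threshold for y in range(height))]
    left = max(cols[0] - margin, 0)
    top = max(rows[0] - margin, 0)
    right = min(cols[-1] + 1 + margin, width)
    bottom = min(rows[-1] + 1 + margin, height)
    return (left, top, right, bottom)


def crop(pixels: List[List[int]], box: Box) -> List[List[int]]:
    """Cut the image down to box."""
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]
```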
14:54 🔗 pizzaiolo has quit IRC (Quit: pizzaiolo)
14:54 🔗 SketchCow So basically, I can just keep shoving CDs into my scanner, one at a time, and just scan them each into a directory.
14:55 🔗 pizzaiolo has joined #archiveteam-bs
14:55 🔗 SketchCow The longest part now is typing in names for the scans so they either match CD-ROMs I put up on archive with no scan, or match to rips I just did of same.
14:55 🔗 dd0a13f37 Can't most image processing programs do that? Or does it have a sophisticated white space detection algo?
14:55 🔗 SketchCow I lent a guy some CDs to do this... 2 years ago
14:55 🔗 SketchCow He sheepishly brought the bin back to me last week.
14:56 🔗 SketchCow I scanned and cropped all 86 in about 1.5 hours
14:56 🔗 SketchCow I'm sure all image processing programs do something.
14:56 🔗 SketchCow There are many like it but this one is mine
14:59 🔗 dd0a13f37 For the really hostile sites, what about using commercial proxy providers? I read about LJ on the wiki and they were apparently blacklisting your IPs
15:01 🔗 dd0a13f37 >For this project, set it to 1, because LiveJournal tends to ban scrapers!
15:10 🔗 dd0a13f37 >Since 2015, Sci-Hub has operated its own repository , distinct from LibGen
15:11 🔗 dd0a13f37 If this is true (which I'm not sure of), then that might be why LG sci-mag torrents are unavailable
15:16 🔗 DFJustin <dd0a13f37> Would it be possible for you to add the later ones?
15:16 🔗 DFJustin obviously somebody is actively working on it so there's no point in recruiting somebody else
15:18 🔗 dd0a13f37 All you have to do is upload torrent and derive
15:18 🔗 DFJustin put that energy into archiving something that isn't famous
15:18 🔗 dd0a13f37 and it's not actively maintained as far as I understand
15:18 🔗 dd0a13f37 I am, I'm waiting on some email responses currently
15:42 🔗 Mateon1 has quit IRC (Quit: Mateon1)
15:43 🔗 Mateon1 has joined #archiveteam-bs
16:21 🔗 schbirid http://www.bbc.com/news/uk-england-wiltshire-41267378
16:44 🔗 odemg is now known as xbinwank
16:46 🔗 Honno has quit IRC (Read error: Operation timed out)
16:47 🔗 xbinwank is now known as odemg
16:48 🔗 dd0a13f37 Anyone here speak/understand korean?
17:31 🔗 Asparagir has joined #archiveteam-bs
17:31 🔗 svchfoo3 sets mode: +o Asparagir
17:31 🔗 svchfoo1 sets mode: +o Asparagir
17:39 🔗 kristian_ has joined #archiveteam-bs
17:50 🔗 dashcloud has quit IRC (Read error: Operation timed out)
17:50 🔗 dashcloud has joined #archiveteam-bs
17:54 🔗 dd0a13f37 www.korean-books.com.kp/en/packages/xnps/download.pg.php?419 change "en" to "ko en fr sp de ru ch ja ar" to taste and 430 to any number <= 430
17:54 🔗 dd0a13f37 What's the proper way to archive something like this? Do you need WARC for what's just a GET request returning a file?
17:55 🔗 astrid the proper way is to gin up a list of urls and submit to archivebot with !ao < http://url/yourlist.txt
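For the korean-books.com.kp downloads, the list for `!ao <` can be generated instead of typed out. This is a minimal sketch. It assumes IDs run from 1 up to the 430 mentioned above, and it adds the `http://` scheme ArchiveBot needs. Whether the site prefers http or https is a guess:

```python
from typing import Iterable, List

# Language codes exactly as listed in the channel.
LANGS = "ko en fr sp de ru ch ja ar".split()


def download_urls(max_id: int = 430, langs: Iterable[str] = LANGS, first_id: int = 1) -> List[str]:
    """One download URL per (language, publication ID) pair."""
    return [
        f"http://www.korean-books.com.kp/{lang}/packages/xnps/download.pg.php?{n}"
        for lang in langs
        for n in range(first_id, max_id + 1)
    ]


if __name__ == "__main__":
    # One URL per line, e.g. `python make_list.py > list.txt`; host list.txt for !ao <
    print("\n".join(download_urls()))
```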
17:56 🔗 dd0a13f37 Thanks!
17:57 🔗 astrid then you can download the warc when the job is done and extract everything from it, if you're so inclined :)
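Getting the files back out of a finished WARC mostly means walking its records and keeping the `response` bodies. The sketch below uses only the standard library and assumes well-formed WARC records. It reads `.warc.gz` files as concatenated per-record gzip members, which is the usual layout. It does not undo chunked transfer-encoding or gzip content-encoding of the HTTP payload. A maintained library such as warcio handles those cases properly.

```python
import gzip
import io
from typing import BinaryIO, Dict, Iterator, Tuple

Record = Tuple[Dict[str, str], bytes]


def read_records(stream: BinaryIO) -> Iterator[Record]:
    """Yield (lower-cased WARC headers, raw block) for each record in the stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of input
        if not line.strip():
            continue  # blank lines separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers: Dict[str, str] = {}
        while True:
            line = stream.readline()
            if line in (b"", b"\r\n", b"\n"):
                break  # blank line ends the header section
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        yield headers, stream.read(int(headers["content-length"]))


def records_from_bytes(data: bytes) -> Iterator[Record]:
    """Convenience wrapper for uncompressed WARC data already in memory."""
    return read_records(io.BytesIO(data))


def http_body(block: bytes) -> bytes:
    """Drop the HTTP status line and headers from a response record's block."""
    _head, sep, body = block.partition(b"\r\n\r\n")
    return body if sep else block


def extract_responses(path: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (target URI, HTTP body) for every response record in a WARC file."""
    # gzip.open reads concatenated per-record gzip members transparently.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as fh:
        for headers, block in read_records(fh):
            if headers.get("warc-type") == "response":
                yield headers.get("warc-target-uri", ""), http_body(block)
```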
17:58 🔗 dd0a13f37 Okay, so what's the proper way when there's also metadata in XML and thumbnails? Parse separately or make script to rename them to their "real" names?
17:59 🔗 astrid hm?
18:00 🔗 dd0a13f37 They're named like 00000412.pdf
18:00 🔗 dd0a13f37 But they have names
18:00 🔗 dd0a13f37 one second, site takes a bit to load
18:01 🔗 dd0a13f37 They have names like "UNDERSTANDING KOREA (9) (HUMAN RIGHTS)"
18:01 🔗 dd0a13f37 Also metadata
18:01 🔗 dd0a13f37 "- Book on Common Sense -"
18:01 🔗 dd0a13f37 "Foreign Languages Publishing House"
18:01 🔗 dd0a13f37 "87 pp"
18:02 🔗 dd0a13f37 and an image
18:02 🔗 dd0a13f37 This won't be saved if you just have them archive a link list
18:02 🔗 DFJustin if you're motivated / have skills, the best way would probably be to upload each pdf as a separate IA book item with metadata
18:02 🔗 astrid well i'd add the xml files to the link list then
18:02 🔗 fie has quit IRC (Read error: Operation timed out)
18:03 🔗 dd0a13f37 It's not XML, you issue a POST request and get an entire page as HTML
18:07 🔗 dd0a13f37 So you'd have to parse it
18:07 🔗 dd0a13f37 I don't have the skills, and there are a few thousand
18:12 🔗 astrid ohh
18:12 🔗 dd0a13f37 https://pastebin.com/parEbjPK this is what it looks like
18:13 🔗 dd0a13f37 after parsing
18:13 🔗 dd0a13f37 you send a base64 encoded json dict
18:13 🔗 dd0a13f37 and get back a json dict
18:13 🔗 dd0a13f37 containing the page html
18:13 🔗 dd0a13f37 and it's escaped with backslashes two or three times
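The request/response shape described here can be handled with two small helpers. One base64-encodes a JSON dict for the POST body. The other peels a known number of backslash-escaping layers off the returned HTML. The dict's keys and the endpoint aren't given in the channel, so the sketch assumes none. The set of escape sequences handled (quote, backslash, slash, n, r, t) is also an assumption:

```python
import base64
import json
import re

# One layer of JSON-style backslash escaping (assumed set of sequences).
_ESCAPE = re.compile(r'\\(["\\/nrt])')
_UNESCAPED = {'"': '"', "\\": "\\", "/": "/", "n": "\n", "r": "\r", "t": "\t"}


def encode_request(params: dict) -> str:
    """JSON-serialize a dict and base64-encode it for use as a POST body."""
    raw = json.dumps(params, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def unescape_once(text: str) -> str:
    """Remove one layer of backslash escaping, scanning left to right."""
    return _ESCAPE.sub(lambda m: _UNESCAPED[m.group(1)], text)


def unescape_layers(text: str, layers: int) -> str:
    """Apply unescape_once up to `layers` times, stopping early if nothing changes."""
    for _ in range(layers):
        peeled = unescape_once(text)
        if peeled == text:
            break
        text = peeled
    return text
```

The layer count is passed in explicitly ("two or three times") and not looped until stable. Unescaping one layer too many would also eat legitimate backslashes in the page.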
18:14 🔗 astrid that sounds like a delight
18:14 🔗 dd0a13f37 check out their homemade CMS
18:14 🔗 dd0a13f37 It's stateful, you set which language you want, it saves it server-side
18:15 🔗 ReimuHaku has quit IRC (Ping timeout: 250 seconds)
18:18 🔗 dd0a13f37 !ao < https://my.mixtape.moe/nsrkrj.txt
18:18 🔗 dd0a13f37 like this?
18:19 🔗 astrid you need http:// at the front of your urls
18:19 🔗 astrid or https://
18:19 🔗 astrid or ftp://
18:19 🔗 astrid depending
18:20 🔗 dd0a13f37 thanks
18:21 🔗 dd0a13f37 !ao < https://my.mixtape.moe/tktryb.txt
18:21 🔗 astrid these uh
18:21 🔗 astrid aren't exactly pdfs
18:22 🔗 dd0a13f37 They are
18:22 🔗 dd0a13f37 Or does it not handle Content-Disposition?
18:22 🔗 astrid they're pdfs with a sql statement at the front ???
18:22 🔗 dd0a13f37 I can open them just fine
18:22 🔗 astrid hm
18:23 🔗 astrid maybe pdf doesn't mind about that
18:23 🔗 ReimuHaku has joined #archiveteam-bs
18:24 🔗 astrid they seem to all start with
18:24 🔗 dd0a13f37 oh yeah I see
18:24 🔗 astrid Update PublicationList_ko Set pVisitCount="2" Where pId=127%PDF-1.4
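Because the server writes its own SQL statement in front of every file, the stored bytes begin with `Update PublicationList_ko` and not with `%PDF-`. Many PDF readers tolerate some leading junk, which may be why the files still open. A cleaned copy is friendlier to other tools, though. This sketch cuts everything before the header and assumes the header falls within the first 1024 bytes:

```python
PDF_HEADER = b"%PDF-"


def strip_leading_junk(data: bytes, max_scan: int = 1024) -> bytes:
    """Return data starting at the %PDF- header; raise if it isn't near the start."""
    start = data.find(PDF_HEADER, 0, max_scan)
    if start < 0:
        raise ValueError(f"no {PDF_HEADER!r} header in the first {max_scan} bytes")
    return data[start:]


def clean_file(src: str, dst: str) -> int:
    """Write a cleaned copy of src to dst; return how many junk bytes were removed."""
    with open(src, "rb") as fh:
        data = fh.read()
    cleaned = strip_leading_junk(data)
    with open(dst, "wb") as fh:
        fh.write(cleaned)
    return len(data) - len(cleaned)
```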
18:24 🔗 astrid it's uh
18:24 🔗 dd0a13f37 sqli
18:24 🔗 astrid nice job folks
18:24 🔗 dd0a13f37 There are numerous other vulnerabilities too
18:25 🔗 astrid figures
18:25 🔗 dd0a13f37 There's an undocumented way to register an account on KCNA
18:25 🔗 astrid okay, well, go ahead and submit that job in #archivebot
18:25 🔗 dd0a13f37 which appears to do nothing
18:25 🔗 dd0a13f37 but it actually registers you
18:25 🔗 astrid lol
18:25 🔗 dd0a13f37 and you can log in
18:25 🔗 dd0a13f37 and the only thing it does
18:25 🔗 dd0a13f37 is add some tracking code
18:25 🔗 dd0a13f37 you don't even show up as logged in
18:25 🔗 dd0a13f37 there is also a random zip file serving malware
18:44 🔗 dd0a13f37 Well, I can't get it to work. Any pointers? It needs a timeout of maybe 5 minutes for the first request, then some IP whitelisting or something happens
18:44 🔗 dd0a13f37 So just forcing IA to do a request would be fine
18:44 🔗 astrid IA doesn't run archivebot :)
18:45 🔗 dd0a13f37 Does it use IA IPs?
18:45 🔗 astrid no
18:45 🔗 astrid we run archivebot
18:45 🔗 dd0a13f37 Does it share an IP with anything else?
18:45 🔗 astrid it's a bunch of machines, run by several people in this channel
18:45 🔗 astrid generally they have dedicated IPs, but multiple grabbers run per host
18:46 🔗 dd0a13f37 Do you run one of them? Can you force it to use a certain machine?
18:46 🔗 astrid yes and yes
18:46 🔗 dd0a13f37 Do you have SSH access/similar?
18:46 🔗 astrid i wasn't getting that whitelisting effect, btw
18:46 🔗 astrid it may be that you've got browser keepalives going on
18:47 🔗 dd0a13f37 Nope, they do connection:close
18:48 🔗 dd0a13f37 I might be mistaking it for something else, but wget takes a long time (minutes) if it even does it
18:48 🔗 dd0a13f37 and ff is instant
18:49 🔗 astrid archivebot is more similar to wget than to firefox
18:49 🔗 dd0a13f37 yeah
18:49 🔗 dd0a13f37 oh, apparently I have a phpsessid
18:49 🔗 dd0a13f37 well that explains it
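If the PHPSESSID cookie is what made the difference, the fix is a client that keeps cookies between requests the way Firefox does. That matters here because this CMS stores state such as the chosen language server-side, per session. A one-off wget call without saved cookies starts fresh each time. Below is a stdlib sketch of such a client that also allows a long first response. The 300-second timeout follows the "maybe 5 minutes" estimate above, and the User-Agent string is an arbitrary assumption:

```python
import http.cookiejar
import urllib.request
from typing import Tuple


def make_session(user_agent: str = "Mozilla/5.0") -> Tuple[urllib.request.OpenerDirector, http.cookiejar.CookieJar]:
    """Opener that stores cookies (e.g. PHPSESSID) and sends them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def fetch(opener: urllib.request.OpenerDirector, url: str, timeout: float = 300.0) -> bytes:
    """GET url through the session; the timeout applies per blocking socket operation."""
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()
```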
18:50 🔗 dd0a13f37 "Apache/2.2.15 (RedStar 3.0)", how does it even work
18:50 🔗 dd0a13f37 Does it just randomly time out requests?
18:51 🔗 dd0a13f37 I managed to get one with wget now, connecting took 20 seconds and downloading 2m:20s (at 15 kbit)
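For scale: 2 minutes 20 seconds at 15 kbit/s moves about 262 kB, taking 1 kbit = 1000 bits. Allow roughly 160 seconds per file, including the 20-second connect. A sequential grab of every combination from the URL pattern above (9 languages × 430 IDs = 3,870 URLs, if all exist) would then take about a week. The per-file time is a rough assumption based on this single sample:

```python
def bytes_transferred(kbit_per_s: float, seconds: float) -> float:
    """Bytes moved at a steady rate, with 1 kbit = 1000 bits."""
    return kbit_per_s * 1000 * seconds / 8


def sequential_hours(files: int, seconds_per_file: float) -> float:
    """Total hours when files are fetched one after another."""
    return files * seconds_per_file / 3600


print(bytes_transferred(15, 140))   # 262500.0
print(sequential_hours(3870, 160))  # 172.0, i.e. about a week
```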
19:01 🔗 zyphlar has joined #archiveteam-bs
19:09 🔗 dd0a13f37 Well, I can't wrap my head around north korean web magic
19:10 🔗 dd0a13f37 has left
19:48 🔗 superkuh https://www.eff.org/deeplinks/2017/09/open-letter-w3c-director-ceo-team-and-membership "Effective today, EFF is resigning from the W3C."
19:49 🔗 astrid o_O
19:50 🔗 JAA Wow
19:51 🔗 JAA Ah, the DRM bullshit, right.
20:11 🔗 schbirid has quit IRC (Quit: Leaving)
20:20 🔗 hook54321 holy crap
20:22 🔗 hook54321 I imagine this event will be a bit different now. https://twitter.com/internetarchive/status/909868291249684480
20:27 🔗 kim_ has joined #archiveteam-bs
20:48 🔗 Dark_Star has quit IRC (Remote host closed the connection)
21:11 🔗 zyphlar has quit IRC (Quit: Connection closed for inactivity)
21:29 🔗 Darkstar has joined #archiveteam-bs
21:43 🔗 noirscape has quit IRC (Read error: Operation timed out)
21:43 🔗 zino has quit IRC (Quit: Leaving)
21:46 🔗 hook54321 https://www.youtube.com/watch?v=h94ZKGVg-B8
21:46 🔗 hook54321 I think we should post something about this on the ArchiveTeam twitter account.
21:56 🔗 godane who wants to start building rpi librarybox boxes?
21:59 🔗 BlueMaxim has joined #archiveteam-bs
22:00 🔗 balrog has quit IRC (Read error: Operation timed out)
22:00 🔗 JAA has quit IRC (Read error: Operation timed out)
22:00 🔗 C4K3 has quit IRC (Read error: Operation timed out)
22:00 🔗 ruunyan has quit IRC (Read error: Operation timed out)
22:00 🔗 squires has quit IRC (Read error: Operation timed out)
22:00 🔗 ZexaronS has quit IRC (Read error: Operation timed out)
22:01 🔗 rocode has quit IRC (Read error: Operation timed out)
22:01 🔗 ZexaronS has joined #archiveteam-bs
22:02 🔗 JAA has joined #archiveteam-bs
22:02 🔗 swebb sets mode: +o JAA
22:02 🔗 wp494 has quit IRC (Read error: Operation timed out)
22:02 🔗 squires has joined #archiveteam-bs
22:02 🔗 balrog has joined #archiveteam-bs
22:02 🔗 swebb sets mode: +o balrog
22:03 🔗 REiN^ has quit IRC (Write error: Broken pipe)
22:03 🔗 wp494 has joined #archiveteam-bs
22:03 🔗 PotcFdk has quit IRC (Write error: Broken pipe)
22:04 🔗 ruunyan has joined #archiveteam-bs
22:05 🔗 REiN^ has joined #archiveteam-bs
22:05 🔗 tfgbd_znc has quit IRC (Ping timeout: 600 seconds)
22:06 🔗 tfgbd_znc has joined #archiveteam-bs
22:06 🔗 rocode has joined #archiveteam-bs
22:07 🔗 drumstick has joined #archiveteam-bs
22:07 🔗 C4K3 has joined #archiveteam-bs
22:11 🔗 PotcFdk has joined #archiveteam-bs
22:15 🔗 wabu has quit IRC (Ping timeout: 246 seconds)
22:20 🔗 ola_norsk has joined #archiveteam-bs
22:21 🔗 hook54321 godane: What are those?
22:21 🔗 ola_norsk is posting links possible?
22:21 🔗 astrid yes definitely
22:21 🔗 ola_norsk ok, one sec
22:22 🔗 ola_norsk https://pbs.twimg.com/media/DKCW8SnWkAIgpqn.jpg:large
22:22 🔗 ola_norsk that is the result of the attempt
22:22 🔗 ola_norsk but, let me get the url to the tweet status, so you dont need to retype it from image
22:23 🔗 ola_norsk https://twitter.com/JeffHollandaise/status/897970096429084672
22:24 🔗 astrid hm, for some reason twitter has decided that you're coming from germany
22:24 🔗 ola_norsk I can view this url..but can't archive it. When i try, i get german twitter
22:24 🔗 astrid i'm not sure how it decides that
22:24 🔗 astrid probably the source IP that archive.org is using looks like a german IP
22:25 🔗 ola_norsk yes, it's not me
22:25 🔗 astrid you are more than welcome to join #archivebot and do
22:25 🔗 astrid !ao https://twitter.com/JeffHollandaise/status/897970096429084672 --ignore-sets=twitter
22:25 🔗 astrid er, also the --phantomjs option
22:25 🔗 ola_norsk ill check it out. ty
22:26 🔗 wabu has joined #archiveteam-bs
22:26 🔗 ola_norsk but, i have to ask..what difference would it really make?
22:26 🔗 astrid archivebot is run by us, and i haven't seen any german redirects affecting it
22:26 🔗 astrid (archiveteam is not the same as archive.org, we have completely different infrastructure)
22:27 🔗 ola_norsk i mean looking like a german IP, would there be any difference in it working or not?
22:27 🔗 astrid oh
22:27 🔗 astrid uhhh, it shouldn't redirect
22:27 🔗 astrid what do you want to happen exactly?
22:27 🔗 astrid the --phantomjs option will pull in the css and images and javascript so it'll look and work correctly
22:28 🔗 astrid doing it with archivebot will make sure it gets run from a jurisdiction where twitter won't screen out nazi imagery
22:28 🔗 ola_norsk i would expect waybackmachine to archive like regular
22:28 🔗 ola_norsk ok
22:28 🔗 astrid wayback machine's liveweb feature usually works well but sometimes has some issues
22:28 🔗 astrid twitter is a difficult website to archive
22:29 🔗 ola_norsk i've had no problem so far i think
22:29 🔗 astrid hm okay
22:29 🔗 astrid maybe it's because the tweet has nazi imagery in it, i know they filter that sort of thing out in some places
22:30 🔗 atluxity has quit IRC (Ping timeout: 506 seconds)
22:30 🔗 ola_norsk so in german twitter, nazi imagery (i havent looked close to see if there was any), is screened?
22:30 🔗 astrid sometimes?
22:30 🔗 astrid it's not clear
22:30 🔗 astrid i mean there is some nazi/kkk shit in that tweet
22:31 🔗 ola_norsk ok
22:32 🔗 ola_norsk anyway, thanks for the help. I can't stand nazism myself, but this was really frustrating
22:33 🔗 astrid i'm not a fan either ...
22:33 🔗 astrid yeah
22:35 🔗 astrid but yeah. #archivebot is a channel on this network where we operate an irc bot that lets you submit links for archival
22:38 🔗 ola_norsk btw, i also previously tried to archive my own https://pbs.twimg.com/media/DKCYoW-W4AAsH_T.jpg:large
22:39 🔗 ola_norsk and i can't see how i'm pegged as a nazi
22:40 🔗 Lagittaja well, looks like my home "server" build completes faster than I expected. scored a nice (imho) motherboard from the same seller I got the i3-2120 from. intel's dq67ow
22:40 🔗 astrid ola_norsk: maybe hm maybe actually, that looks like archive.org's ip space has been blocked from using twitter without logging in
22:41 🔗 bluesoul has quit IRC (Read error: Operation timed out)
22:41 🔗 astrid :(
22:41 🔗 Lagittaja haven't had much experience with Intel's boards in the past other than the DH77EB in my mother's HTPC which actually has been rock solid for the past 4+ years. and this thing was 32€, including shipping. not too shabby
22:41 🔗 svchfoo1 has quit IRC (Remote host closed the connection)
22:41 🔗 bluesoul has joined #archiveteam-bs
22:41 🔗 astrid Lagittaja: i think that's completely offtopic for this channel
22:42 🔗 svchfoo1 has joined #archiveteam-bs
22:42 🔗 Lagittaja well sorry astrid, I have been having a conversation about this build with another person on this channel and I intend to use it to put more horse power for archiving. so sorry I'll see myself out
22:43 🔗 Lagittaja has quit IRC (Quit: Leaving)
22:43 🔗 svchfoo3 sets mode: +o svchfoo1
22:48 🔗 astrid ah, sorry, i didn't know
22:54 🔗 kristian_ has quit IRC (Quit: Leaving)
22:54 🔗 ola_norsk has left
23:17 🔗 BartoCH has quit IRC (Quit: WeeChat 1.9)
23:18 🔗 godane hook54321: i'm working on a project to add kiwix to slackwarearm 14.2
23:19 🔗 godane https://archive.org/details/slackwarearm-14.2-20170906-kiwix
23:22 🔗 drumstick has quit IRC (Read error: Operation timed out)
23:23 🔗 joepie91_ https://twitter.com/xor/status/909888462584795136
23:24 🔗 godane i now just need to write a script to mount /dev/sda2 and look for something like /mnt/data/kiwix for all the kiwix files
23:24 🔗 godane i have another script to build the library.xml file in /mnt/data/kiwix folder
23:25 🔗 godane then its kiwix --library $path/library.xml --port 8000 --daemon something
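godane's plan is to find the ZIM files under /mnt/data/kiwix, build library.xml, and then start the server. That maps onto kiwix-tools roughly as below. The sketch assumes the kiwix-tools binaries `kiwix-manage` and `kiwix-serve`, not the `kiwix` command typed above. It only builds the argument lists and skips mounting /dev/sda2, which needs root. Check the exact flags against `kiwix-serve --help` on the installed version:

```python
import os
from typing import List

KIWIX_DIR = "/mnt/data/kiwix"  # directory named in the plan above
LIBRARY = os.path.join(KIWIX_DIR, "library.xml")


def find_zims(root: str) -> List[str]:
    """All *.zim files under root, sorted for a stable library order."""
    found: List[str] = []
    for dirpath, _dirs, files in os.walk(root):
        found.extend(os.path.join(dirpath, n) for n in files if n.endswith(".zim"))
    return sorted(found)


def manage_commands(zims: List[str], library: str) -> List[List[str]]:
    """One `kiwix-manage <library> add <zim>` invocation per ZIM file."""
    return [["kiwix-manage", library, "add", zim] for zim in zims]


def serve_command(library: str, port: int = 8000) -> List[str]:
    """kiwix-serve in library mode, backgrounded with --daemon."""
    return ["kiwix-serve", "--library", f"--port={port}", "--daemon", library]
```

Each list can then be run with `subprocess.run(cmd, check=True)`.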
23:33 🔗 Soni has joined #archiveteam-bs
23:48 🔗 fie has joined #archiveteam-bs
