Time |
Nickname |
Message |
00:00
🔗
|
phuzion |
arkiver: any chance you can somehow give me preview access of some sort so I can test my ansible script? |
00:00
🔗
|
arkiver |
sure |
00:00
🔗
|
arkiver |
wait I'll get the latest scripts online |
00:00
🔗
|
phuzion |
yeah, let me know when you do that, I'll whip up an ansible script and test. |
00:06
🔗
|
HCross |
arkiver, I should be ordering my new dedi in the next few days so will hit it hard when I get it |
00:07
🔗
|
arkiver |
awesome |
00:07
🔗
|
HCross |
as soon as Kimsufi gets some stock |
00:08
🔗
|
phuzion |
Oh nice. I was debating getting a Kimsufi server too |
00:08
🔗
|
HCross |
I am going for the KS3. I had a server with them before. As long as you can "shut up and use it" then you are fine |
00:08
🔗
|
arkiver |
Because of the size of some google code projects (tens of thousands of bug reports, thousands of revisions, etc.) some items might be a few hundred thousand URLs |
00:09
🔗
|
HCross |
I'll have 2 TB of space and a Core i5 |
00:10
🔗
|
HCross |
I'll also get a "Raspberry Pi: Datacenter" thingy set up |
00:10
🔗
|
arkiver |
"Raspberry Pi: Datacenter" :P |
00:10
🔗
|
HCross |
arkiver, will there be options to set the max file size for each worker? like we gad at SF |
00:10
🔗
|
HCross |
had |
00:10
🔗
|
aaaaaaaaa |
be sure to join #googlecodeblue if you haven't already |
00:11
🔗
|
|
nightpool has quit IRC (Read error: Connection reset by peer) |
00:11
🔗
|
arkiver |
HCross: we can't predict the size of an item, so I think that would be a no |
00:11
🔗
|
HCross |
ah ok |
00:11
🔗
|
HCross |
I'll just have to sort some disk space out |
00:11
🔗
|
arkiver |
Updated scripts are online |
00:12
🔗
|
arkiver |
debug lines are still in for now |
00:12
🔗
|
arkiver |
for example https://github.com/ArchiveTeam/googlecode-grab/blob/master/googlecode.lua#L195-L198 |
00:14
🔗
|
|
w0rp has joined #archiveteam |
00:15
🔗
|
phuzion |
arkiver: Nothing particularly different about these scripts from a requirements or setup perspective? |
00:16
🔗
|
aaaaaaaaa |
phuzion: the web grab no, but the repo grab has some requirements above the usual. |
00:17
🔗
|
HCross |
let's take this to the proper chan |
00:17
🔗
|
phuzion |
Ah right |
00:21
🔗
|
|
aaaaaaaa_ has joined #archiveteam |
00:21
🔗
|
|
swebb sets mode: +o aaaaaaaa_ |
00:27
🔗
|
|
aaaaaaaaa has quit IRC (Read error: Operation timed out) |
00:38
🔗
|
|
Guest1247 has joined #archiveteam |
00:39
🔗
|
|
aaaaaaaa_ is now known as aaaaaaaaa |
00:41
🔗
|
|
superkuh has joined #archiveteam |
00:58
🔗
|
|
bwn has quit IRC (Ping timeout: 252 seconds) |
00:58
🔗
|
jleclanch |
i was going through some old files and found this: https://www.reddit.com/r/Stargate/comments/125gdo/the_complete_sg1_atlantis_setshipprop_schematics/ |
00:59
🔗
|
jleclanch |
there's a torrent in the comments |
01:00
🔗
|
|
Microguru has joined #archiveteam |
01:04
🔗
|
superkuh |
jleclanch, I've been seeding a torrent of most of that stuff for a year or two. |
01:04
🔗
|
jleclanch |
superkuh: oh cool, magnet link? someone else was asking for it |
01:06
🔗
|
superkuh |
No magnet, but http://superkuh.com/stargate-misc.torrent |
01:07
🔗
|
superkuh |
Since 10/29/2012. |
01:07
🔗
|
jleclanch |
zenguy_pc: ^ |
01:07
🔗
|
|
philpem has quit IRC (Ping timeout: 252 seconds) |
01:08
🔗
|
superkuh |
Indeed. |
01:09
🔗
|
superkuh |
(1.4 GB) 200 KB/s limit on my end. |
01:18
🔗
|
|
Guest1247 has quit IRC (Ping timeout: 240 seconds) |
01:21
🔗
|
SketchCow |
https://archive.org/details/kpfa_podcasts is coming along. Good work, godane |
01:24
🔗
|
|
jonimus is now known as Jonimus |
01:24
🔗
|
|
bwn has joined #archiveteam |
01:30
🔗
|
godane |
cool |
01:35
🔗
|
godane |
SketchCow: you're spelling the ericarchive collection wrong: https://archive.org/details/ERIC_ED042000 |
01:36
🔗
|
godane |
you're putting the collection name as ericachive |
01:39
🔗
|
|
JesseW has joined #archiveteam |
01:46
🔗
|
|
Ghost_of_ has quit IRC (Quit: Leaving) |
01:54
🔗
|
|
Ungstein has joined #archiveteam |
02:02
🔗
|
|
primus104 has quit IRC (Leaving.) |
02:13
🔗
|
SketchCow |
I KNOW |
02:13
🔗
|
SketchCow |
I KNOWWWWWW |
02:13
🔗
|
SketchCow |
I just had to fix that |
02:14
🔗
|
godane |
ok |
02:26
🔗
|
|
nick_name has joined #archiveteam |
02:26
🔗
|
nick_name |
hello |
02:27
🔗
|
MrRadar |
Hello |
02:27
🔗
|
nick_name |
Does anyone know the progress so far on archiving DocsStoc |
02:27
🔗
|
nick_name |
? |
02:28
🔗
|
MrRadar |
Hmm... I'm not familiar with that project. ping arkiver ^ |
02:28
🔗
|
JesseW |
nick_name: very limited. Needing an account, and difficulties discovering URLs, has made the going slow and limited, as I understand it. |
02:29
🔗
|
nick_name |
I'm only concerned about DocsStoc because I was linked there from a Steam game guide. |
02:31
🔗
|
JesseW |
If you have *specific* URLs, the best thing to do is download them yourself, and upload copies to the internet archive. But it's also good to just paste the URLs here in this channel, and we'll see what we can do about scraping them. |
02:31
🔗
|
nick_name |
well time to scrape google |
02:32
🔗
|
nick_name |
MANUALLY |
02:35
🔗
|
JesseW |
ping achip about scraping google. He's got some useful scripts in that direction. |
02:36
🔗
|
JesseW |
I think we may have already done that (but doing it again is good, too) |
02:38
🔗
|
nick_name |
What is this ping you speak of? |
02:38
🔗
|
JesseW |
we've already done it -- it just refers to mentioning their name in the channel, so they will see this conversation and (eventually) respond, hopefully |
02:39
🔗
|
nick_name |
MANUAL search.disconnect.me scrape; up to page 3: |
02:40
🔗
|
nick_name |
http://pastebin.com/raw.php?i=LHrw5LNb |
02:40
🔗
|
nick_name |
only 2 actual documents were found |
02:40
🔗
|
nick_name |
www.docstoc.com/docs/173891518/ |
02:40
🔗
|
nick_name |
www.docstoc.com/docs/173891525/ |
02:42
🔗
|
nick_name |
anyone have an account for docstoc that they are willing to share |
02:42
🔗
|
nick_name |
they are no longer accepting registration |
02:44
🔗
|
|
JesseW has quit IRC (Leaving.) |
02:45
🔗
|
nick_name |
oh well |
02:45
🔗
|
nick_name |
ping everyone |
02:45
🔗
|
nick_name |
get archiving! |
02:45
🔗
|
|
nick_name has quit IRC (Quit: Page closed) |
03:06
🔗
|
|
xk_id has quit IRC (Remote host closed the connection) |
03:12
🔗
|
|
xk_id has joined #archiveteam |
03:14
🔗
|
|
remsen has quit IRC (Read error: Operation timed out) |
03:16
🔗
|
|
xk_id has quit IRC (Read error: Operation timed out) |
03:28
🔗
|
|
JesseW has joined #archiveteam |
03:31
🔗
|
|
bwn has quit IRC (Ping timeout: 606 seconds) |
03:39
🔗
|
|
remsen has joined #archiveteam |
03:39
🔗
|
|
Sk1d has quit IRC (Ping timeout: 252 seconds) |
03:41
🔗
|
|
Ungstein has quit IRC (Quit: Leaving.) |
04:13
🔗
|
|
xk_id has joined #archiveteam |
04:30
🔗
|
|
kyan_ has joined #archiveteam |
04:33
🔗
|
|
xk_id has quit IRC (Ping timeout: 615 seconds) |
04:35
🔗
|
|
kyan has quit IRC (Read error: Operation timed out) |
04:52
🔗
|
|
aaaaaaaaa has quit IRC (Leaving) |
04:55
🔗
|
|
xk_id has joined #archiveteam |
05:06
🔗
|
|
GLaDOS has quit IRC (Ping timeout: 252 seconds) |
05:06
🔗
|
|
GLaDOS has joined #archiveteam |
05:14
🔗
|
|
Apathy has quit IRC (Read error: Operation timed out) |
05:14
🔗
|
|
aliz has quit IRC (Read error: Operation timed out) |
05:14
🔗
|
|
afics has quit IRC (Read error: Operation timed out) |
05:14
🔗
|
|
limebyte has quit IRC (hub.dk irc.homelien.no) |
05:14
🔗
|
|
Ymgve has quit IRC (hub.dk irc.homelien.no) |
05:14
🔗
|
|
altlabel has quit IRC (hub.dk irc.homelien.no) |
05:14
🔗
|
|
PurpleSym has quit IRC (hub.dk irc.homelien.no) |
05:14
🔗
|
|
i0npulse has quit IRC (hub.dk irc.homelien.no) |
05:14
🔗
|
|
mafrasi2 has quit IRC (hub.dk irc.homelien.no) |
05:14
🔗
|
|
yipdw has quit IRC (hub.dk irc.homelien.no) |
05:14
🔗
|
|
chfoo- has quit IRC (hub.dk irc.homelien.no) |
05:14
🔗
|
|
Jogie has quit IRC (hub.dk irc.homelien.no) |
05:14
🔗
|
|
PotcFdk has quit IRC (hub.dk irc.homelien.no) |
05:14
🔗
|
|
Marc has quit IRC (hub.dk irc.homelien.no) |
05:14
🔗
|
|
sHATNER has quit IRC (hub.dk irc.homelien.no) |
05:15
🔗
|
|
xmc has quit IRC (Read error: Operation timed out) |
05:15
🔗
|
|
afics has joined #archiveteam |
05:16
🔗
|
|
xmc has joined #archiveteam |
05:16
🔗
|
|
swebb sets mode: +o xmc |
05:17
🔗
|
|
aliz has joined #archiveteam |
05:17
🔗
|
|
Apathy has joined #archiveteam |
05:20
🔗
|
|
xk_id has quit IRC (Read error: Operation timed out) |
05:29
🔗
|
|
GLaDOS has quit IRC (Read error: Operation timed out) |
05:29
🔗
|
|
Guest1247 has joined #archiveteam |
05:29
🔗
|
|
GLaDOS has joined #archiveteam |
05:35
🔗
|
|
GLaDOS has quit IRC (Ping timeout: 252 seconds) |
05:35
🔗
|
Guest1247 |
Can I get voice'd again on #archivebot, please? |
05:42
🔗
|
|
GLaDOS has joined #archiveteam |
05:48
🔗
|
|
GLaDOS has quit IRC (Read error: Operation timed out) |
05:53
🔗
|
|
GLaDOS has joined #archiveteam |
05:55
🔗
|
|
Coderjoe has quit IRC (Read error: Connection reset by peer) |
06:00
🔗
|
|
Coderjoe has joined #archiveteam |
06:03
🔗
|
|
WinterFox has joined #archiveteam |
06:04
🔗
|
|
Microguru has quit IRC (Quit: Microguru) |
06:05
🔗
|
|
remsen has quit IRC (Read error: Connection reset by peer) |
06:10
🔗
|
|
PotcFdk has joined #archiveteam |
06:10
🔗
|
|
limebyte has joined #archiveteam |
06:10
🔗
|
|
Ymgve has joined #archiveteam |
06:10
🔗
|
|
altlabel has joined #archiveteam |
06:10
🔗
|
|
PurpleSym has joined #archiveteam |
06:10
🔗
|
|
i0npulse has joined #archiveteam |
06:10
🔗
|
|
mafrasi2 has joined #archiveteam |
06:10
🔗
|
|
yipdw has joined #archiveteam |
06:10
🔗
|
|
chfoo- has joined #archiveteam |
06:10
🔗
|
|
Jogie has joined #archiveteam |
06:10
🔗
|
|
Marc has joined #archiveteam |
06:10
🔗
|
|
sHATNER has joined #archiveteam |
06:10
🔗
|
|
irc.homelien.no sets mode: +ooo altlabel yipdw chfoo- |
06:11
🔗
|
|
Guest1247 has quit IRC (Ping timeout: 240 seconds) |
06:12
🔗
|
|
xk_id has joined #archiveteam |
06:14
🔗
|
|
W1nterFox has joined #archiveteam |
06:14
🔗
|
|
WinterFox has quit IRC (Read error: Operation timed out) |
06:20
🔗
|
|
xk_id has quit IRC (Read error: Operation timed out) |
07:03
🔗
|
|
xk_id has joined #archiveteam |
07:22
🔗
|
|
nightpool has joined #archiveteam |
07:23
🔗
|
|
xk_id has quit IRC (Remote host closed the connection) |
07:37
🔗
|
|
bwn has joined #archiveteam |
07:39
🔗
|
|
GLaDOS has quit IRC (Read error: Operation timed out) |
07:39
🔗
|
|
ironman_ has joined #archiveteam |
07:50
🔗
|
|
primus104 has joined #archiveteam |
07:52
🔗
|
|
nightpool has quit IRC (Read error: Operation timed out) |
07:53
🔗
|
|
GLaDOS has joined #archiveteam |
07:54
🔗
|
|
philpem has joined #archiveteam |
07:57
🔗
|
|
xk_id has joined #archiveteam |
07:58
🔗
|
|
xk_id has quit IRC (Remote host closed the connection) |
08:18
🔗
|
|
z00nx has quit IRC (Quit: WeeChat 1.3) |
08:18
🔗
|
|
z00nx has joined #archiveteam |
08:19
🔗
|
|
xhades has joined #archiveteam |
08:19
🔗
|
|
z00nx has quit IRC (Client Quit) |
08:20
🔗
|
|
remsen has joined #archiveteam |
08:20
🔗
|
|
z00nx has joined #archiveteam |
08:21
🔗
|
|
JesseW has quit IRC (Leaving.) |
08:31
🔗
|
|
xk_id has joined #archiveteam |
08:38
🔗
|
|
atomotic has joined #archiveteam |
08:40
🔗
|
|
primus104 has quit IRC (Leaving.) |
08:40
🔗
|
|
nightpool has joined #archiveteam |
08:47
🔗
|
|
nightpool has quit IRC (Ping timeout: 310 seconds) |
08:48
🔗
|
|
Sk1d has joined #archiveteam |
08:56
🔗
|
|
z00nx has quit IRC (Quit: WeeChat 1.3) |
08:56
🔗
|
|
z00nx has joined #archiveteam |
09:00
🔗
|
|
kyan_ has quit IRC (Leaving) |
09:03
🔗
|
|
PotcFdk has quit IRC (Ping timeout: 506 seconds) |
09:08
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
09:09
🔗
|
|
PotcFdk has joined #archiveteam |
09:09
🔗
|
|
dtm has quit IRC (Read error: Operation timed out) |
09:10
🔗
|
|
dtm has joined #archiveteam |
09:10
🔗
|
|
GLaDOS has quit IRC (Ping timeout: 252 seconds) |
09:12
🔗
|
|
GLaDOS has joined #archiveteam |
09:17
🔗
|
|
Ghost_of_ has joined #archiveteam |
09:30
🔗
|
|
PotcFdk has quit IRC (Ping timeout: 506 seconds) |
09:35
🔗
|
|
nightpool has joined #archiveteam |
09:39
🔗
|
|
bwn has joined #archiveteam |
09:39
🔗
|
|
nightpool has quit IRC (Read error: Operation timed out) |
09:56
🔗
|
|
PotcFdk has joined #archiveteam |
10:00
🔗
|
|
primus104 has joined #archiveteam |
10:01
🔗
|
|
xmc has quit IRC (Read error: Operation timed out) |
10:02
🔗
|
|
remsen has quit IRC (Read error: Operation timed out) |
10:03
🔗
|
|
schbirid has joined #archiveteam |
10:04
🔗
|
|
xmc has joined #archiveteam |
10:04
🔗
|
|
swebb sets mode: +o xmc |
10:05
🔗
|
|
PotcFdk has quit IRC (Ping timeout: 506 seconds) |
10:05
🔗
|
|
BlueMaxim has quit IRC (Quit: Leaving) |
10:11
🔗
|
|
PotcFdk has joined #archiveteam |
10:38
🔗
|
|
Ungstein has joined #archiveteam |
11:11
🔗
|
|
HCross has quit IRC (Read error: Connection reset by peer) |
11:13
🔗
|
|
SilSte has joined #archiveteam |
11:19
🔗
|
|
HCross has joined #archiveteam |
11:23
🔗
|
|
nightpool has joined #archiveteam |
11:27
🔗
|
|
nightpool has quit IRC (Ping timeout: 252 seconds) |
11:42
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
12:13
🔗
|
|
W1nterFox has quit IRC (Remote host closed the connection) |
12:17
🔗
|
|
nightpool has joined #archiveteam |
12:22
🔗
|
|
xk_id has quit IRC (Remote host closed the connection) |
12:22
🔗
|
|
nightpool has quit IRC (Ping timeout: 255 seconds) |
12:31
🔗
|
|
remsen has joined #archiveteam |
12:32
🔗
|
arkiver |
SketchCow: screenr is finished |
12:43
🔗
|
|
xk_id has joined #archiveteam |
12:49
🔗
|
|
xk_id has quit IRC (Read error: Operation timed out) |
12:53
🔗
|
|
icedice has joined #archiveteam |
13:01
🔗
|
|
atomotic has joined #archiveteam |
13:11
🔗
|
|
nightpool has joined #archiveteam |
13:18
🔗
|
|
nightpool has quit IRC (Read error: Operation timed out) |
13:31
🔗
|
|
remsen2 has joined #archiveteam |
13:33
🔗
|
|
primus104 has quit IRC (Leaving.) |
13:34
🔗
|
|
remsen has quit IRC (Read error: Operation timed out) |
13:41
🔗
|
|
remsen has joined #archiveteam |
13:43
🔗
|
|
remsen2 has quit IRC (Read error: Operation timed out) |
13:46
🔗
|
|
xk_id has joined #archiveteam |
14:05
🔗
|
|
nightpool has joined #archiveteam |
14:07
🔗
|
|
primus104 has joined #archiveteam |
14:08
🔗
|
|
Ghost_of_ has quit IRC (Quit: Leaving) |
14:14
🔗
|
|
nightpool has quit IRC (Read error: Operation timed out) |
14:19
🔗
|
|
xk_id has quit IRC (Read error: Connection reset by peer) |
14:20
🔗
|
|
xk_id has joined #archiveteam |
14:21
🔗
|
|
vitzli has joined #archiveteam |
14:33
🔗
|
|
bitsgalor has joined #archiveteam |
14:50
🔗
|
arkiver |
chfoo: can you please create an rsync target on FOS for docstoc? |
14:50
🔗
|
Atluxity |
is my hose needed anywhere? had some downtime these past days |
14:51
🔗
|
arkiver |
Atluxity: Google Code is probably going to start today |
14:52
🔗
|
Atluxity |
weeeee |
14:55
🔗
|
|
xk_id has quit IRC (Ping timeout: 615 seconds) |
14:57
🔗
|
bitsgalor |
WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD |
14:58
🔗
|
Atluxity |
yahoosucks |
14:58
🔗
|
bitsgalor |
Brilliant, thx! |
14:58
🔗
|
|
primus104 has quit IRC (Leaving.) |
14:59
🔗
|
|
xk_id has joined #archiveteam |
15:07
🔗
|
|
cvb has joined #archiveteam |
15:20
🔗
|
icedice |
I think we should at some point archive all of http://invisionfree.com/ |
15:22
🔗
|
icedice |
The company that offers the free forum hosting service, zIFBoards, hasn't updated their copyright date since 2014 |
15:23
🔗
|
icedice |
and pretty much nobody uses Invisionfree nowadays as far as I know since the competing free forum hosts beat it in the looks department. |
15:25
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
15:26
🔗
|
icedice |
I think between like 2004-2009 were Invisionfree's peak years, but that is my interpretation of the Invisionfree forums I've seen. |
15:28
🔗
|
icedice |
And Invisionfree's TOS hasn't been updated since 2010. |
15:36
🔗
|
Atluxity |
first step is to write some script to find everything |
15:36
🔗
|
|
xk_id has quit IRC (Read error: Connection reset by peer) |
15:37
🔗
|
|
xk_id has joined #archiveteam |
15:40
🔗
|
icedice |
I can add it to the wiki once I have enough free time. |
15:40
🔗
|
icedice |
I can't do much in the scripting department though. |
15:44
🔗
|
|
scyther has joined #archiveteam |
15:53
🔗
|
arkiver |
docstoc is also ready. |
15:53
🔗
|
Atluxity |
that's ok |
15:54
🔗
|
Atluxity |
icedice: just doing some "regular" research and writing what you know on the wiki about how they operate is good |
15:55
🔗
|
icedice |
It might not be some award worthy detailed masterpiece, but I think I can at least write a small section of basic info and arguments for archivation. |
15:56
🔗
|
arkiver |
Turns out docstoc has swf players for the documents which require some data files |
15:56
🔗
|
arkiver |
We're going to grab the swf file and data files for them too |
15:56
🔗
|
arkiver |
For now though the Wayback Machine can't yet rewrite the requests from swf files to wayback machine urls, so they won't work |
15:57
🔗
|
arkiver |
in the future they might though |
15:59
🔗
|
Atluxity |
for the future :D |
15:59
🔗
|
arkiver |
docstoc will be done in packs of 100 documents |
15:59
🔗
|
arkiver |
We can't download the full documents unfortunately |
15:59
🔗
|
arkiver |
so around 1.7 million items |
15:59
🔗
|
Atluxity |
we can't? |
16:00
🔗
|
arkiver |
we need a lot of accounts for that |
16:00
🔗
|
arkiver |
some documents though, have everything online, like http://www.docstoc.com/docs/173977322/SenseStone-Technology-and-Zhejiang-Jiakang-Electronics-Take-License-to-Xerafy's-Patent |
16:01
🔗
|
arkiver |
And for example http://www.docstoc.com/docs/84339417/Pre-Cana-Certificate---Excel |
16:01
🔗
|
arkiver |
which has swf file http://www.docstoc.com/docs/84339417/Pre-Cana-Certificate---Excel |
16:01
🔗
|
arkiver |
I mean swf file http://swf.docstoc.com/swf/loader.2.4.53.swf?doc_id=84339417&mem_id=16253712&revision=0&showrelated=0&showotherdocs=0&ref=http://www.docstoc.com/docs/84339417&allowdownload=1 |
16:02
🔗
|
arkiver |
which loads data like |
16:02
🔗
|
arkiver |
http://viewerdata.docstoc.com/getDocumentInfo.ashx?doc_id=84339417&host_url=http%3A//swf.docstoc.com/swf/loader.2.4.53.swf%3Fdoc_id%3D84339417%26mem_id%3D16253712%26revision%3D0%26showrelated%3D0%26showotherdocs%3D0%26ref%3Dhttp%3A//www.docstoc.com/docs/84339417%26allowdownload%3D1&mem_id=16253712 |
16:02
🔗
|
arkiver |
and http://docs.docstoc.com/did/16253712/84339417.did?rev=0 |
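The three viewer URLs quoted above follow a visible pattern built from the document ID and member ID. A minimal sketch of that pattern, assuming it holds for other documents too (only this one example appears in the log); the function name `docstoc_viewer_urls` and the fixed loader version are illustrative, not taken from the project's grab scripts:

```python
from urllib.parse import quote, urlencode


def docstoc_viewer_urls(doc_id, mem_id, revision=0, loader_version="2.4.53"):
    """Rebuild the swf loader, document info and .did URLs seen in the log.

    Assumption: the pattern from the single example (doc 84339417) generalises.
    """
    # Loader query string; ':' and '/' are left unescaped to match the log.
    loader_query = urlencode(
        {
            "doc_id": doc_id,
            "mem_id": mem_id,
            "revision": revision,
            "showrelated": 0,
            "showotherdocs": 0,
            "ref": f"http://www.docstoc.com/docs/{doc_id}",
            "allowdownload": 1,
        },
        safe=":/",
    )
    loader = f"http://swf.docstoc.com/swf/loader.{loader_version}.swf?{loader_query}"

    # The info endpoint receives the loader URL percent-encoded, slashes kept.
    info = (
        "http://viewerdata.docstoc.com/getDocumentInfo.ashx"
        f"?doc_id={doc_id}&host_url={quote(loader)}&mem_id={mem_id}"
    )

    # The data file lives under the member ID, then the document ID.
    did = f"http://docs.docstoc.com/did/{mem_id}/{doc_id}.did?rev={revision}"
    return loader, info, did


if __name__ == "__main__":
    for url in docstoc_viewer_urls(84339417, 16253712):
        print(url)
```

The member ID is not part of the public document URL, so a real grab would presumably have to read it from the document page first; that step is not shown here.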
16:12
🔗
|
|
superkuh has quit IRC (Read error: Operation timed out) |
16:22
🔗
|
|
jleclanch has quit IRC (Read error: Operation timed out) |
16:22
🔗
|
|
jleclanch has joined #archiveteam |
16:30
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
16:31
🔗
|
|
luckcolor has joined #archiveteam |
16:34
🔗
|
chfoo |
arkiver: ok, done |
16:35
🔗
|
arkiver |
thanks!! |
16:38
🔗
|
|
nightpool has joined #archiveteam |
16:44
🔗
|
|
nightpool has quit IRC (Read error: Operation timed out) |
16:45
🔗
|
|
luckcolor has quit IRC (Quit: Page closed) |
16:50
🔗
|
icedice |
I'm guessing Archiveteam's policy on forums that you must register at to view is to not archive them, right? |
16:50
🔗
|
icedice |
There's a small Proboards forum that died down in like 2010 or something like that that I'd like to archive. |
16:52
🔗
|
icedice |
No sensitive information or anything, I'm guessing the admin just decided to put it behind a login page before they abandoned it. |
16:53
🔗
|
DFJustin |
icedice: we don't have any such policy |
16:53
🔗
|
icedice |
Ah, I just assumed that it kind of was in line with respecting robots.txt |
16:53
🔗
|
DFJustin |
we don't do that either |
16:53
🔗
|
icedice |
Ah, I like that. |
16:55
🔗
|
|
remsen has quit IRC (Read error: Operation timed out) |
16:57
🔗
|
icedice |
How hard would it be to archive a password protected forum, if I made an account there? |
16:57
🔗
|
icedice |
http://pokescansclub.proboards.com/ |
16:57
🔗
|
Atluxity |
policy.... hehehe |
16:57
🔗
|
Atluxity |
we do stuff |
16:57
🔗
|
Atluxity |
we get stuff |
16:57
🔗
|
icedice |
It could probably be done with HTTrack at least |
16:58
🔗
|
phuzion |
icedice: We have tools much more powerful than HTTrack. We can throw cookies into our scripts pretty easily. |
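As a rough illustration of what "throwing cookies into a script" can look like, here is a minimal standard-library sketch. It is not the tooling the channel actually uses, and the cookie name `session` and its value are hypothetical placeholders; a real forum login would set its own cookie names.

```python
import urllib.request


def build_logged_in_request(url, cookies):
    """Return a Request that carries the given cookies in a Cookie header.

    `cookies` is a dict of name -> value copied from a logged-in browser
    session (hypothetical values in the example below).
    """
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return urllib.request.Request(url, headers={"Cookie": cookie_header})


if __name__ == "__main__":
    req = build_logged_in_request(
        "http://pokescansclub.proboards.com/",
        {"session": "PLACEHOLDER"},  # hypothetical cookie name and value
    )
    # Only inspect the request here; nothing is sent over the network.
    print(req.get_header("Cookie"))
```

Pages fetched this way are rendered for the logged-in account, which is why the archived copies would most likely show the username.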
16:58
🔗
|
icedice |
Ok |
16:58
🔗
|
icedice |
The archived pages will show account details and such right? |
16:58
🔗
|
icedice |
At the very least the username |
17:02
🔗
|
DFJustin |
most likely |
17:03
🔗
|
|
luckcolor has joined #archiveteam |
17:03
🔗
|
luckcolor |
Hey guys |
17:04
🔗
|
luckcolor |
I'm searching for an irc client to use so i can be more connected here |
17:04
🔗
|
luckcolor |
Does anybody want to tell me what you use? |
17:04
🔗
|
phuzion |
luckcolor: what's your OS? |
17:04
🔗
|
luckcolor |
windows |
17:04
🔗
|
|
JesseW has joined #archiveteam |
17:04
🔗
|
luckcolor |
but i could fire up linux if required |
17:05
🔗
|
phuzion |
https://www.reddit.com/r/windows/comments/1eezw4/good_irc_clients/ Enjoy. |
17:06
🔗
|
phuzion |
luckcolor: If you wanna keep discussing irc clients, let's do so in #archiveteam-bs |
17:06
🔗
|
luckcolor |
sure thanks |
17:14
🔗
|
|
vitzli has quit IRC (Leaving) |
17:18
🔗
|
|
Elegance has quit IRC (Read error: Connection reset by peer) |
17:18
🔗
|
|
Elegance has joined #archiveteam |
17:24
🔗
|
|
JesseW has quit IRC (Leaving.) |
17:27
🔗
|
arkiver |
I updated the docstoc scripts. |
17:28
🔗
|
arkiver |
If documents are publicly available we're getting the original files now |
17:31
🔗
|
|
nightpool has joined #archiveteam |
17:36
🔗
|
|
nightpool has quit IRC (Ping timeout: 252 seconds) |
17:39
🔗
|
|
xk_id has quit IRC (Remote host closed the connection) |
17:43
🔗
|
|
Start has joined #archiveteam |
17:49
🔗
|
arkiver |
Until google code starts we can use some firepower over at docstoc! ;) |
17:49
🔗
|
|
bitsgalor has quit IRC (Leaving) |
17:50
🔗
|
|
nightpool has joined #archiveteam |
17:51
🔗
|
HCross |
Shall I leave mine at yuku? |
17:53
🔗
|
arkiver |
yuku doesn't have a deadline, docstoc does |
17:53
🔗
|
arkiver |
so for now maybe move them over to docstoc |
17:54
🔗
|
arkiver |
but I really am fine with both |
17:55
🔗
|
|
nightpool has quit IRC (Ping timeout: 258 seconds) |
17:58
🔗
|
arkiver |
So if we want to save everything from docstoc we should run at around 200 items/min |
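The figures mentioned in the log (around 1.7 million items, packs of 100 documents, a target of 200 items/min) can be combined into a quick back-of-the-envelope estimate. A minimal sketch; the constant and function names are illustrative, and the log does not state the actual deadline:

```python
ITEMS = 1_700_000        # "around 1.7 million items" from the log
DOCS_PER_ITEM = 100      # "packs of 100 documents"
RATE_PER_MIN = 200       # "around 200 items/min"


def minutes_to_finish(items, rate_per_min):
    """Minutes needed to work through `items` at a steady rate."""
    return items / rate_per_min


if __name__ == "__main__":
    minutes = minutes_to_finish(ITEMS, RATE_PER_MIN)
    days = minutes / (60 * 24)
    print(f"documents covered: about {ITEMS * DOCS_PER_ITEM:,}")
    print(f"time at {RATE_PER_MIN} items/min: {minutes:,.0f} min (about {days:.1f} days)")
```

At a sustained 200 items/min the whole queue would take roughly six days, which assumes no retries, rate limits or downtime.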
18:02
🔗
|
|
luckcolor has quit IRC (Quit: Page closed) |
18:08
🔗
|
DFJustin |
docstoc doesn't start in the warrior for me http://interbutt.com/temp/docstoc.png |
18:08
🔗
|
DFJustin |
yuku works |
18:08
🔗
|
arkiver |
DFJustin: looks like I forgot to make some files executable |
18:08
🔗
|
arkiver |
sorry! |
18:09
🔗
|
|
luckcolor has joined #archiveteam |
18:10
🔗
|
arkiver |
DFJustin: fixed! |
18:10
🔗
|
|
primus104 has joined #archiveteam |
18:11
🔗
|
DFJustin |
looks good |
18:12
🔗
|
|
jleclanch has quit IRC (Read error: Operation timed out) |
18:12
🔗
|
DFJustin |
yuku scripts seem to be getting an awful lot of login, reply, quote pages |
18:13
🔗
|
|
jleclanch has joined #archiveteam |
18:18
🔗
|
|
jleclanch has quit IRC (Ping timeout: 252 seconds) |
18:20
🔗
|
|
jleclanch has joined #archiveteam |
18:24
🔗
|
SimpBrain |
a lot of yuku space is wasted with the login redirects |
18:29
🔗
|
|
jleclanch has quit IRC (Ping timeout: 252 seconds) |
18:31
🔗
|
|
SN4T14 has joined #archiveteam |
18:35
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
18:39
🔗
|
|
jleclanch has joined #archiveteam |
18:51
🔗
|
|
jleclanch has quit IRC (Ping timeout: 255 seconds) |
18:53
🔗
|
|
scyther has quit IRC (Quit: Leaving) |
18:55
🔗
|
|
jleclanch has joined #archiveteam |
19:03
🔗
|
|
primus104 has quit IRC (Leaving.) |
19:04
🔗
|
|
nightpool has joined #archiveteam |
19:11
🔗
|
|
aaaaaaaaa has joined #archiveteam |
19:11
🔗
|
|
swebb sets mode: +o aaaaaaaaa |
19:18
🔗
|
|
bwn has quit IRC (Read error: Operation timed out) |
19:40
🔗
|
|
bwn has joined #archiveteam |
19:53
🔗
|
|
primus104 has joined #archiveteam |
19:57
🔗
|
|
nightpool has quit IRC (Ping timeout: 252 seconds) |
19:58
🔗
|
|
Stiletto has quit IRC (Ping timeout: 255 seconds) |
20:02
🔗
|
|
luckcolor has quit IRC (Quit: Leaving) |
20:08
🔗
|
|
nystrom has joined #archiveteam |
20:12
🔗
|
|
atomotic has joined #archiveteam |
20:22
🔗
|
|
Stiletto has joined #archiveteam |
21:02
🔗
|
|
Start has joined #archiveteam |
21:19
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
21:37
🔗
|
|
aaaaaaaaa has quit IRC (Ping timeout: 615 seconds) |
21:37
🔗
|
|
nightpool has joined #archiveteam |
21:40
🔗
|
Atluxity |
let's get this party started |
21:42
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
21:43
🔗
|
|
superkuh has joined #archiveteam |
21:46
🔗
|
|
bwn has quit IRC (Quit: Leaving) |
21:48
🔗
|
|
aaaaaaaaa has joined #archiveteam |
21:48
🔗
|
|
swebb sets mode: +o aaaaaaaaa |
21:49
🔗
|
|
yipdw has quit IRC (Read error: Connection reset by peer) |
21:50
🔗
|
|
yipdw has joined #archiveteam |
22:02
🔗
|
|
Ungstein has quit IRC (Quit: Leaving.) |
22:05
🔗
|
|
aaaaaaaaa has quit IRC (Read error: Connection reset by peer) |
22:05
🔗
|
|
aaaaaaaaa has joined #archiveteam |
22:05
🔗
|
|
swebb sets mode: +o aaaaaaaaa |
22:10
🔗
|
|
remsen has joined #archiveteam |
22:16
🔗
|
Atluxity |
how many TB does one recommend for being an rsync target? |
22:17
🔗
|
|
Start has quit IRC (Quit: Disconnected.) |
22:23
🔗
|
|
aaaaaaaaa has quit IRC (Read error: Operation timed out) |
22:29
🔗
|
|
godane has quit IRC (Read error: Operation timed out) |
22:40
🔗
|
|
Kenshin has quit IRC (Ping timeout: 252 seconds) |
22:40
🔗
|
|
Kenshin has joined #archiveteam |
22:52
🔗
|
|
bwn has joined #archiveteam |
23:12
🔗
|
|
BlueMaxim has joined #archiveteam |
23:14
🔗
|
arkiver |
Atluxity: depends on the project |
23:19
🔗
|
|
xk_id has joined #archiveteam |
23:21
🔗
|
arkiver |
I'm going to release an update for docstoc |
23:21
🔗
|
arkiver |
It'll ignore the redirects to the /most-recent page. Should make the items finish faster |
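The idea of ignoring redirects to the /most-recent page can be sketched as a small filter. This is an illustration only: the project's real scripts are not shown in the log, and the function name `should_skip_redirect` and the set of status codes are assumptions.

```python
from urllib.parse import urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def should_skip_redirect(status_code, location):
    """Return True when a response is a redirect pointing at /most-recent.

    `location` is the value of the Location header, or None if absent.
    """
    if status_code not in REDIRECT_CODES or not location:
        return False
    # Compare only the path, so absolute and relative Location values both work.
    path = urlsplit(location).path.rstrip("/")
    return path.endswith("/most-recent")


if __name__ == "__main__":
    print(should_skip_redirect(302, "http://www.docstoc.com/most-recent"))
    print(should_skip_redirect(302, "http://www.docstoc.com/docs/84339417"))
```

Skipping these means a worker does not follow and store the same listing page for every missing document, which is presumably why items finish faster.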
23:24
🔗
|
arkiver |
scripts for docstoc are updated! |
23:27
🔗
|
arkiver |
Atluxity: would you like to be a target? |
23:27
🔗
|
arkiver |
SketchCow: how much space does FOS have at the moment? |
23:33
🔗
|
|
nightpool has quit IRC (Ping timeout: 252 seconds) |
23:33
🔗
|
|
nightpool has joined #archiveteam |
23:34
🔗
|
|
aaaaaaaaa has joined #archiveteam |
23:34
🔗
|
|
swebb sets mode: +o aaaaaaaaa |
23:35
🔗
|
|
arkiver sets mode: +o chfoo |
23:37
🔗
|
|
antomati_ is now known as antomatic |
23:40
🔗
|
jleclanch |
there sure is a lot of spam on archive.org |
23:42
🔗
|
aaaaaaaaa |
There is a lot of spam anywhere that allows random people to upload/post content |
23:42
🔗
|
jleclanch |
kinda my point |
23:48
🔗
|
|
Start has joined #archiveteam |
23:57
🔗
|
arkiver |
So ArchiveBot is giving max connections reached |
23:57
🔗
|
|
cvb has quit IRC (Read error: Operation timed out) |
23:57
🔗
|
arkiver |
afaik archivebot also goes to FOS? |