Time |
Nickname |
Message |
00:09
🔗
|
|
BlueMaxim has joined #archiveteam |
01:38
🔗
|
|
j08nY has quit IRC (Remote host closed the connection) |
01:53
🔗
|
|
nertzy has quit IRC (This computer has gone to sleep) |
02:12
🔗
|
|
ZexaronS has quit IRC (Leaving) |
02:20
🔗
|
|
fie has joined #archiveteam |
03:00
🔗
|
|
Aranje has quit IRC (Three sheets to the wind) |
03:17
🔗
|
|
ndiddy has quit IRC () |
04:58
🔗
|
|
Sk1d has quit IRC (Ping timeout: 250 seconds) |
05:05
🔗
|
|
Sk1d has joined #archiveteam |
05:17
🔗
|
|
zyphlar has joined #archiveteam |
05:35
🔗
|
|
zenguy has quit IRC (Read error: Operation timed out) |
05:42
🔗
|
|
zenguy has joined #archiveteam |
05:45
🔗
|
|
zenguy has quit IRC (Read error: Operation timed out) |
05:49
🔗
|
|
zenguy has joined #archiveteam |
07:08
🔗
|
|
schbirid has joined #archiveteam |
08:01
🔗
|
|
Jonison has joined #archiveteam |
08:02
🔗
|
|
atomotic has joined #archiveteam |
08:47
🔗
|
|
zyphlar has quit IRC (Quit: Connection closed for inactivity) |
09:03
🔗
|
|
j08nY has joined #archiveteam |
10:21
🔗
|
|
Guest has joined #archiveteam |
10:25
🔗
|
|
anhedonis has joined #archiveteam |
11:03
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
11:25
🔗
|
|
ZexaronS has joined #archiveteam |
11:47
🔗
|
|
Smiley has joined #archiveteam |
11:47
🔗
|
|
SmileyG has quit IRC (Read error: Connection reset by peer) |
11:48
🔗
|
|
BlueMaxim has quit IRC (Read error: Operation timed out) |
12:26
🔗
|
|
j08nY has quit IRC (Quit: Leaving) |
12:43
🔗
|
|
atomotic has joined #archiveteam |
13:32
🔗
|
|
icedice has joined #archiveteam |
13:35
🔗
|
|
icedice2 has joined #archiveteam |
13:36
🔗
|
|
anhedonis has quit IRC (Quit: anhedonis) |
13:36
🔗
|
|
ZexaronS has quit IRC (Leaving) |
13:36
🔗
|
|
atomotic has quit IRC (Quit: My MacBook has gone to sleep. ZZZzzz…) |
13:37
🔗
|
|
icedice has quit IRC (Ping timeout: 250 seconds) |
13:52
🔗
|
|
ZexaronS has joined #archiveteam |
14:36
🔗
|
|
atomotic has joined #archiveteam |
16:26
🔗
|
|
atomotic has quit IRC (Quit: Textual IRC Client: www.textualapp.com) |
16:31
🔗
|
|
Atom has quit IRC (Read error: Operation timed out) |
16:35
🔗
|
|
MMovie2 has quit IRC (Read error: Operation timed out) |
16:45
🔗
|
|
MMovie has joined #archiveteam |
17:36
🔗
|
|
ats has quit IRC (Quit: kernel!) |
17:39
🔗
|
|
MMovie has quit IRC (Read error: Operation timed out) |
17:40
🔗
|
|
ats has joined #archiveteam |
17:50
🔗
|
|
MMovie has joined #archiveteam |
18:26
🔗
|
|
powerKitt has joined #archiveteam |
18:27
🔗
|
powerKitt |
https://blogs.msdn.microsoft.com/bharry/2017/03/31/shutting-down-codeplex/ Have we grabbed a copy of CodePlex yet? |
18:27
🔗
|
JAA |
Not yet, but it's known: http://archiveteam.org/index.php?title=CodePlex |
18:28
🔗
|
JAA |
Also, it doesn't look like they'll nuke the content anytime soon. For once, a company handles a service shutdown properly. |
18:29
🔗
|
powerKitt |
Yeah, and the download tools for the archived projects sound pretty complete. |
18:30
🔗
|
powerKitt |
We should probably still grab a copy of the archive, like we did with the Google Code archive. |
18:30
🔗
|
powerKitt |
Read-only archives don't last forever. |
18:31
🔗
|
JAA |
Oh, definitely. |
18:37
🔗
|
|
j08nY has joined #archiveteam |
19:11
🔗
|
kittymeow |
Is there any cache of http://web.archive.org/web/19980509084931/http://members.visi.net/ member websites? http://members.visi.net/~fathom was from before 1999 I think so internet archive doesn't have it ( http://www.infotoday.com/online/mar02/OnTheNet.htm ) |
19:12
🔗
|
|
TheLovina has quit IRC (Read error: Operation timed out) |
19:20
🔗
|
schbirid |
kittymeow: http://web.archive.org/web/*/http://members.visi.net/* |
19:38
🔗
|
|
fie has quit IRC (Ping timeout: 370 seconds) |
19:39
🔗
|
|
TheLovina has joined #archiveteam |
19:46
🔗
|
|
powerKitt has quit IRC (Quit: Page closed) |
19:57
🔗
|
icedice2 |
https://imgbox.com/ |
19:57
🔗
|
icedice2 |
"Dear User, |
19:57
🔗
|
icedice2 |
We’d like to inform you that we will be shutting down services on June 30th, 2017. Please download and backup your files before this date. |
19:57
🔗
|
icedice2 |
Sincerely, |
19:57
🔗
|
icedice2 |
The Team " |
20:02
🔗
|
|
RichardG has quit IRC (Ping timeout: 370 seconds) |
20:12
🔗
|
|
RichardG has joined #archiveteam |
20:20
🔗
|
|
fie has joined #archiveteam |
20:47
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
21:08
🔗
|
tapedrive |
Hey ArchiveTeam, as you may know, it's the UK general election today, and therefore I've compiled a list of all the candidates' Facebook, Twitter and campaign websites. |
21:08
🔗
|
tapedrive |
There are a lot of URLs - 6098 total (2324 twitter, 2303 facebook and 1471 websites). |
21:08
🔗
|
tapedrive |
It would be great if they were processed as soon as possible because I imagine tomorrow they'll all start changing and being taken down. |
21:08
🔗
|
tapedrive |
I've got the URL lists in three separate files - what's the next step? ArchiveBot? |
21:09
🔗
|
|
TheLovina has quit IRC (Read error: Operation timed out) |
21:09
🔗
|
|
ndiddy has joined #archiveteam |
21:13
🔗
|
xmc |
facebook tends to not work well in archivebot, we are generally banned for scraping |
21:13
🔗
|
xmc |
but websites, yes |
21:13
🔗
|
xmc |
that sounds like it took a ton of work! |
21:13
🔗
|
tapedrive |
Here are the files: websites: https://pastebin.com/cayH7CKT :: twitters: https://pastebin.com/1Puq0M4W :: facebooks: https://pastebin.com/RuTw1Rbg |
21:14
🔗
|
tapedrive |
Not really, there were a few websites I scraped to find all the links, but it only took an hour or so. |
21:14
🔗
|
tapedrive |
I've never used archivebot, so can I just leave those files with you? |
21:14
🔗
|
xmc |
i've got a lot of work that needs doing, but hopefully someone else here can take up the charge? |
21:19
🔗
|
tapedrive |
Just looking at the archivebot readthedocs, is there a way to do recursive archiving from a list of urls? |
21:19
🔗
|
xmc |
yes but it's liable to go off the rails unless you keep it on a tight leash |
21:19
🔗
|
Kaz |
tapedrive: might want to check your lists, i see "http://www.Facebook/Lab4fav" in there |
21:19
🔗
|
xmc |
you probably want !a < http://url --no-offsite-links --ignore-sets=blogs , at the very least |
21:20
🔗
|
tapedrive |
Kaz: I've scraped these from many other sites, so there may be some incorrect ones. |
21:21
🔗
|
tapedrive |
xmc: Is that url for the file with the url list, or each individual url? |
21:21
🔗
|
xmc |
the file with the url list |
21:22
🔗
|
xmc |
make sure it's a plain text file, one url per line |
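The list hygiene discussed above (plain text, one URL per line, and catching entries like the "http://www.Facebook/Lab4fav" that Kaz spotted) could be sketched roughly as follows. This is only an illustrative sketch, not part of ArchiveBot: the function names `looks_valid` and `clean_url_list` are hypothetical, and the hostname check is a simple heuristic that assumes a bare "www.<name>" host with no further label is a scraping mistake.

```python
from urllib.parse import urlsplit


def looks_valid(url: str) -> bool:
    """Heuristic check: http(s) scheme and a hostname that looks complete."""
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.hostname or ""
    labels = host.split(".")
    # Need at least two non-empty labels (e.g. "twitter.com").
    if len(labels) < 2 or not all(labels):
        return False
    # Assumption: a host of exactly "www.<name>" is missing its TLD.
    if len(labels) == 2 and labels[0] == "www":
        return False
    return True


def clean_url_list(lines):
    """Return (good, bad) URL lists, skipping blanks and exact duplicates."""
    good, bad, seen = [], [], set()
    for line in lines:
        url = line.strip()
        if not url or url in seen:
            continue
        seen.add(url)
        (good if looks_valid(url) else bad).append(url)
    return good, bad


if __name__ == "__main__":
    sample = [
        "http://www.Facebook/Lab4fav\n",
        "https://twitter.com/example\n",
        "\n",
        "https://twitter.com/example\n",
    ]
    good, bad = clean_url_list(sample)
    # Write "\n".join(good) to a plain text file, one URL per line.
    print(len(good), "usable,", len(bad), "to review by hand")
```

Entries that land in the `bad` list are better reviewed by hand than dropped silently, since scraped lists may contain fixable typos.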
21:47
🔗
|
|
ZexaronS has quit IRC (Leaving) |
21:47
🔗
|
MrRadar |
With that many URLs it'd probably be a good idea to split it into multiple jobs |
21:47
🔗
|
MrRadar |
Archivebot instances aren't always very stable |
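MrRadar's suggestion of splitting a large list into multiple jobs could look something like the sketch below. It only divides a list of URLs into roughly equal chunks; the function name `split_into_jobs` and the job count of 4 are hypothetical choices for illustration, and how each chunk is then handed to ArchiveBot is left out.

```python
def split_into_jobs(urls, jobs):
    """Split urls into `jobs` contiguous chunks whose sizes differ by at most one."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    size, extra = divmod(len(urls), jobs)
    chunks, start = [], 0
    for i in range(jobs):
        # The first `extra` chunks each take one additional URL.
        end = start + size + (1 if i < extra else 0)
        chunks.append(urls[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    # Placeholder URLs standing in for the 1471 campaign websites.
    urls = [f"http://example.org/{n}" for n in range(1471)]
    for index, chunk in enumerate(split_into_jobs(urls, 4), start=1):
        # Each chunk would be saved as its own plain text file, one URL per line.
        print(f"job {index}: {len(chunk)} URLs")
```

Smaller jobs limit how much has to be redone if one instance falls over partway through.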
21:50
🔗
|
tapedrive |
Would you recommend aborting now, and splitting it, or keeping it going and monitoring it? |
21:51
🔗
|
MrRadar |
Eh, keep monitoring it |
22:38
🔗
|
|
Jonison has quit IRC (Read error: Connection reset by peer) |
22:42
🔗
|
|
nertzy has joined #archiveteam |
22:53
🔗
|
|
icedice2 has quit IRC (Quit: Leaving) |
23:05
🔗
|
tapedrive |
Would it be acceptable to add more workers to the UK election candidates websites, seeing as they're all hosted on different domains, and time is critical with it? |
23:06
🔗
|
xmc |
more workers how? if you want to split it into multiple jobs, that would be sensible |
23:07
🔗
|
tapedrive |
With the concurrency command |
23:07
🔗
|
xmc |
oh! yeah go for it |
23:42
🔗
|
whydomain |
Have we archived the main UK party websites? If not, they'd probably be pretty important. |
23:42
🔗
|
whydomain |
However, they have a large number of pages. |
23:50
🔗
|
|
Atom has joined #archiveteam |